Prototype and test transaction log based approach to the state store
Background
See related issues:
- https://github.com/gchq/sleeper/issues/1355
- https://github.com/gchq/sleeper/issues/1700
Description
See design document.
We'd like to know whether the transaction log approach to the state store can improve on the speed and contention problems we've had for updates to the S3 state store.
Analysis
A prototype version of this could just support adding files to the state store and performing compaction on those files. We could just support the ParallelCompactionsIT system test, and see how far we can increase the parallelism of compactions.
We've split out several issues:
- https://github.com/gchq/sleeper/issues/2187
- https://github.com/gchq/sleeper/issues/2120
- https://github.com/gchq/sleeper/issues/2192
- https://github.com/gchq/sleeper/issues/2171
- https://github.com/gchq/sleeper/issues/2204
- https://github.com/gchq/sleeper/issues/2198
- https://github.com/gchq/sleeper/issues/2122
- https://github.com/gchq/sleeper/issues/2121
- https://github.com/gchq/sleeper/issues/2316
- https://github.com/gchq/sleeper/issues/2275
I'll make some comments here - can we update the description above based on these comments until we've iterated to consensus?
- "This should mean quicker updates even compared to the DynamoDB state store, since we only need to save one item per transaction." - This is speculation. It's not clear that there will by any significant performance difference between applying a single update and multiple items.
- We need to note that a DynamoDB item can have a maximum size of 400KB. It's unlikely a single transaction will exceed that but we'd have to guard against it, probably by having a flag which says whether the transaction is in DynamoDB entry or has been written as a blob to S3 as it was too large.
- "We use a conditional check to refuse the update if there's already a transaction with that number. We then need to retry if we're out of date." The description needs to emphasise that there are still potential concurrency issues with this approach, and it is not clear whether the transaction log approach will reduce contention issues by a few percent relative to the S3 state store (in which case the transaction log approach doesn't solve the problem), or eliminate them completely. The parent issue https://github.com/gchq/sleeper/issues/1355 notes that we will need a system test to help evaluate whether any new approach has actually solved the problem.
- "This retry is comparable to an update in the S3 state store, but each change is much quicker to apply because you don't need to store the whole state." Again this is speculation. It is a smaller update and so should be quicker but whether it's 5% quicker or 5x quicker isn't clear. And now reading the current state is more expensive than it is in the S3 state store as you need to read a snapshot from S3 and then query DynamoDB for the latest transactions, so the read side of committing to the state store is slower than it is in the S3 state store.
- "With a relational database, larger transactions involve locking many records. This would pose similar problems to DynamoDB in terms of limiting atomicity of updates, as we would be heavily incentivised to keep transactions small." - I'm not sure this is true. I can't find any limit on the number of items within a transaction against say PostgreSQL and without any evidence that the performance costs of a large update is severe, we can't say that "we would be heavily incentivised to keep transactions small".
- "This loses some of the promise of Sleeper in terms of serverless deployment, and only running when something needs to happen." Aurora Serverless v2 supports automatic scaling up and down between minimum and maximum limits. If you know Sleeper will be idle for a while you can stop the database and then only be charged for the storage. We already have a concept of pausing Sleeper so that the periodic lambdas don't run. With Aurora Serverless this wouldn't be too much different.
Closing as we're releasing the first version of the transaction log state store in this release.