DepositTreeSnapshot database persistence
PR Description
Stores DepositTreeSnapshot to DB (always) and loads it on Teku start if --Xdeposit-snapshot-storage-enabled flag is toggled
Part №3 of #5432
Fixed Issue(s)
Documentation
- [x] I thought about documentation and added the
doc-change-requiredlabel to this PR if updates are required.
Changelog
- [x] I thought about adding a changelog entry, and added one if I deemed necessary.
I'm a bit confused with finalized/hot parts and complicated StoreTransaction flow, so I could be (should be) wrong there. Also shouldn't we add !optimisticTransitionBlockRootSet here? https://github.com/ConsenSys/teku/blob/4f181937d0552571c6d0bd38c5a8f94aa6790dfd/storage/api/src/main/java/tech/pegasys/teku/storage/api/StorageUpdate.java#L68-L76
Otherwise I think this PR is ready for review
PS: btw, full deposits replay took just 1 second for me on Goerli, was it that fast before? Did I miss some changes in replay?
And yes we probably should be checking that flag in StorageUpdate - I've fixed it in #6202
Deposit replay is fast, but will keep getting slower as more and more deposits get added and it's more stuff to store in the database that we don't really need. The main aim of these stories though will be to ship a deposit snapshot with Teku to avoid having to load most of the deposits from the chain on first startup.
@ajsutton I understand the purpose, but I made this PR assuming that it could be toggled off or on when needed with a flag. The next step could be removing the flag (and toggling it always on) and so on removing deposits from the storage. Until this flag could be turned off or on on any restart, if we remove the deposits now and user toggles snapshot flag back to off, he will start without snapshot and deposits in DB. Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.
Another question is a way of obtaining StorageQueryChannel in PowChainService. It's already complicated in 2 parts of the code and I've added the 3rd place with similar code. I'm not sure what is the best way to fix it. I guess our intention is to switch fully to async soon, so maybe it's not a big deal and we could later use just CombinedStorageChannel in all palces
Sorry for the delay in getting back to this.
I understand the purpose, but I made this PR assuming that it could be toggled off or on when needed with a flag. The next step could be removing the flag (and toggling it always on) and so on removing deposits from the storage. Until this flag could be turned off or on on any restart, if we remove the deposits now and user toggles snapshot flag back to off, he will start without snapshot and deposits in DB. Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.
The aim of feature toggles is to let us test the new feature and get it fully working before we enable it for everyone. So I'm not worried if you can't reliably turn it back off once you've enabled it - it's more important to fully test the new behaviour before we enable it. So I think we want to add in the deletion so we can see the full results of the feature.
Another question is a way of obtaining StorageQueryChannel in PowChainService. It's already complicated in 2 parts of the code and I've added the 3rd place with similar code. I'm not sure what is the best way to fix it. I guess our intention is to switch fully to async soon, so maybe it's not a big deal and we could later use just CombinedStorageChannel in all palces
Yeah I wouldn't worry about this, I think we can probably remove the feature toggle for async storage pretty soon given it's been in use successfully for a while now.
Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.
Missed this bit. Not storing any deposits and just loading from the EL is probably ok. Would be something to test and see what the size of requests to the EL are in practice - people often have issues with the EL timing out when searching for deposit logs but hopefully the snapshot is usually recent enough that it won't be an issue or we can reduce the default number of blocks we cover in each request to lower the load since we don't have to scan the full chain anymore (or at least won't have to once we start including a snapshot in releases for known networks).
Removing deposits finished, need to test somehow
Finished on this. Tested in real network, all works fine after the fix. Even if we remove flag --Xdeposit-snapshot-storage-enabled after usage, client restores all deposit events info successfully
- Moved "Eth1Data height not found in cache. Skipping DepositTree pruning" from warn to debug level, it's spamming a lot and it's not critical, especially considering that this snapshot could be already stored, just we've moved forward in the blockchain and height data was already pruned
- Added some other logging, almost everything in debug, only pruning of deposit events (1 time in life) to INFO level. Not sure on this. I'd rather add some info at least on loading/not loading snapshot from storage but looks like we prefer minimum noise on INFO level, so I've not increased it