teku icon indicating copy to clipboard operation
teku copied to clipboard

DepositTreeSnapshot database persistence

Open zilm13 opened this issue 3 years ago • 6 comments

PR Description

Stores DepositTreeSnapshot to DB (always) and loads it on Teku start if --Xdeposit-snapshot-storage-enabled flag is toggled Part №3 of #5432

Fixed Issue(s)

Documentation

  • [x] I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • [x] I thought about adding a changelog entry, and added one if I deemed necessary.

zilm13 avatar Sep 12 '22 19:09 zilm13

I'm a bit confused with finalized/hot parts and complicated StoreTransaction flow, so I could be (should be) wrong there. Also shouldn't we add !optimisticTransitionBlockRootSet here? https://github.com/ConsenSys/teku/blob/4f181937d0552571c6d0bd38c5a8f94aa6790dfd/storage/api/src/main/java/tech/pegasys/teku/storage/api/StorageUpdate.java#L68-L76 Otherwise I think this PR is ready for review

PS: btw, full deposits replay took just 1 second for me on Goerli, was it that fast before? Did I miss some changes in replay?

zilm13 avatar Sep 14 '22 20:09 zilm13

And yes we probably should be checking that flag in StorageUpdate - I've fixed it in #6202

Deposit replay is fast, but will keep getting slower as more and more deposits get added and it's more stuff to store in the database that we don't really need. The main aim of these stories though will be to ship a deposit snapshot with Teku to avoid having to load most of the deposits from the chain on first startup.

ajsutton avatar Sep 15 '22 01:09 ajsutton

@ajsutton I understand the purpose, but I made this PR assuming that it could be toggled off or on when needed with a flag. The next step could be removing the flag (and toggling it always on) and so on removing deposits from the storage. Until this flag could be turned off or on on any restart, if we remove the deposits now and user toggles snapshot flag back to off, he will start without snapshot and deposits in DB. Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.

Another question is a way of obtaining StorageQueryChannel in PowChainService. It's already complicated in 2 parts of the code and I've added the 3rd place with similar code. I'm not sure what is the best way to fix it. I guess our intention is to switch fully to async soon, so maybe it's not a big deal and we could later use just CombinedStorageChannel in all palces

zilm13 avatar Sep 20 '22 20:09 zilm13

Sorry for the delay in getting back to this.

I understand the purpose, but I made this PR assuming that it could be toggled off or on when needed with a flag. The next step could be removing the flag (and toggling it always on) and so on removing deposits from the storage. Until this flag could be turned off or on on any restart, if we remove the deposits now and user toggles snapshot flag back to off, he will start without snapshot and deposits in DB. Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.

The aim of feature toggles is to let us test the new feature and get it fully working before we enable it for everyone. So I'm not worried if you can't reliably turn it back off once you've enabled it - it's more important to fully test the new behaviour before we enable it. So I think we want to add in the deletion so we can see the full results of the feature.

Another question is a way of obtaining StorageQueryChannel in PowChainService. It's already complicated in 2 parts of the code and I've added the 3rd place with similar code. I'm not sure what is the best way to fix it. I guess our intention is to switch fully to async soon, so maybe it's not a big deal and we could later use just CombinedStorageChannel in all palces

Yeah I wouldn't worry about this, I think we can probably remove the feature toggle for async storage pretty soon given it's been in use successfully for a while now.

ajsutton avatar Oct 04 '22 22:10 ajsutton

Also there is no sense to remove pre-snapshot deposits as we load deposits from EL, not DB, if snapshot is used. So we could remove them all when the feature is always on.

Missed this bit. Not storing any deposits and just loading from the EL is probably ok. Would be something to test and see what the size of requests to the EL are in practice - people often have issues with the EL timing out when searching for deposit logs but hopefully the snapshot is usually recent enough that it won't be an issue or we can reduce the default number of blocks we cover in each request to lower the load since we don't have to scan the full chain anymore (or at least won't have to once we start including a snapshot in releases for known networks).

ajsutton avatar Oct 04 '22 22:10 ajsutton

Removing deposits finished, need to test somehow

zilm13 avatar Oct 06 '22 15:10 zilm13

Finished on this. Tested in real network, all works fine after the fix. Even if we remove flag --Xdeposit-snapshot-storage-enabled after usage, client restores all deposit events info successfully

  • Moved "Eth1Data height not found in cache. Skipping DepositTree pruning" from warn to debug level, it's spamming a lot and it's not critical, especially considering that this snapshot could be already stored, just we've moved forward in the blockchain and height data was already pruned
  • Added some other logging, almost everything in debug, only pruning of deposit events (1 time in life) to INFO level. Not sure on this. I'd rather add some info at least on loading/not loading snapshot from storage but looks like we prefer minimum noise on INFO level, so I've not increased it

zilm13 avatar Nov 04 '22 14:11 zilm13