
Allow longer state pruning history

Open arkpar opened this issue 3 years ago • 7 comments

Currently, when state pruning is enabled, state-db requires O(n) memory w.r.t. the number of blocks pending pruning, so a long pruning history requires a lot of memory. It should be possible to get rid of this memory requirement using the reference counting feature of parity-db.

This basically means removing the death_rows and death_index fields in the RefWindow struct. Journals that are currently kept in death_rows should be loaded on demand from the database, although I'd keep an in-memory cache for pruning window sizes <= 256. death_index should be replaced with reference counting on the DB level.
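A minimal sketch of the idea, not the actual state-db code: the types `JournalDb` and `PruningWindow`, their fields, and the `MAX_CACHED_WINDOW` constant are all hypothetical names introduced for illustration. Every journal is persisted, and journals are mirrored in memory only when the pruning window is small (<= 256); otherwise they are loaded from the database when the block is pruned.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the backing database:
/// block number -> journal (keys to delete when that block is pruned).
#[derive(Default)]
struct JournalDb {
    journals: HashMap<u64, Vec<Vec<u8>>>,
}

/// Assumed threshold below which journals are also cached in memory.
const MAX_CACHED_WINDOW: u64 = 256;

/// Hypothetical pruning window that does not need every journal in memory.
struct PruningWindow {
    window_size: u64,
    /// Oldest block that has not been pruned yet.
    base: u64,
    /// Number of blocks currently pending pruning.
    pending: u64,
    /// Present only when the window is small enough to cache.
    cache: Option<VecDeque<Vec<Vec<u8>>>>,
}

impl PruningWindow {
    fn new(window_size: u64) -> Self {
        let cache = (window_size <= MAX_CACHED_WINDOW).then(VecDeque::new);
        Self { window_size, base: 0, pending: 0, cache }
    }

    /// Record the journal of the next imported block.
    /// It is always written to the database and cached only if caching is on.
    fn note_block(&mut self, db: &mut JournalDb, deleted: Vec<Vec<u8>>) {
        let number = self.base + self.pending;
        if let Some(cache) = &mut self.cache {
            cache.push_back(deleted.clone());
        }
        db.journals.insert(number, deleted);
        self.pending += 1;
    }

    /// Prune the oldest block if the window is exceeded and return the keys
    /// to delete. Without a cache the journal comes from the database.
    fn prune_one(&mut self, db: &mut JournalDb) -> Option<Vec<Vec<u8>>> {
        if self.pending <= self.window_size {
            return None;
        }
        let from_db = db.journals.remove(&self.base);
        let cached = self.cache.as_mut().and_then(|c| c.pop_front());
        self.base += 1;
        self.pending -= 1;
        cached.or(from_db)
    }
}
```

With a large window the in-memory state is just two counters, so memory no longer grows with the number of blocks pending pruning; the cost is one database read per pruned block.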

arkpar avatar Jul 25 '22 16:07 arkpar

So that first requires that we get rid of RocksDB?

bkchr avatar Jul 25 '22 17:07 bkchr

@bkchr Not necessarily. There's a compatibility layer that adds reference counting support for rocksdb as well, albeit inefficiently.

arkpar avatar Jul 25 '22 17:07 arkpar

> I'd keep an in-memory cache for pruning window size <= 256. death_index should be replaced with reference counting on the DB level.

Or perhaps keep the latest (up to) 256 blocks in an in-memory cache and load the rest on demand from the db?

rvalle avatar Jul 25 '22 20:07 rvalle

death_rows is not actual block data but a journal: a list of keys that need to be removed from the database when the block is pruned. There's no point in keeping recent journals in memory. When you insert block N and need to prune block N - 10000000, which was inserted months ago, its journal won't be in the cache. Only with a small pruning window can you keep journals in memory and avoid a database query.

arkpar avatar Jul 26 '22 05:07 arkpar

Hi @arkpar, it looks like RefWindow already utilizes the reference counting feature of parity-db by setting RefWindow::count_insertions to false, which eliminates the memory needed by RefWindow::death_index (and also the disk space used by JournalRecord::inserted).

But DeathRow::deleted may still be required, because in parity-db the reference counter is stored alongside the referencing kv pair, and upon pruning we still need a way to keep track of which kv pairs need to be deleted before we can actually delete them in the backend db.
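To illustrate the point, here is a simplified model of a store that keeps a reference counter next to each value. This is a sketch under assumptions, not parity-db's actual API: `RefCountedDb`, `insert`, `dereference`, and `prune_block` are hypothetical names. The counter decides when a value is physically removed, but the pruner still needs the list of deleted keys to know which counters to decrement.

```rust
use std::collections::HashMap;

/// Hypothetical reference-counted key-value store:
/// each entry holds its value together with its own counter.
#[derive(Default)]
struct RefCountedDb {
    entries: HashMap<Vec<u8>, (Vec<u8>, u32)>,
}

impl RefCountedDb {
    /// Insert a value, or bump the counter if the key already exists.
    fn insert(&mut self, key: &[u8], value: &[u8]) {
        self.entries
            .entry(key.to_vec())
            .and_modify(|entry| entry.1 += 1)
            .or_insert_with(|| (value.to_vec(), 1));
    }

    /// Drop one reference; the entry is removed once the counter hits zero.
    fn dereference(&mut self, key: &[u8]) {
        let remove = match self.entries.get_mut(key) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1 == 0
            }
            None => false,
        };
        if remove {
            self.entries.remove(key);
        }
    }

    fn contains(&self, key: &[u8]) -> bool {
        self.entries.contains_key(key)
    }
}

/// Apply the `deleted` part of a block's journal: the store cannot tell on
/// its own which keys a block dropped, so the list has to come from somewhere.
fn prune_block(db: &mut RefCountedDb, deleted: &[Vec<u8>]) {
    for key in deleted {
        db.dereference(key);
    }
}
```

In this model no in-memory index of insertions is needed, because a key inserted by two blocks simply survives the first prune and disappears on the second.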

NingLin-P avatar Aug 01 '22 21:08 NingLin-P

> Hi @arkpar, it looks like RefWindow already utilizes the reference counting feature of parity-db by setting RefWindow::count_insertions to false, which eliminates the memory needed by RefWindow::death_index (and also the disk space used by JournalRecord::inserted).

Right, this part is already implemented :)

> But DeathRow::deleted may still be required, because in parity-db the reference counter is stored alongside the referencing kv pair, and upon pruning we still need a way to keep track of which kv pairs need to be deleted before we can actually delete them in the backend db.

It is required indeed, but we don't need to keep it in memory. On each block import a journal record is written to the database here which contains a list of deleted keys. When it is time to prune a block we can get that record from the database.

arkpar avatar Aug 02 '22 08:08 arkpar

> It is required indeed, but we don't need to keep it in memory. On each block import a journal record is written to the database here which contains a list of deleted keys. When it is time to prune a block we can get that record from the database.

Right, I would like to give it a try.

NingLin-P avatar Aug 02 '22 08:08 NingLin-P