substrate pallet-mmr: prune stale offchain db entries

MMR nodes are saved in off-chain db under fork-resistant keys based on frame_system::block_hash(), as such only nodes added by latest frame_system::BlockHashCount can be accessed that way.

Nodes added by blocks older than frame_system::BlockHashCount are moved by offchain_worker to "canon" (not fork resistant) location - thus being made available regardless of age. See https://github.com/paritytech/substrate/pull/11594 for more details.

PR https://github.com/paritytech/substrate/pull/12498 fixed the issue that offchain_worker doesn't run for blocks imported during initial sync. So for initial syncs > frame_system::BlockHashCount (practically any node syncing from scratch), MMR nodes aren't moved to canon position and access to them is lost. The fix was that runtime block execution saves node to both fork-resistant and canon locations:

initial sync nodes directly end up in the right location,
for the rest, we do a superfluous offchain_index::set() because it will be overwritten by offchain_worker with canon values anyway.

This results in working system, but with bloated offchain-db where MMR nodes added during initial sync are taking 2x storage than they should (they are duplicated in two locations in the offchain-db).

Oct 17 '22 09:10 acatangiu

Possible solution outlined here https://github.com/paritytech/substrate/pull/12498#discussion_r996677117:

change fork-resistant key to (prefix, pos, parent_hash) and hope (I need to verify) there is functionality to get all entries matching (prefix, pos) partial key,
keep offchain db entry last_pruned_leaf_number
in offchain worker attempt to prune (last_pruned_leaf_number, current - window] rather than just [current - window].
we can even do this in batches; put a limit on how many to clean up in each offchain_worker invocation so that we spread the load over time.

Oct 17 '22 09:10 acatangiu

Unfortunately, OFFCHAIN column uses default hash (and not btree) structure, meaning "key prefix" cannot be used for retrieving items :cry:

Oct 17 '22 09:10 acatangiu

I think we need to create a client-side gadget, and move all the offchain stuff there.

PROs:

pallet-mmr is simplified, only cares about runtime state/storage,
no more offchain_worker canonicalization required; it is simplified and moved to client-side code/gadget,
we can actually follow finality and "properly" canonicalize offchain entries,
simplifies runtime api (generating proofs currently works only when --indexing-api is enabled) - generating proofs is handled by client-side gadget.

CONs:

extra gadget deployed client-side - pallet can maintain pruned MMR without the gadget, but cannot generate proofs without it.

@andresilva @svyatonik WDYT?

Oct 17 '22 10:10 acatangiu

I can't say much - @andresilva was there when it was designed that way (offchain worker) - maybe there's some strong reasoning behind that. But do you want to keep using offfchain storage for storing MMR nodes or regular DB?

Oct 19 '22 06:10 svyatonik

But do you want to keep using offfchain storage for storing MMR nodes or regular DB?

We would still use offchain column in the DB so that runtime can still add to it through indexing-api. That way, building the MMR still happens on-chain, but canonicalization and pruning of non-canon forks would be done by client-side gadget and could be driven by GRANDPA finality.

Also generating proofs RPC would not change, but instead of calling runtime API it would call some gadget API.

Oct 19 '22 08:10 acatangiu

I have talked with @andresilva and we're in agreement that client-side gadget to handle all off-chain duties is the best way forward.

Nov 07 '22 09:11 acatangiu