substrate Append overlay refactor proposal

This branch propose to avoid clones in append by storing offset and size in previous overlay depth. That way on rollback we can just truncate and change size of existing value. To avoid copy it also means that :

append on new overlay layer if there is an existing value: create a new Append entry with previous offsets, and take memory of previous overlay value.
rollback on append: restore value by applying offsets and put it back in previous overlay value
commit on append: appended value overwrite previous value (is an empty vec as the memory was taken). offsets of commited layer are dropped, if there is offset in previous overlay layer they are maintained.
set value (or remove) when append offsets are present: current appended value is moved back to previous overlay value with offset applied and current empty entry is overwrite (no offsets kept).

The modify mechanism is not needed anymore. This branch lacks testing and break some existing genericity (bit of duplicated code), but good to have to check direction.

Generally I am not sure if it is worth or we just should favor differents directions (transients blob storage for instance), as the current append mechanism is a bit tricky (having a variable length in first position means we sometime need to insert in front of a vector).

Apr 18 '23 10:04 cheme

Please don't review yet (will change design a bit to be closest to https://github.com/paritytech/polkadot-sdk/issues/30)

Apr 25 '23 07:04 cheme

Generally I am not sure if it is worth or we just should favor differents directions (transients blob storage for instance), as the current append mechanism is a bit tricky (having a variable length in first position means we sometime need to insert in front of a vector).

So this applies to all appendable storage structus? Not sure if it is worth it, but could we selectively add this with to a StorageValue with an append_only attribute?
AFAIK we only need this fast truncate on append-only vectors like System::Events, but the advantage is that they are never read.

Otherwise we should probably add a test to check that this solution has a low memory footprint on deeply nested reverts (as expected).

Apr 25 '23 17:04 ggwpez

So this applies to all appendable storage structus?

yes all storage item containing a value. But as I did comment a bit in the code, using a same key value with sp_io::set and later with sp_io::append is really bad and no sensible runtime should allow it. (if you got a value that is encoded Vec of length 3 so starting with a compact, call to append with a 4 byte encoded vec will change the current byte length to 4 and append the content, then reading will get nothing sensible.

Not sure if it is worth it, but could we selectively add this with to a StorageValue with an append_only attribute?

So yes in practice it could be in a different key space, but that is basically what transient storage does (need to copy final value in some finalize to a standard state value though). Alternatively (or extending the transient storage) a child trie/state could be exclusive to append value.

AFAIK we only need this fast truncate on append-only vectors like System::Events, but the advantage is that they are never read.

yes, but we cannot ensure it. In my next change, the storage will make this usecase better (with https://github.com/paritytech/polkadot-sdk/issues/30, if only writes followed by a single read (root calculation), then we will not have anymore the costy resize each time the number of element Compact change in size (or a single one)).

Apr 26 '23 07:04 cheme

I think I did align with https://github.com/paritytech/polkadot-sdk/issues/30 , should be reviewable (requires more test, especially since most test access on every transaction change and thus render the data).

Apr 26 '23 14:04 cheme

If this works out, then we maybe dont need https://github.com/paritytech/substrate/pull/14120

May 16 '23 16:05 ggwpez

If this works out, then we maybe dont need paritytech/substrate#14120

I really think storing big value is a mistake so at some point paged list really make sense. (ability to produce way smaller proofs at small cost)

May 16 '23 16:05 cheme

If this works out, then we maybe dont need paritytech/substrate#14120

I really think storing big value is a mistake so at some point paged list really make sense. (ability to produce way smaller proofs at small cost)

Ah right. So it depends on the use-case. For System::Events, we can just use your MR. But for para-chain cases we maybe rather want to use the paged list.

May 16 '23 17:05 ggwpez

For events we should use transient storage (only write root of an event trie in state and still index data in an external db), but this is probably not here soon, so yes short term this is more pragmatic.

May 16 '23 17:05 cheme

I went through a bit of fuzzing and fix a few issue, think this would be good to review now.

May 17 '23 09:05 cheme

The CI pipeline was cancelled due to failure one of the required jobs. Job name: test-linux-stable-int Logs: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/2850187

May 17 '23 09:05 paritytech-cicd-pr