MMR Leaf Data: off-by-one error in beefy_next_authority_set
A given Leaf with MMR 0-based leaf index N-1 is constructed and added to MMR during construction of block N here.
The Leaf data/contents can be seen populated here and it consists of:
// contents for leaf index <N-1> added by block <N>
MmrLeaf {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
extra_data: <para_heads_of_<N-1>>,
next_auth_set: <next_auth_set_of<N>>,
}
While this is not a show-stopper - (light)clients can simply handle this in their code - it introduces cognitive complexity and code logic corner cases.
We could avoid this and make it easier for users by changing Leaf<N-1> contents to:
// contents for leaf index <N-1> added by block <N>
MmrLeaf {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
extra_data: <para_heads_of_<N-1>>,
- next_auth_set: <next_auth_set_of<N>>,
+ next_auth_set: <next_auth_set_of<N-1>>,
}
AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.
But I would prefer a fix contained in pallet_mmr itself since above is prone to developer error when adding pallet_mmr to their runtime.
AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.
So since session is rotated prior to leaf construction currently, next_auth_set: <next_auth_set_of<N>> (i.e. Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?
We haven't really been affected by this off-by-one issue while developing our light client (or we just worked around it).
The proposed change may affect how our light client detects authority handovers. But I have a feeling there shouldn't be any impact. If there is we can update our code.
Some background on our light client: It receives updates like the following at the start of every beefy session (Assume a session length of 10 blocks). Each update contains the beefy commitment for block N (first block of session) and the MMR leaf for block N-1 (last block of the previous session).
Update:
BeefyCommitment(block=11) { validatorSetId: 1, ... }
MmrLeaf(block=10) { nextValidatorSetId: 2, ...}
Update:
BeefyCommitment(block=21) { validatorSetId: 2, ... }
MmrLeaf(block=20) { nextvalidatorSetId: 3, ... }
Update:
BeefyCommitment(block=31) { validatorSetId: 3, ... }
MmrLeaf(block=30) { nextvalidatorSetId: 4, ... }
The light client knows it can perform an authority handover when the latest update contains a leaf.nextValidatorSetId that is greater than that of the previously observed value (See code here).
AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.
So since session is rotated prior to leaf construction currently,
next_auth_set: <next_auth_set_of<N>>(i.e.Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?
There is no issue for forks in either option (current behavior & suggested change) because the validator set is identical for blocks at height N (or N+1, ..., N+m) across all forks.
The issue is more about usability in the sense that the BEEFY payload for block N is a MMR root where latest latest leaf is a set of primitives coming from a mix of blocks N-1 and N; it could be simplified to:
(block numbers are 1-indexed whereas leaf indexes are 0-indexed)
BEEFY payload for block N contains MMR root built from leaves indexed L in 0..=N-1, with each Leaf index L containing primitives coming exclusively from block number L.
At block N, the latest leaf L=N-1 contains primitives values pertaining to block N-1 (instead of some from N-1 and some from N).
Marking this as "Some day maybe" since existing clients already successfully handle current behavior.
actually have encountered this issue and i created this pr as a fix: https://github.com/paritytech/substrate/pull/10907
actually have encountered this issue and i created this pr as a fix: #10907
this is actually something different - by building leaf on_finalize() you actually get:
// contents for leaf index <N-1> added by block <N>
MmrLeaf {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
- extra_data: <para_heads_of_<N-1>>,
+ extra_data: <para_heads_of_<N>>,
next_auth_set: <next_auth_set_of<N>>,
}
and at this point you also run into trouble with 1-block deep forks writing different leaf contents to offchain db for same (parent, pos) key.
The offchain-db thing can be fixed by also doing https://github.com/paritytech/substrate/issues/11799.
@seunlanlege Is the mixed layout of MmrLeaf above really better?
Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N?
(i.e. for N=5: leaf index 5 is the leaf added by block 6 containing extra_data of block 5)
Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N?
So you're correct that this makes sense, but only from a leaf index perspective.
But light clients won't be tracking leaf indexes, they'll be tracking block heights.
And it's a bit wonky from that perspective that a leaf emitted a block N contains data about N-1 especially when there's no clear advantage to having it this way.
Whereas it's easier to reason that a leaf emitted at block N contains information about block N.
Also the leaves in this N-1 will miss the authority set change. So you'll need a leaf N+1 where N is the authority set change height to know exactly the beginning of a new epoch. Which is just cognitive overhead.
Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:
// contents for leaf added by block <N>
MmrLeaf {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_block<N-1>>,
),
extra_data: <para_heads_at_block<N>>,
next_auth_set: <next_auth_set_at_block<N>>,
}
right?
Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:
yes correct.
I'm happy with this change too. It makes sense for me and would allow us to improve some of the code in our off-chain relayers. As the relationship between block-height, leaf-index, and the desired leaf.paras-root was previously a bit confusing.
+1
Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with
pallet-mmris if it exposed leafs with following format:// contents for leaf added by block <N> MmrLeaf { version: <leaf-data-format-version>, ( parent_num: <N-1>, parent_hash: <hash_of_block<N-1>>, ), extra_data: <para_heads_at_block<N>>, next_auth_set: <next_auth_set_at_block<N>>, }
I'm sorry @vgeddes @seunlanlege but this doesn't seem to be possible :cry: see https://github.com/paritytech/substrate/pull/12446#issuecomment-1279011413
Closing for now, please reopen if you have any other ideas.
We decided we want to do at least this (since https://github.com/paritytech/substrate/pull/12446 isn't possible):
// contents for leaf index <N-1> added by block <N>
MmrLeaf {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
extra_data: <para_heads_of_<N-1>>,
- next_auth_set: <next_auth_set_of<N>>,
+ next_auth_set: <next_auth_set_of<N-1>>,
}
which could be as easy as: https://github.com/paritytech/substrate/issues/11797#issuecomment-1177485479
We decided we want to do at least this (since #12446 isn't possible):
// contents for leaf index <N-1> added by block <N> MmrLeaf { version: <leaf-data-format-version>, ( parent_num: <N-1>, parent_hash: <hash_of_<N-1>>, ), extra_data: <para_heads_of_<N-1>>, - next_auth_set: <next_auth_set_of<N>>, + next_auth_set: <next_auth_set_of<N-1>>, }which could be as easy as: #11797 (comment)
I can confirm this works:
- change (only moving
pallet_mmrahead ofpallet_sessionin runtime enum) in https://github.com/Lederstrumpf/polkadot/tree/beefy-mmr-leaf-off-by-one, with log bumps for debugging in https://github.com/Lederstrumpf/substrate/tree/beefy-mmr-leaf-off-by-one, - logs showing that it's taking
next_auth_set_of<N-1>: https://hackmd.io/Ve-FPPAJSdeGNc3rc-FmrA
So this would make the leaf content consistent wrt. block numbering.
However, looking at Adrian's PR #12446 again, the reason for closing it was that offchain worker is not guaranteed to run on every block, but since #12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of<N> as implemented in #12446, if I'm not mistaken?
Otherwise, I'll open up PR for next_auth_set_of<N-1>.
but since https://github.com/paritytech/substrate/pull/12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of<N> as implemented in https://github.com/paritytech/substrate/pull/12446, if I'm not mistaken?
it could be done at the expense of extra runtime storage - keep non-canon full data in runtime storage instead of offchain storage and "back it up" to offchain on client-side finality, but then you still have problems when finality lags and aggressive pruning is configured; and obviously increased storage costs for the chain.
I think the wise path is to take the slightly increased code complexity on light-client side and use <N-1>, and thus avoid all the extra runtime storage required to work directly on <N>.
Otherwise, I'll open up PR for
next_auth_set_of<N-1>.
I think this is the right call.
@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:
MmrLeaf @ block N {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
extra_data: <para_heads_of_<N-1>>,
next_auth_set: <next_auth_set_of<N-1>>,
}
and
MmrLeaf @ block N {
version: <leaf-data-format-version>,
(
parent_num: <N-1>,
parent_hash: <hash_of_<N-1>>,
),
extra_data: <para_heads_of_<N>>, <---- diff here
next_auth_set: <next_auth_set_of<N>>, <---- diff here
}
@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:
Yes correct. The light client only needs to be aware of a single header, rather than 2.
Should not be a problem for our light client, I think.
Fixed in https://github.com/paritytech/polkadot/pull/6577