substrate icon indicating copy to clipboard operation
substrate copied to clipboard

MMR Leaf Data: off-by-one error in beefy_next_authority_set

Open acatangiu opened this issue 3 years ago • 10 comments

A given Leaf with MMR 0-based leaf index N-1 is constructed and added to MMR during construction of block N here.

The Leaf data/contents can be seen populated here and it consists of:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
    next_auth_set: <next_auth_set_of<N>>,
}

While this is not a show-stopper - (light)clients can simply handle this in their code - it introduces cognitive complexity and code logic corner cases.

We could avoid this and make it easier for users by changing Leaf<N-1> contents to:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}

acatangiu avatar Jul 07 '22 11:07 acatangiu

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

But I would prefer a fix contained in pallet_mmr itself since above is prone to developer error when adding pallet_mmr to their runtime.

acatangiu avatar Jul 07 '22 11:07 acatangiu

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

So since session is rotated prior to leaf construction currently, next_auth_set: <next_auth_set_of<N>> (i.e. Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?

Lederstrumpf avatar Jul 07 '22 12:07 Lederstrumpf

We haven't really been affected by this off-by-one issue while developing our light client (or we just worked around it).

The proposed change may affect how our light client detects authority handovers. But I have a feeling there shouldn't be any impact. If there is we can update our code.

Some background on our light client: It receives updates like the following at the start of every beefy session (Assume a session length of 10 blocks). Each update contains the beefy commitment for block N (first block of session) and the MMR leaf for block N-1 (last block of the previous session).

Update:
  BeefyCommitment(block=11) { validatorSetId: 1, ... }
  MmrLeaf(block=10) { nextValidatorSetId: 2, ...}

Update:
  BeefyCommitment(block=21) { validatorSetId: 2, ... }
  MmrLeaf(block=20) { nextvalidatorSetId: 3, ... }

Update:
  BeefyCommitment(block=31) { validatorSetId: 3, ... }
  MmrLeaf(block=30) { nextvalidatorSetId: 4, ... }

The light client knows it can perform an authority handover when the latest update contains a leaf.nextValidatorSetId that is greater than that of the previously observed value (See code here).

vgeddes avatar Jul 07 '22 12:07 vgeddes

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

So since session is rotated prior to leaf construction currently, next_auth_set: <next_auth_set_of<N>> (i.e. Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?

There is no issue for forks in either option (current behavior & suggested change) because the validator set is identical for blocks at height N (or N+1, ..., N+m) across all forks.

The issue is more about usability in the sense that the BEEFY payload for block N is a MMR root where latest latest leaf is a set of primitives coming from a mix of blocks N-1 and N; it could be simplified to:

(block numbers are 1-indexed whereas leaf indexes are 0-indexed) BEEFY payload for block N contains MMR root built from leaves indexed L in 0..=N-1, with each Leaf index L containing primitives coming exclusively from block number L. At block N, the latest leaf L=N-1 contains primitives values pertaining to block N-1 (instead of some from N-1 and some from N).


Marking this as "Some day maybe" since existing clients already successfully handle current behavior.

acatangiu avatar Jul 07 '22 13:07 acatangiu

actually have encountered this issue and i created this pr as a fix: https://github.com/paritytech/substrate/pull/10907

seunlanlege avatar Aug 11 '22 02:08 seunlanlege

actually have encountered this issue and i created this pr as a fix: #10907

this is actually something different - by building leaf on_finalize() you actually get:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
-   extra_data: <para_heads_of_<N-1>>,
+   extra_data: <para_heads_of_<N>>,
    next_auth_set: <next_auth_set_of<N>>,
}

and at this point you also run into trouble with 1-block deep forks writing different leaf contents to offchain db for same (parent, pos) key.

The offchain-db thing can be fixed by also doing https://github.com/paritytech/substrate/issues/11799.


@seunlanlege Is the mixed layout of MmrLeaf above really better?

Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N? (i.e. for N=5: leaf index 5 is the leaf added by block 6 containing extra_data of block 5)

acatangiu avatar Sep 22 '22 12:09 acatangiu

Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N?

So you're correct that this makes sense, but only from a leaf index perspective.

But light clients won't be tracking leaf indexes, they'll be tracking block heights.

And it's a bit wonky from that perspective that a leaf emitted a block N contains data about N-1 especially when there's no clear advantage to having it this way.

Whereas it's easier to reason that a leaf emitted at block N contains information about block N.

Also the leaves in this N-1 will miss the authority set change. So you'll need a leaf N+1 where N is the authority set change height to know exactly the beginning of a new epoch. Which is just cognitive overhead.

seunlanlege avatar Sep 22 '22 12:09 seunlanlege

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

// contents for leaf added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_block<N-1>>,
    ),
    extra_data: <para_heads_at_block<N>>,
    next_auth_set: <next_auth_set_at_block<N>>,
}

right?

acatangiu avatar Sep 22 '22 12:09 acatangiu

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

yes correct.

seunlanlege avatar Sep 22 '22 12:09 seunlanlege

I'm happy with this change too. It makes sense for me and would allow us to improve some of the code in our off-chain relayers. As the relationship between block-height, leaf-index, and the desired leaf.paras-root was previously a bit confusing.

+1

vgeddes avatar Sep 22 '22 17:09 vgeddes

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

// contents for leaf added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_block<N-1>>,
    ),
    extra_data: <para_heads_at_block<N>>,
    next_auth_set: <next_auth_set_at_block<N>>,
}

I'm sorry @vgeddes @seunlanlege but this doesn't seem to be possible :cry: see https://github.com/paritytech/substrate/pull/12446#issuecomment-1279011413

Closing for now, please reopen if you have any other ideas.

acatangiu avatar Oct 14 '22 13:10 acatangiu

We decided we want to do at least this (since https://github.com/paritytech/substrate/pull/12446 isn't possible):

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}

which could be as easy as: https://github.com/paritytech/substrate/issues/11797#issuecomment-1177485479

acatangiu avatar Dec 06 '22 15:12 acatangiu

We decided we want to do at least this (since #12446 isn't possible):

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}

which could be as easy as: #11797 (comment)

I can confirm this works:

  1. change (only moving pallet_mmr ahead of pallet_session in runtime enum) in https://github.com/Lederstrumpf/polkadot/tree/beefy-mmr-leaf-off-by-one, with log bumps for debugging in https://github.com/Lederstrumpf/substrate/tree/beefy-mmr-leaf-off-by-one,
  2. logs showing that it's taking next_auth_set_of<N-1>: https://hackmd.io/Ve-FPPAJSdeGNc3rc-FmrA

So this would make the leaf content consistent wrt. block numbering.

However, looking at Adrian's PR #12446 again, the reason for closing it was that offchain worker is not guaranteed to run on every block, but since #12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of<N> as implemented in #12446, if I'm not mistaken?

Otherwise, I'll open up PR for next_auth_set_of<N-1>.

Lederstrumpf avatar Jan 12 '23 10:01 Lederstrumpf

but since https://github.com/paritytech/substrate/pull/12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of<N> as implemented in https://github.com/paritytech/substrate/pull/12446, if I'm not mistaken?

it could be done at the expense of extra runtime storage - keep non-canon full data in runtime storage instead of offchain storage and "back it up" to offchain on client-side finality, but then you still have problems when finality lags and aggressive pruning is configured; and obviously increased storage costs for the chain.

I think the wise path is to take the slightly increased code complexity on light-client side and use <N-1>, and thus avoid all the extra runtime storage required to work directly on <N>.

Otherwise, I'll open up PR for next_auth_set_of<N-1>.

I think this is the right call.

@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:

MmrLeaf @ block N {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
    next_auth_set: <next_auth_set_of<N-1>>,
}

and

MmrLeaf @ block N {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N>>,        <---- diff here
    next_auth_set: <next_auth_set_of<N>>,   <---- diff here
}

acatangiu avatar Jan 12 '23 10:01 acatangiu

@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:

Yes correct. The light client only needs to be aware of a single header, rather than 2.

seunlanlege avatar Jan 12 '23 12:01 seunlanlege

Should not be a problem for our light client, I think.

vgeddes avatar Jan 12 '23 17:01 vgeddes

Fixed in https://github.com/paritytech/polkadot/pull/6577

acatangiu avatar Jan 19 '23 12:01 acatangiu