Random coin for leader election
The leader election in Tusk is a post-proposal leader pick based on a random coin.
The code we have right now is not a random coin, but rather a round-robin: https://github.com/MystenLabs/narwhal/blob/35319b17ece0bcfc6839fb23188138424da86a27/consensus/src/lib.rs#L210-L228
We should implement a real leader election based on a public coin.
- the most common incarnation of this is a random beacon extracted from threshold signatures (à la BLS),
- the signature is provisioned in a DKG. For this, we should eschew the traditional Feldman-VSS (which is hard to deploy and tune in production) and rather go for a scalable DKG. This could be Gurkan-Maller 2021: https://eprint.iacr.org/2021/005 https://github.com/kobigurk/aggregatable-dkg
- the implementation should be modular enough to allow us to replace this piecemeal with a customer's DKG when available (e.g. Celo)
Note that Alberto and co have a new variant of Tusk that does not do randomized leader election, but requires partial synchrony. However, it would suffer from the selective leader DoS problem. Maybe worth chatting with him before going ahead with leader election implementation.
Thankfully this doesn't come up before stage 3 in the roadmap.
My notes of a discussion with @punwai :
On random beacons:
-
We propose using threshold BLS-based randomness beacon to generate randomness, which is unbiasable and doesn’t provide lookahead to an attacker. The beacon for the next epoch is setup after validators for this new epoch have been elected. An epoch length is currently ball-parked to a week, but we don't want downtime at the epoch change.
-
Deploying this kind of system requires running a DKG protocol. Since the validators of the next epoch are known only after the epoch ends, this poses the challenge that we have to do the setup fast enough so the beacon doesn’t suffer downtime. DKGs can be asynchronous (and complex) or synchronous (and require delay tuning).
-
the posterchild for ADKGs is the work of @LefKok in 2019 and 2021
-
For synchronous DKGs, there are several alternatives: common DKG protocols, such as the 2/3-round Joint-Feldman DKG (1 2), work by broadcasting secret shares of random polynomials and proceed with a complaint round, where a participant can claim they haven’t received their share and a justification round, where a blamed participant can reveal the correct share. They work in the synchronous communication model. Since there’s great importance in knowing if a participant received their share, as otherwise they won’t be able to participate in randomness generation in the future, we practically have to make the round length long, causing a slow DKG.
- Fouque and Stern 3 describe a publicly verifiable secret sharing scheme that is used to build a one-round DKG protocol. The advantage is that we don’t need the complaint and justification rounds anymore - once a participant broadcasts their shares, all the other participants can verify the correctness of all of the shares and based on that decide which shared secrets contribute to the DKG output. There's only one value to tune (the initial round!). The bad thing is this uses unusual (Pailler) encryption.
- Another interesting instance is the Gurkan DKG, which works with a fixed aggregation topology. It shaves off a log(n) factor in messaging complexity. GH repo
What the algorithm looks like:
- At round r, a set of
nvalidators selected - Do the set up
t-of-nDKG of pubkey P, each validator stores their own secret shares_i - At round r, each validator creates their own partial signature
sig_ion messageH(r || full BLS signature at round r-1)withs_i - When there are at least
tvalidators broadcastedsig_i, each validator can recover full BLS signature at round r (the random beaconcoin) - Use the
cointo selectcommittee.leader(coin mod validator set size)
For step 2, the two existing rust library we can leverage are: https://github.com/kobigurk/aggregatable-dkg and https://github.com/ZenGo-X/fs-dkr
You got the gist of it.
At round r, a set of n validators selected
It's actually at the start of one epoch, which marks a change in the # of participants. (one epoch is many rounds)
For step 2, the two existing rust library we can leverage are: https://github.com/kobigurk/aggregatable-dkg and https://github.com/ZenGo-X/fs-dkr
Note the later is doing key resharing, i.e. they do something a bit different, and they explain the difference in their readme.
I have some new research on resharing/refreshing the keys that saves off an O(n/logn) leveraging the existence of a blockchain that produces randomness. We have not yet benchmarked it but it should be fast enough for epoch rotations especially if we only need f+1 out of 3f+1 threshold keys. Happy to share a rough draft in private