
Random coin for leader election

Open huitseeker opened this issue 4 years ago • 6 comments

The leader election in Tusk is a post-proposal leader pick based on a random coin.

The code we have right now is not a random coin, but rather a round-robin: https://github.com/MystenLabs/narwhal/blob/35319b17ece0bcfc6839fb23188138424da86a27/consensus/src/lib.rs#L210-L228

We should implement a real leader election based on a public coin.

  • the most common incarnation of this is a random beacon extracted from threshold signatures (à la BLS),
  • the signing key is provisioned via a DKG. For this, we should eschew the traditional Feldman VSS (which is hard to deploy and tune in production) and instead go for a scalable DKG. This could be Gurkan-Maller 2021: https://eprint.iacr.org/2021/005 https://github.com/kobigurk/aggregatable-dkg
  • the implementation should be modular enough to allow us to replace this piecemeal with a customer's DKG when available (e.g. Celo)
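To make the contrast concrete, here is a minimal Rust sketch of the two leader-pick strategies. The `Committee` type and its fields are illustrative stand-ins, not Narwhal's actual types; the `coin` parameter is assumed to come from an unpredictable shared source such as a threshold-BLS beacon.

```rust
// Hypothetical sketch: coin-based leader election vs. the current round-robin.
// `Committee` and its field names are illustrative, not Narwhal's real types.

struct Committee {
    // Validator identifiers, assumed sorted in a deterministic order
    // agreed on by all participants.
    authorities: Vec<String>,
}

impl Committee {
    // Current behaviour (simplified): round-robin, fully predictable,
    // so an attacker can DoS the upcoming leader in advance.
    fn leader_round_robin(&self, round: u64) -> &String {
        &self.authorities[(round as usize) % self.authorities.len()]
    }

    // Proposed behaviour: pick the leader from an unpredictable shared coin,
    // e.g. a threshold-BLS beacon output interpreted as a u64.
    fn leader_from_coin(&self, coin: u64) -> &String {
        &self.authorities[(coin as usize) % self.authorities.len()]
    }
}
```

The API surface is identical; only the source of the index changes, which is what makes a piecemeal replacement of the beacon plausible.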

huitseeker avatar Nov 16 '21 14:11 huitseeker

Note that Alberto and co have a new variant of Tusk that does not do randomized leader election but requires partial synchrony. However, it would suffer from the selective leader-DoS problem. It may be worth chatting with him before going ahead with the leader election implementation.

gdanezis avatar Jan 07 '22 20:01 gdanezis

Thankfully this doesn't come up before stage 3 in the roadmap.

huitseeker avatar Jan 07 '22 21:01 huitseeker

My notes of a discussion with @punwai :

On random beacons:

  1. We propose using a threshold-BLS-based randomness beacon to generate randomness, which is unbiasable and doesn’t give an attacker lookahead. The beacon for the next epoch is set up after the validators for that epoch have been elected. An epoch length is currently ball-parked at a week, but we don't want downtime at the epoch change.

    • Doing this fast requires being efficient in the aggregation phase of the threshold signature. This has been done by Alin Tomescu (blog, paper), but to my knowledge by no one else (?).
  2. Deploying this kind of system requires running a DKG protocol. Since the validators of the next epoch are known only after the epoch ends, this poses the challenge that we have to do the setup fast enough so the beacon doesn’t suffer downtime. DKGs can be asynchronous (and complex) or synchronous (and require delay tuning).

  3. The poster child for ADKGs is the work of @LefKok in 2019 and 2021.

  4. For synchronous DKGs, there are several alternatives. Common DKG protocols, such as the 2/3-round Joint-Feldman DKG (1 2), work by broadcasting secret shares of random polynomials, then proceed with a complaint round, in which a participant can claim they haven’t received their share, and a justification round, in which a blamed participant can reveal the correct share. These protocols work in the synchronous communication model. Since it is critical to know whether a participant has received their share (otherwise they won’t be able to take part in randomness generation later), we practically have to make the round length long, which makes the DKG slow.

    • Fouque and Stern 3 describe a publicly verifiable secret sharing scheme that is used to build a one-round DKG protocol. The advantage is that we no longer need the complaint and justification rounds: once a participant broadcasts their shares, all the other participants can verify the correctness of all of the shares and, based on that, decide which shared secrets contribute to the DKG output. There's only one value to tune (the initial round length!). The downside is that it relies on the unusual Paillier encryption.
    • Another interesting instance is the Gurkan DKG, which works with a fixed aggregation topology. It shaves off a log(n) factor in messaging complexity. GH repo

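To illustrate why the complaint/justification rounds exist, here is a toy Rust sketch of Feldman-style verifiable secret sharing, the building block of Joint-Feldman: the dealer publishes commitments to the polynomial coefficients, and any party can check their share against them. The tiny prime modulus and base are for demonstration only; a real deployment works over an elliptic-curve group.

```rust
// Toy sketch of Feldman-style verifiable secret sharing (the building block
// of the Joint-Feldman DKG). Parameters are illustrative; real systems use
// an elliptic-curve group, not a tiny prime field.

const P: u128 = 2_147_483_647; // toy prime modulus (2^31 - 1)
const G: u128 = 7;             // toy base element

fn modpow(mut base: u128, mut exp: u128, m: u128) -> u128 {
    let mut acc = 1u128;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    acc
}

// Dealer: secret polynomial f(x) = a0 + a1*x + a2*x^2 + ...; a0 is the secret.
// Share for participant i is f(i), evaluated by Horner's rule.
// (Coefficients are kept small here so we can skip reducing exponents.)
fn share(coeffs: &[u128], i: u128) -> u128 {
    coeffs.iter().rev().fold(0u128, |acc, &c| acc * i + c)
}

// Dealer broadcasts commitments C_j = g^{a_j} to all participants.
fn commitments(coeffs: &[u128]) -> Vec<u128> {
    coeffs.iter().map(|&a| modpow(G, a, P)).collect()
}

// Anyone can verify share s_i: check g^{s_i} == prod_j C_j^{i^j} (mod P),
// which holds because s_i = sum_j a_j * i^j in the exponent.
fn verify_share(i: u128, s_i: u128, comms: &[u128]) -> bool {
    let mut rhs = 1u128;
    let mut i_pow = 1u128;
    for &c in comms {
        rhs = rhs * modpow(c, i_pow, P) % P;
        i_pow *= i;
    }
    modpow(G, s_i, P) == rhs
}
```

The complaint round in Joint-Feldman is exactly for the case where `verify_share` fails (or the share never arrives): the commitments make misbehaviour publicly attributable, but detecting a *missing* share still requires waiting out the round.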
huitseeker avatar Jul 13 '22 16:07 huitseeker

What the algorithm looks like:

  1. At round r, a set of n validators is selected
  2. Run the setup: a t-of-n DKG for public key P; each validator stores their own secret share s_i
  3. At round r, each validator creates their own partial signature sig_i with s_i on the message H(r || full BLS signature at round r-1)
  4. Once at least t validators have broadcast sig_i, each validator can recover the full BLS signature for round r (the random beacon coin)
  5. Use the coin to select committee.leader(coin mod validator set size)

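The message construction (step 3) and the coin-to-leader mapping (step 5) can be sketched without a pairing library. In this Rust sketch, `DefaultHasher` stands in for a real cryptographic hash and a byte slice stands in for the recovered BLS signature; the actual t-of-n recovery (step 4) needs a threshold-BLS library such as blst or arkworks and is not shown.

```rust
// Sketch of steps 3 and 5. DefaultHasher is NOT cryptographic; it is a
// stand-in for a real hash function, and the byte slice stands in for the
// recovered full BLS signature. Function names are illustrative.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Step 3: the message each validator partially signs at round r,
// i.e. H's input r || (full BLS signature at round r-1).
fn beacon_message(round: u64, prev_sig: &[u8]) -> Vec<u8> {
    let mut msg = round.to_be_bytes().to_vec();
    msg.extend_from_slice(prev_sig);
    msg
}

// Step 5: map the recovered signature (the coin) onto the validator set.
fn leader_index(full_sig: &[u8], committee_size: usize) -> usize {
    let mut h = DefaultHasher::new(); // stand-in for a cryptographic hash
    full_sig.hash(&mut h);
    (h.finish() % committee_size as u64) as usize
}
```

Chaining round r's message to round r-1's signature is what makes the beacon unbiasable: no one can influence the coin for round r without already holding a t-of-n forgery for round r-1.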
For step 2, two existing Rust libraries we can leverage are: https://github.com/kobigurk/aggregatable-dkg and https://github.com/ZenGo-X/fs-dkr

joyqvq avatar Jul 15 '22 21:07 joyqvq

You got the gist of it.

At round r, a set of n validators selected

It's actually at the start of an epoch, which marks a change in the number of participants (one epoch is many rounds).

For step 2, the two existing rust library we can leverage are: https://github.com/kobigurk/aggregatable-dkg and https://github.com/ZenGo-X/fs-dkr

Note that the latter does key resharing, i.e. something a bit different; the difference is explained in its README.

huitseeker avatar Jul 15 '22 23:07 huitseeker

I have some new research on resharing/refreshing the keys that shaves off an O(n/log n) factor by leveraging the existence of a blockchain that produces randomness. We have not yet benchmarked it, but it should be fast enough for epoch rotations, especially if we only need f+1-out-of-3f+1 threshold keys. Happy to share a rough draft in private.

LefKok avatar Jul 20 '22 13:07 LefKok