Start next ledger trigger timer after nomination accept
Description
Helps alleviate https://github.com/stellar/stellar-core-internal/issues/343.
This change makes validators base the next ledger trigger timer on nomination accept instead of prepare. Specifically, validators start the next ledger timer when they accept the first nomination message for the given ledger. Because we trigger at acceptance, there's still a rough synchronization point for the timer. Moving the timer trigger earlier in consensus should bring block times closer to the target 5s value.
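Roughly, the change amounts to moving the point where the timer is armed. A minimal sketch with hypothetical names (`NextLedgerTimer`, `onBallotPrepare`, `onNominationAccept` are illustrative, not the actual stellar-core classes):

```cpp
#include <chrono>

// Hypothetical sketch of the timing change described above; names are
// illustrative and do not match the real stellar-core Herder/SCP code.
class NextLedgerTimer
{
    std::chrono::steady_clock::time_point mStart;
    bool mArmed{false};

  public:
    // Old behavior: arm the timer when the ballot protocol prepares.
    void
    onBallotPrepare()
    {
        // armOnce();  // previous trigger point
    }

    // New behavior: arm the timer on the first nomination accept for the
    // ledger. Acceptance still requires hearing from the network, so this
    // remains a rough synchronization point across validators.
    void
    onNominationAccept()
    {
        armOnce();
    }

  private:
    void
    armOnce()
    {
        if (mArmed)
        {
            return;
        }
        mArmed = true;
        mStart = std::chrono::steady_clock::now();
        // schedule triggerNextLedger() for mStart + target close time (5s)
    }
};
```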
Checklist
- [x] Reviewed the contributing document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the performance document
No unit tests? Maybe AI can help write some.
I've run some more tests on larger topologies of about 100 nodes (tier 1 + some watchers) and saw a modest improvement of about 200 ms in average block time. The test compared the current `release/v22.3` against this commit @ 900 TPS:
Control on left, changes on right.
[Screenshot: average ledger age across all pods with the acceptance-timer change]
[Screenshot: average ledger age across all pods before the change]
It looks like we have slightly more timeouts with this change (probably because we start nomination earlier, so there's less "free" time before starting our timeout timer), but overall nomination latency and block time decrease. Once SSC is stable again, I'll run a pubnet simulation with the full topology.
> It looks like we have slightly more timeouts with this change (probably because we start nomination earlier, so there's less "free" time before starting our timeout timer), but overall nomination latency and block time decrease. Once SSC is stable again, I'll run a pubnet simulation with the full topology.
Can you expand on this? I imagine you're talking about timeouts during nomination? We need to understand this better: timeouts during nomination are the worst type of timeouts as they imply picking a new leader (and therefore flooding more transaction sets that are very expensive).
Is it that you have large variance in the time it takes for the ballot protocol (between nodes) or that the time between "first nomination" and "ballot protocol starts" has a lot of variance? The timeout could also be observed only on a very small number of nodes, which should not be too much of a problem.
> Is it that you have large variance in the time it takes for the ballot protocol (between nodes) or that the time between "first nomination" and "ballot protocol starts" has a lot of variance?
The variance is between "first nomination" and "ballot protocol starts", mostly due to TX set flooding, which is the most expensive part of consensus from both a bandwidth and compute standpoint. The additional timeouts were rare, and only experienced by a few nodes, not the whole network.
I think this probably happens because our timer logic is now based a little less on global timing and a little more on local node performance. Before, we started the timer at ballot prepare, which is more strongly synchronized. Now, a node starts its timer when it votes to accept for the first time. This is still a rough synchronization point, since a node won't vote to accept until it's heard nominations from the network, but I think there is more variance in "first vote to accept" than there is in starting the ballot phase. Also, since we're moving up the point at which we start the timer, this might let fast nodes drift more than before, since there's less "dead time" spent spinning while waiting for the next ledger trigger.
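To make the drift argument concrete, here's a hedged sketch (a hypothetical helper, not stellar-core code) of why earlier arming leaves less slack: the remaining wait before triggering clamps to zero once local work exceeds the target, so a fast node triggers immediately while slower nodes are still catching up.

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono;

// Hypothetical illustration: remaining wait before triggering the next
// ledger, given when the timer was armed. If the work done after the
// arming point takes longer than the target close time, the wait clamps
// to zero and the node triggers right away -- there is no "dead time"
// left to absorb per-node variance, so fast nodes can drift ahead.
milliseconds
remainingWait(steady_clock::time_point armedAt, milliseconds target)
{
    auto elapsed = duration_cast<milliseconds>(steady_clock::now() - armedAt);
    return std::max(milliseconds{0}, target - elapsed);
}
```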
Then it sounds like this change should probably be done after we ensure that SCP traffic does not get impacted by other traffic. Like I said above, if this ends up triggering more timeouts in nomination, this may have some pretty bad impact on the network in periods of high activity.
Closing, given that we want to test using ledger close time instead.