cadence icon indicating copy to clipboard operation
cadence copied to clipboard

Cadence Shard Manager

Open demirkayaender opened this issue 1 year ago • 0 comments

Cadence has been using consistent hashing based host-shard management. However this logic comes up with significant drawbacks which come to surface at high scale. Issues include:

  • Shard Load Skew
  • Shard Data Payload Skew
  • Heterogeneous Host SKUs
  • Shared Hosts
  • Individual Host Problems
  • Shutdowns, Restarts, Host Turn-up

and the list goes on.

With consistent hashing, shards are pinned to hosts. When a host is having a problem due to the factors listed above, adding a new host doesn't guarantee taking load from the problematic host. Similarly, removing a host might just shift the issues to the next host. Even though Cadence uses and recommends using many hosts in a cluster (~16K), these issues would still happen.

A host, as long as it reports itself as healthy to the membership protocol, could get traffic even if it was shutting down or just starting up.

With a high frequency, high availability environments, this logic poses an availability risk. Cadence will implement a load aware and smarter Shard Manager that can move shards among hosts to uniformly distribute the traffic and gracefully handle host maintenance events.

demirkayaender avatar Apr 28 '25 18:04 demirkayaender