
Allow X or X% hosts to have bad values

Open shlomi-noach opened this issue 8 years ago • 0 comments

Say we're measuring replication lag and we have 10 replicas. For some apps, it would be OK if one lags. Maybe two. And it would be better to let them lag and keep operations going, as opposed to stalling everything.

The suggestion is to have a per-cluster config that indicates how many hosts may have bad values (e.g. be lagging). This would be either an absolute number or a percentage. For smaller setups an absolute number makes more sense (e.g. "1 host can be lagging"). For larger setups it may be better to work by percentage ("up to 5% of hosts may be lagging").
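To make the idea concrete, here is a minimal sketch of how such a check might look. Everything in it is hypothetical and for illustration only: the `Tolerance` type, its `MaxBadHosts` and `MaxBadPercent` fields, and the `ClusterOK` function are not part of freno's existing config or code. The sketch assumes both forms are supported and that the more permissive of the two wins, which is just one possible answer to the open question.

```go
package main

import "fmt"

// Tolerance is a hypothetical per-cluster setting (not an existing freno config).
type Tolerance struct {
	MaxBadHosts   int // absolute number of hosts allowed to exceed the threshold
	MaxBadPercent int // percentage of hosts allowed to exceed the threshold
}

// allowedBad returns how many of total hosts may have bad values.
// Assumption: when both fields are set, the larger allowance applies.
// The percentage is rounded down using integer arithmetic.
func (t Tolerance) allowedBad(total int) int {
	byPercent := total * t.MaxBadPercent / 100
	if byPercent > t.MaxBadHosts {
		return byPercent
	}
	return t.MaxBadHosts
}

// ClusterOK reports whether the number of hosts whose lag exceeds
// threshold is within the configured tolerance.
func ClusterOK(lags []float64, threshold float64, t Tolerance) bool {
	bad := 0
	for _, lag := range lags {
		if lag > threshold {
			bad++
		}
	}
	return bad <= t.allowedBad(len(lags))
}

func main() {
	// Ten replicas, one of them lagging well beyond a 1-second threshold.
	lags := []float64{0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.2, 0.3, 12.0}

	fmt.Println(ClusterOK(lags, 1.0, Tolerance{}))                 // false: no host may lag
	fmt.Println(ClusterOK(lags, 1.0, Tolerance{MaxBadHosts: 1}))   // true: one host may lag
	fmt.Println(ClusterOK(lags, 1.0, Tolerance{MaxBadPercent: 5})) // false: 5% of 10 rounds down to 0
}
```

The last case shows why an absolute number fits small setups better: with 10 replicas, 5% rounds down to zero hosts, so the percentage only starts to matter at 20 hosts or more.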

I'm unsure whether to support both.

shlomi-noach, Apr 13 '17 05:04