
Single node down when joining and leaving will cause total outage

Open bboreham opened this issue 6 years ago • 5 comments

If you have 3 ingesters, and replication factor 3, then a rolling update will give this error:

at least 3 live ingesters required, could only find 2

This is because the distributor increases the size of the quorum required when it finds ingesters joining and leaving.
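The arithmetic behind that error can be sketched with a small model. This is not the Cortex implementation: the function names, the state constants, and the rule that the quorum is a majority of the larger of the replication factor and the replica-set size are assumptions made for illustration, as is the idea that during a rolling update the replica set briefly holds four entries (two ACTIVE, one LEAVING, one JOINING).

```python
# Simplified model of the quorum check, NOT the Cortex code.
# All names here are hypothetical and exist only for illustration.
ACTIVE, JOINING, LEAVING = "ACTIVE", "JOINING", "LEAVING"


def min_success(replication_factor, replica_set_size):
    """Quorum size, assuming a majority of the larger of the two values."""
    n = max(replication_factor, replica_set_size)
    return n // 2 + 1


def check_write(replication_factor, states):
    """Return the number of live ingesters, or raise if quorum is unreachable.

    Assumes only ACTIVE ingesters count as live for writes.
    """
    required = min_success(replication_factor, len(states))
    live = sum(1 for s in states if s == ACTIVE)
    if live < required:
        raise RuntimeError(
            f"at least {required} live ingesters required, could only find {live}"
        )
    return live
```

Under these assumptions, `check_write(3, [ACTIVE, ACTIVE, ACTIVE])` succeeds with a quorum of 2, while `check_write(3, [ACTIVE, ACTIVE, LEAVING, JOINING])` raises because the quorum grows to 3 at the same moment the live count drops to 2.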

Strikes me there is something wrong here.

bboreham, Mar 17 '19 17:03

I'm facing the same problem

Serrvosky, Jul 02 '19 09:07

#1488

Serrvosky, Jul 02 '19 10:07

The same issue hits when you have more than 3 ingesters, one is LEAVING, and another one goes bad. The distributor fails the entire request back to the sender with a 500 code because it can only find 2 live ingesters for some subset of the series.

bboreham, Feb 20 '20 20:02

If this happened with extended write set to true, I think the issue was fixed by https://github.com/cortexproject/cortex/issues/4626

alanprot, Feb 18 '22 21:02

#4626 is another issue; did you mean #4636?

bboreham, Feb 23 '22 12:02