Bryan Boreham
@anarcher it is quite possible there is a bug causing this high memory usage in the HA-tracker code. Without pinpointing the true cause we cannot say for sure.
Returning to this:

> could you have thousands of distinct values of `cortex_ha_cluster` and `__replica__`?

These metrics use strings which point into the incoming buffer, hence will cause it...
Can you upload the actual profile? I find those graphviz diagrams unreadable. Optionally using https://share.polarsignals.com/.
The large number looks the same as #4324. We have a limit added in #3992, `-ingester.instance-limits.max-inflight-push-requests`. With the full profile I could check to see if anything unusual was blocking...
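As a sketch of how that limit is applied (the flag name is from #3992; the value and the `-target` invocation here are illustrative, not a recommendation):

```shell
# Cap concurrent in-flight push requests per ingester; pushes beyond the
# limit are rejected rather than queuing up and holding memory.
cortex -target=ingester \
  -ingester.instance-limits.max-inflight-push-requests=200
```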
Now it looks like #3349. We had [a PR proposed](#3097) which sharded locks. I think you're saying `MemPostings.addFor()` does a sort which is slow. It does look suspiciously like a...
I looked at the Prometheus code here: https://github.com/prometheus/prometheus/blob/bb05485c79084fecd3602eceafca3d554ab88987/tsdb/index/postings.go#L326-L331 which, if series IDs were in random order, would be very slow. However in Cortex the series IDs are generated by incrementing...
Not fixed by #3192, which addresses a similar problem in the querier. I was thinking of code like this:
https://github.com/cortexproject/cortex/blob/0a33f9d4c8ff23ec00972e33889b388e302a1df8/pkg/chunk/series_store.go#L248

I also found:
https://github.com/cortexproject/cortex/blob/0a33f9d4c8ff23ec00972e33889b388e302a1df8/pkg/distributor/distributor.go#L717
https://github.com/cortexproject/cortex/blob/0a33f9d4c8ff23ec00972e33889b388e302a1df8/pkg/distributor/query.go#L114-L120
Whilst the first example can be dropped when we deprecate chunks (#4268), the others still seem to be valid.
#858 talks about a similar situation - we need to limit the number of "backgrounded" requests. It sounds like a smaller timeout would help. #736 was done for good reasons,...
Yes, if you cancel the 3rd push every time then each ingester will have a random sprinkling of holes in the data, so the checksums won't match.
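A minimal sketch of why the checksums diverge (hypothetical `checksum` helper, not Cortex's actual implementation): each ingester hashes the samples it actually stored, so an ingester that missed one push produces a different digest than its peers.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// checksum hashes a series of sample values with FNV-1a.
// An ingester with a hole in its data hashes a different input.
func checksum(samples []float64) uint64 {
	h := fnv.New64a()
	for _, s := range samples {
		fmt.Fprintf(h, "%v,", s)
	}
	return h.Sum64()
}

func main() {
	full := []float64{1, 2, 3, 4}
	withHole := []float64{1, 2, 4} // the push carrying sample 3 was cancelled
	fmt.Println(checksum(full) == checksum(withHole)) // false
}
```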