Aaditya Sondhi
bors r=ajwerner
Discussed offline with @sumeerbhola. Closing this issue, since the changes we want to make are already tracked as separate issues in the backlog with priorities assigned to them....
This came up in the internal scale test for 23.2: https://cockroachlabs.slack.com/archives/C05VAJ5H3QS/p1697475681139809. Re-adding a previously stopped node caused an overload in the cluster.
We saw some raft log catch-up work happening as well. However, I think the primary cause of the overload was that the raft log had been truncated on this node,...
This issue has been resolved through our optimizations in pebble (see https://github.com/cockroachdb/cockroach/pull/117116). We no longer see the inverted LSM problem as described here due to our use of excise in...
Ack, that's fine with me. It is worth noting that we have a tracking issue for the bandwidth-specific problem: https://github.com/cockroachdb/cockroach/issues/86857. It is a problem we plan to solve soon....
[Link](https://cockroachlabs.slack.com/archives/C03V96V2S4C/p1715898604026739) to grafana dashboard showing this. Note: I manually adjusted the setting in that run to show how the shaping is working. In this test, we set those limits right away...
This only fails when the metamorphic constant is set to true. It is set to false by default on release builds. So not a release blocker, but will fix this...
This has me scratching my head a little. Before calling `Admit()` in `Pace()` (the only caller of `Admit()`), we explicitly do a nil check on `SnapshotQueue`, but inside `Admit()` this...
It is a flaky failure: it fails in only ~1/20 runs, even with the cluster setting manually set to true.