release-23.2: pkg/cloud: add metrics for readers/writers/listings/conn reuse/tls
Backport 6/6 commits from #115517, and one from #119571, on behalf of @dt.
/cc @cockroachdb/release
See commits.
In the first commit I ripped out some of the extra layers of indirection and interfaces of metricsRecorder: being an interface with only one implementation wasn't really buying us much, while the potentially-nil-interface argument can be tricky.
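For reviewers less familiar with why a potentially-nil interface argument is tricky in Go, here is a minimal sketch of the general pitfall. The names `recorder`, `metrics`, and `isSet` are hypothetical and are not the identifiers used in pkg/cloud; the sketch only illustrates the language behavior, not the actual change.

```go
package main

import "fmt"

// recorder is a hypothetical single-implementation interface,
// standing in for the kind of indirection described above.
type recorder interface {
	Record(n int)
}

// metrics is the hypothetical sole implementation.
type metrics struct{ bytes int }

// Record tolerates a nil receiver so calls through a typed nil do not panic.
func (m *metrics) Record(n int) {
	if m == nil {
		return
	}
	m.bytes += n
}

// isSet is the check a callee might naively write to detect "no recorder".
func isSet(r recorder) bool { return r != nil }

func main() {
	var m *metrics // a nil concrete pointer

	// The pointer itself is nil...
	fmt.Println(m == nil) // true

	// ...but once wrapped in the interface, the value carries a type,
	// so the interface compares as non-nil.
	fmt.Println(isSet(m)) // true

	// Only an untyped nil produces a nil interface.
	fmt.Println(isSet(nil)) // false
}
```

Passing the concrete pointer type directly avoids this ambiguity, since a plain `m == nil` check then behaves as expected.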
Release justification: low-risk, high impact observability improvement.
Thanks for opening a backport.
Please check the backport criteria before merging:
- [x] Backports should only be created for serious issues or test-only changes.
- [x] Backports should not break backwards-compatibility.
- [x] Backports should change as little code as possible.
- [x] Backports should not change on-disk formats or node communication protocols.
- [ ] Backports should not add new functionality (except as defined here).
- [x] Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
- [ ] All backports must be reviewed by the owning area's TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
- [x] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.
Also, please add a brief release justification to the body of your PR to justify this backport.
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Backports should only be created for serious issues or test-only changes.
In an L2 incident involving a cluster that became unavailable and was deemed a P0 incident, the lack of visibility into cloud connections made it harder to determine what was going on and what the ideal remediation was. That seems to qualify.
Backports should not break backwards-compatibility.
No change.
Backports should change as little code as possible.
✅
Backports should not change on-disk formats or node communication protocols.
✅
Backports should not add new functionality (except as in wiki).
A couple new metrics are, strictly speaking, "new functionality" though we rarely treat added log messages/trace spans/metrics as such. But even if we do, I think we can argue conformance with the rules or spirit of the rules:
"high priority business need for the functionality" is covered above. "additive-only" is easy; "only runs for customers who have specifically 'opted in'" is debatable -- you could say you opt in by looking at the metric or you don't, but there is also a cluster setting gating collection of the metric (which is default on, since by the time we know we need this metric it is often too late to start collecting it).
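As a rough illustration of what "a setting gating collection, default on" can look like, here is a minimal sketch. `gatedCounter` and its fields are hypothetical; this does not use CockroachDB's settings or metric packages and is not the code in this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gatedCounter is a hypothetical counter whose collection can be switched
// off at runtime, standing in for a metric gated by a cluster setting.
type gatedCounter struct {
	enabled atomic.Bool
	count   atomic.Int64
}

// newGatedCounter returns a counter with collection enabled by default,
// so data already exists by the time someone needs to look at it.
func newGatedCounter() *gatedCounter {
	g := &gatedCounter{}
	g.enabled.Store(true)
	return g
}

// Inc records n only while collection is enabled.
func (g *gatedCounter) Inc(n int64) {
	if g.enabled.Load() {
		g.count.Add(n)
	}
}

func main() {
	g := newGatedCounter()
	g.Inc(3)                // collected: default is on
	g.enabled.Store(false)  // operator opts out
	g.Inc(4)                // ignored
	fmt.Println(g.count.Load()) // 3
}
```

The check is a single conditional on the hot path, which is the "trivial to verify" property the criteria ask for.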
Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
✅