release-23.1: catalog: remove the chart catalog
Backport 1/1 commits from #99788.
/cc @cockroachdb/release
NB: This commit has been altered to reduce the overall # of changes, to
make it more backport friendly. Original commit message is contained
below, although the kept changes solely aim to ensure that the list of
metrics contained within tsdump are sourced dynamically, instead of
from a static chart catalog.
Roughly the following, all in the same commit (sorry - discovered what needed to be done as I went, if reviewers find it hard to grok I can try to chop it into commits):
- tsdump: simplify code a bit
- tsdump: query AllMetricMetadata endpoint to get all timeseries names (rather than filling it server-side in the TSDB, which doesn't really have access to it now)
- tsdump: return error instead of filling in metric names
- serverpb: add GetInternalTimeSeriesNamesFromServer, this is code that
essentially lived in
catalog, but with some cleanups. We need this to go from "metric name" to "actual name the TSDB stores this under in KV" because that's whattsdumpoperates on. - catalog: return a flat mock stub in GenerateCatalog
- catalog: pretty much delete everything else, including that list of metrics that needed to be manually adjusted every time a metric was added or removed.
A 23.1 cli will fail to fetch the timeseries from a 23.2+ host, but the error message is informative and I don't think additional complexity to smoothen mixed-version deploys is warranted. A 23.2+ cli will work fine with older releases.
Closes https://github.com/cockroachdb/cockroach/issues/99791.
Tidy up a few issues that had to do with auto-generating Grafana dashboards from the catalog:
Closes https://github.com/cockroachdb/cockroach/issues/54178. Closes https://github.com/cockroachdb/cockroach/issues/78232. Closes https://github.com/cockroachdb/cockroach/issues/81035.
Epic: none
Release note: None
Release justification: necessary improvement to observability with respect to the tsdump tool, which currently omits important histogram metrics without these changes.
Thanks for opening a backport.
Please check the backport criteria before merging:
- [x] Backports should only be created for serious issues or test-only changes.
- [x] Backports should not break backwards-compatibility.
- [x] Backports should change as little code as possible.
- [x] Backports should not change on-disk formats or node communication protocols.
- [x] Backports should not add new functionality (except as defined here).
- [x] Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
- [x] All backports must be reviewed by the owning areas TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
- [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
- [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
- [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
- [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
- [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.
Also, please add a brief release justification to the body of your PR to justify this backport.
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Tagging @jbowens as secondary TL for review.
Context for the motivation of this backport here: https://github.com/cockroachlabs/support/issues/2829#issuecomment-1954572245
@abarganier Can we fix the problem of missing metrics in tsdump for 23.1 with a smaller diff? This one feels quite large and I'm not sure if the risk is justified.
@dhartunian sure, let me see what I can remove from this backport.
Okay @dhartunian, reduced the scope to just sourcing the list of metrics names dynamically, and removed all the diffs that got rid of the chart catalog. RFAL.
TFTR!