cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

release-23.1: catalog: remove the chart catalog

Open abarganier opened this issue 1 year ago • 7 comments

Backport 1/1 commits from #99788.

/cc @cockroachdb/release

NB: This commit has been altered to reduce the overall # of changes, to make it more backport friendly. Original commit message is contained below, although the kept changes solely aim to ensure that the list of metrics contained within tsdump are sourced dynamically, instead of from a static chart catalog.


Roughly the following, all in the same commit (sorry - discovered what needed to be done as I went, if reviewers find it hard to grok I can try to chop it into commits):

  • tsdump: simplify code a bit
  • tsdump: query AllMetricMetadata endpoint to get all timeseries names (rather than filling it server-side in the TSDB, which doesn't really have access to it now)
  • tsdump: return error instead of filling in metric names
  • serverpb: add GetInternalTimeSeriesNamesFromServer, this is code that essentially lived in catalog, but with some cleanups. We need this to go from "metric name" to "actual name the TSDB stores this under in KV" because that's what tsdump operates on.
  • catalog: return a flat mock stub in GenerateCatalog
  • catalog: pretty much delete everything else, including that list of metrics that needed to be manually adjusted every time a metric was added or removed.

A 23.1 cli will fail to fetch the timeseries from a 23.2+ host, but the error message is informative and I don't think additional complexity to smoothen mixed-version deploys is warranted. A 23.2+ cli will work fine with older releases.

Closes https://github.com/cockroachdb/cockroach/issues/99791.

Tidy up a few issues that had to do with auto-generating Grafana dashboards from the catalog:

Closes https://github.com/cockroachdb/cockroach/issues/54178. Closes https://github.com/cockroachdb/cockroach/issues/78232. Closes https://github.com/cockroachdb/cockroach/issues/81035.

Epic: none Release note: None Release justification: necessary improvement to observability with respect to the tsdump tool, which currently omits important histogram metrics without these changes.

abarganier avatar Feb 21 '24 16:02 abarganier

Thanks for opening a backport.

Please check the backport criteria before merging:

  • [x] Backports should only be created for serious issues or test-only changes.
  • [x] Backports should not break backwards-compatibility.
  • [x] Backports should change as little code as possible.
  • [x] Backports should not change on-disk formats or node communication protocols.
  • [x] Backports should not add new functionality (except as defined here).
  • [x] Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • [x] All backports must be reviewed by the owning areas TL and one additional TL. For more information as to how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • [ ] There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • [ ] The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • [ ] New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • [ ] The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • [ ] Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

blathers-crl[bot] avatar Feb 21 '24 16:02 blathers-crl[bot]

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] avatar Feb 21 '24 16:02 blathers-crl[bot]

This change is Reviewable

cockroach-teamcity avatar Feb 21 '24 16:02 cockroach-teamcity

Tagging @jbowens as secondary TL for review.

Context for the motivation of this backport here: https://github.com/cockroachlabs/support/issues/2829#issuecomment-1954572245

abarganier avatar Feb 21 '24 17:02 abarganier

@abarganier Can we fix the problem of missing metrics in tsdump for 23.1 with a smaller diff? This one feels quite large and I'm not sure if the risk is justified.

dhartunian avatar Feb 22 '24 15:02 dhartunian

@dhartunian sure, let me see what I can remove from this backport.

abarganier avatar Feb 22 '24 15:02 abarganier

Okay @dhartunian, reduced the scope to just sourcing the list of metrics names dynamically, and removed all the diffs that got rid of the chart catalog. RFAL.

abarganier avatar Feb 22 '24 18:02 abarganier

TFTR!

abarganier avatar Feb 27 '24 18:02 abarganier