admission: additional observability
In order of importance and/or done-ness:
- [x] #87883;
- [x] #87424;
- [ ] #88076;
- [ ] Metric capturing compaction bandwidth out of L0 (which is used to generate write tokens in admission control);
- [ ] We could log the {max,min} slot count and {max,min} runnable goroutine count every second, or export metrics for them. In internal experimentation we often find ourselves reaching for this data.
Jira issue: CRDB-16641
Perhaps exporting Go's `/sched/latencies:seconds` to have visibility into Go scheduler latencies.
This has proven extremely valuable in internal AC-related experiments (re: #75066). https://github.com/irfansharif/cockroach/tree/220614.export-tracing is a prototype that grafts in the Prometheus-compatible data from https://github.com/prometheus/client_golang/blob/main/prometheus/go_collector_latest.go, and looks as follows:

Through it we were able to correlate foreground latency spikes to Go scheduler latency spikes.
From an internal doc, re: "Information needed from Go runtime":
> Runnable info: Minimally, we need the number of runnable goroutines, sampled at
> some reasonably high rate (100 Hz?). It would be preferable to get a delta value
> of total duration spent in Runnable and Running state since the last sample (or
> a cumulative number, from which we can compute the delta). The duration is less
> sensitive to spikes in runnable goroutines that quickly get scheduled, which do
> not necessarily represent scarcity of CPU resources.
IIUC, this is exactly the total sum of everything captured within /sched/latencies:seconds.
Exporting latency histograms segmented by priority level, as seen by admission control, to capture which classes of requests are observing queuing and by how much.
We need this to make sense of mixed workload behavior (e.g. conversation in https://cockroachlabs.slack.com/archives/C038JEXC5AT/p1658247509643359?thread_ts=1657630075.576439&cid=C038JEXC5AT)
Adding Andrew here too to pick through the list within the next two weeks; it'll be a good way to get our feet wet.
@andrewbaptist: I'm working on exporting Go's /sched/latencies:seconds as a histogram. Want to take on the remaining items?
Discussed offline with @sumeerbhola. Closing this issue, as the changes we want to make already have separate issues that are tracked in the backlog with priorities assigned. We don't want to invest time in the rest.