execstats: add latch wait time into cumulative contention time
sql: add setting to enable registering latch wait contention events
This commit adds a session setting (default off) that will cause contention
events emitted by the latch table to be included in the *_contention_events
virtual tables. This behavior is hidden behind a setting because latch wait
events are more common than events emitted by the lock table. Tracking them
in the virtual tables could increase system load and flood the statement
insights with uninteresting events. For now, this setting does nothing, but
the following commit will use it.
Informs #103713
Release note: None
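The gating described in this commit message can be sketched as follows. This is a minimal illustration, not CockroachDB's actual code: the `sessionData`, `contentionEvent`, and `registry` types here are hypothetical stand-ins, and only the setting name `register_latch_wait_contention_events` comes from the PR itself.

```go
package main

import "fmt"

// sessionData is a hypothetical stand-in for per-session settings; the
// real setting (default off) is register_latch_wait_contention_events.
type sessionData struct {
	RegisterLatchWaitContentionEvents bool
}

// contentionEvent is a trimmed-down stand-in for the event emitted by
// the lock table or (with IsLatch set) the latch table.
type contentionEvent struct {
	IsLatch bool
	Desc    string
}

// registry is a hypothetical stand-in for the contention registry that
// backs the *_contention_events virtual tables.
type registry struct {
	events []contentionEvent
}

// addEvent mirrors the gating above: latch wait events are recorded only
// when the session setting is enabled; lock table events are always recorded.
func (r *registry) addEvent(sd sessionData, ev contentionEvent) {
	if ev.IsLatch && !sd.RegisterLatchWaitContentionEvents {
		return
	}
	r.events = append(r.events, ev)
}

func main() {
	var r registry
	sd := sessionData{RegisterLatchWaitContentionEvents: false}
	r.addEvent(sd, contentionEvent{IsLatch: true, Desc: "latch wait"})
	r.addEvent(sd, contentionEvent{IsLatch: false, Desc: "lock wait"})
	// With the setting off, only the lock event reaches the registry.
	fmt.Println(len(r.events)) // → 1
}
```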
execstats: add latch wait time into cumulative contention time
This patch adds a field IsLatch to the ContentionEvent protobuf struct that
indicates that a contention event was emitted by the latch table. The event is
added to the trace and used to increment the contention time metrics, but is
only added to the contention registry if
register_latch_wait_contention_events is set to true.
This commit also propagates the transaction ID into the latch manager and the contention events listener. This information is used during a latch wait event to distinguish between time waiting for a transaction's own requests (e.g. for QueryIntent) and for real contention between different transactions.
Fixes #103713
Release note (ui change): The contention time metric now includes time spent
waiting to acquire latches, in addition to time spent waiting to acquire locks.
This applies both to the cumulative contention time visible in the db console
and to the annotations of an EXPLAIN ANALYZE output.
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?
:owl: Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Hi @DrewKimball, this looks great. Is there anything KV can do to help this proceed?
Sorry for the delay, I'll go ahead and fix this up with some tests today.
@nvanbenschoten actually, I'm not so sure this works as-is, because there are sources of latch wait time that don't indicate contention. An example is mentioned here - QueryIntent requests that wait on a transaction's own writes to finish. I don't know of other cases, but wouldn't be surprised if they exist. Maybe this means spanlatch.Manager is too low-level, because we need to know which txn is holding latches?
One thing to call out (which I was reminded of by this slack thread) is we should check that latch contention events make their way to crdb_internal.*_contention_events vtables (on a quick glance it should just work).
Any update on this issue @DrewKimball? Curious, could we target 24.2 (even better if we can backport) for this one?
Rooting out contention (including latch) symptoms continues to be a pain in several customer escalations.
I can work on it this week - the remaining work is to keep track of the txn ID during latching so that we can distinguish between "self" and "contention" latch wait time (see a brief discussion here). @kevin-v-ngo do you have any thoughts on the metrics we want to surface here? I'm thinking of adding latch wait time involving other transactions directly to our cumulative contention time metric, but I can imagine we might also want to track something like cumulative latch wait time separately.
We also have the option to track "lock" vs "latch" contention time separately.
I can imagine we might also want to track something like cumulative latch wait time separately
Yes, I agree. I'd follow the same pattern as what we did for CPU (which only has SQL for now) and break it down.
- One metric for latch
- One metric for Lock
- One metric for Latch+Lock (contentionTime as it is today in crdb_internal.statement_statistics but includes latch wait time)
This approach wouldn't alter the schema and prevents potentially breaking customers using 'contentionTime' (it's only additive with two columns). And then in the UI, we can break it down, especially if there is guidance on what a user would do when seeing one type of contention over another.
Thoughts?
It seems difficult to write a good test for this, but I manually ran KV0 with --zipfian and collected a bundle. The bundle showed a bit less than a second waiting on a latch acquisition.
This was reflected in the structured stats event in the trace:
1002.647ms 1002.407ms event:ComponentStats{ID: {16fb992d-281f-4d3f-9d9e-7f110b76a8a4 PROCESSOR 1 1 }, KV contention time: 989ms, execution time: 1s, sql cpu time: 2ms, batches output: 1, rows output: 1}
NOTE: I didn't show the annotated EXPLAIN ANALYZE diagram here because the statement bundle collected through the db console doesn't include the diagram. However, the structured event above would have been used to annotate that diagram.
In the current iteration, I've plumbed the txn ID through the latch manager so that we can distinguish whether a latch wait was self-induced or due to contention. Now, only contention-related latch wait events are added to the cumulative contention time metric.
Yes, I agree. I'd follow the same pattern as what we did for CPU (which only has SQL for now) and break it down.
- One metric for latch
- One metric for Lock
- One metric for Latch+Lock (contentionTime as it is today in crdb_internal.statement_statistics but includes latch wait time)
This approach wouldn't alter the schema and prevents potentially breaking customers using 'contentionTime' (it's only additive with two columns). And then in the UI, we can break it down, especially if there is guidance on what a user would do when seeing one type of contention over another.
Thoughts?
I think we should clarify now how we want to handle "self" vs "contention" vs "total" (self + contention) latch wait time. Should we just ignore self-induced latch waiting, and only track contention-related events in One metric for latch?
That being said, maybe we can defer that decision to another PR - this one adds latch wait time to the contention metric, and also adds enough tracking that we can easily add a metric for self-induced or total latch wait time as well.
Thanks Drew! Several questions below😅
Should we just ignore self-induced latch waiting, and only track contention-related events in One metric for latch?
What does self-induced latch waiting tell the user - refactor their transaction?
I assume this means the SQL transaction is 'conflicting' with itself correct? Or is the same statement (consisting of multiple kv requests) conflicting with itself?
Since we're adding latch contention events to crdb_internal.*_contention_events, users should be able to detect the blocking transaction, including itself, correct? If so, I don't see why it's critical to further break down, since the blocking ID would be itself.
Adding @kevinkokomani @nicktrav @nvanbenschoten @yuzefovich to weigh in on this as well in case this additional metric (self-inducing wait time) would help in any scenarios/previous escalations
What does self-induced latch waiting tell the user - refactor their transaction?
I'm not sure it would be directly useful to the user; rather, I think it'd be most useful during escalations. I think users probably would only care about the contention portion of latch wait time.
I assume this means the SQL transaction is 'conflicting' with itself correct? Or is the same statement (consisting of multiple kv requests) conflicting with itself?
I don't think "conflicting" is the right word for when this happens. Rather, there are times when a statement/txn has to wait for a request to finish before it can continue evaluation, and this is handled through the latch manager. The example I'm aware of occurs during write pipelining - we wait for replication to complete using QueryIntent requests in the latch manager, and evaluation can continue asynchronously in most cases.
So, this self-induced waiting time is expected during normal execution, and isn't in itself indicative of any problems. I also think it would be confusing to combine with contention-induced waiting time even if the previous point wasn't true, since the mitigations and causes are different.
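The "self" versus "contention" distinction discussed above boils down to comparing the waiter's txn ID against the latch holder's. A minimal sketch, assuming hypothetical function and parameter names (the real check lives in the plumbing through the latch manager, and the handling of non-transactional requests here is an assumption):

```go
package main

import "fmt"

// isContention sketches the txn-ID check: a latch wait counts as
// contention only when the latch holder belongs to a different
// transaction. Names are illustrative, not CockroachDB's actual API.
func isContention(waiterTxnID, holderTxnID string) bool {
	// Assumption for this sketch: an empty waiter ID means a
	// non-transactional request, so any wait is treated as contention.
	if waiterTxnID == "" {
		return true
	}
	return waiterTxnID != holderTxnID
}

func main() {
	// QueryIntent waiting on its own transaction's pipelined write:
	// self-induced, excluded from the cumulative contention metric.
	fmt.Println(isContention("txn-a", "txn-a")) // → false

	// Waiting on another transaction's latch: real contention.
	fmt.Println(isContention("txn-a", "txn-b")) // → true
}
```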
Thanks @DrewKimball and @kevin-v-ngo. The only part I have a question about is:
rather, I think it'd be most useful during escalations
If this is the case then I would still hope there is an easy way to identify the metric, perhaps through a method of advanced observability + an internal runbook. Makes sense that we don't want the metric readily accessible on the DB Console or end-user views because it isn't indicative of any problems or any actions that the user should take. However, if we want to potentially identify an issue that specifically is occurring during the write pipelining, e.g. maybe this "self-induced waiting time" is higher than we might expect, then tracking it somewhere + helping Support identify it easily with a runbook of some sort would be ideal.
Thanks for the context @DrewKimball.
I don't think "conflicting" is the right word for when this happens. Rather, there are times when a statement/txn has to wait for a request to finish before it can continue evaluation, and this is handled through the latch manager...So, this self-induced waiting time is expected during normal execution, and isn't in itself indicative of any problems. I also think it would be confusing to combine with contention-induced waiting time even if the previous point wasn't true, since the mitigations and causes are different.
Agree to not combine with contention-induced waiting time, especially if this metric would commonly not be indicative of a problem and the user action is distinct from transaction conflicts today.
I don't see any issues with capturing this as a separate metric at least in our statement statistics table if this could be useful in escalations for L1/L2 - we may need it in the future for unknown unknowns. I'm not sure what the lift/work involved here but happy to track as a separate issue as well.
One thing to call out (which I was reminded of by this slack thread) is we should check that latch contention events make their way to crdb_internal.*_contention_events vtables (on a quick glance it should just work).
Actually, this didn't work before because I introduced a LatchWaitEvent struct, while the existing infrastructure searches the trace for ContentionEvent. I've reworked this draft to just always use ContentionEvent instead, and added an IsLatch field to that struct that changes the message printed in the trace. Latch wait events do appear to show up in the contention_events tables as expected.
If this is the case then I would still hope there is an easy way to identify the metric, perhaps through a method of advanced observability + an internal runbook. Makes sense that we don't want the metric readily accessible on the DB Console or end-user views because it isn't indicative of any problems or any actions that the user should take. However, if we want to potentially identify an issue that specifically is occurring during the write pipelining, e.g. maybe this "self-induced waiting time" is higher than we might expect, then tracking it somewhere + helping Support identify it easily with a runbook of some sort would be ideal.
The latch wait events will still be visible in the trace as lines that look like this:
event:latch table conflict with 285e526b-1cbe-41d7-982b-bb043762410a on /Table/106/1/0/0 for 0.909s
They just won't be included in the cumulative contention time if it's a self-induced wait. Does that seem reasonable, or do we also want to add some additional observability?
I don't see any issues with capturing this as a separate metric at least in our statement statistics table if this could be useful in escalations for L1/L2 - we may need it in the future for unknown unknowns. I'm not sure what the lift/work involved here but happy to track as a separate issue as well.
Issue: https://github.com/cockroachdb/cockroach/issues/124068
After some discussion with @yuzefovich, I've added a default-off session setting that determines whether these latch-wait contention events are added to the contention registry. My reasoning is that these events are likely much more common than other sources, and we don't want to add significant overhead to the system or flood the statement insights with uninteresting events.
The cumulative contention time metric in the db console and EXPLAIN ANALYZE diagram annotations will still reflect latch wait time, as will the ContendedQueriesCount and CumulativeContentionNanos metrics.
Friendly ping @DrewKimball