Add metrics for Nexus schedule-to-start latency
What changed?
Add a metric for the ScheduleToStart latency of Nexus operations.
Why?
How did you test it?
Potential risks
Documentation
Is hotfix candidate?
I would put this on the outbound executor and use the task type tag so we can get this coverage for all outbound tasks:
https://github.com/temporalio/temporal/blob/117847107c19575c30d894ed7571d484146427ac/service/history/outbound_queue_active_task_executor.go#L70
In addition, you could capture the start time before the task is executed and check whether a NotFound error was returned, which indicates (to some degree) that the task was skipped. In that case I would add a label to the metric saying whether the task was skipped or processed.
This will keep reprocessed tasks (which are expected) from skewing the schedule-to-start latency.
This PR was marked as stale. Please update or close it.