temporal
Emitting metrics for DLQ message count
What changed?
Added a DLQMetricsEmitter, which emits the DLQ message count every 3 hours.
Why?
We already have a counter metric for DLQ writes, which is incremented each time a new message is written to the DLQ. It can be used to create an alarm that fires when a message is added to the DLQ, but it cannot tell us whether DLQ messages are inspected and cleared on time. The new metric, DLQMessageCount, fills that gap by reporting the current number of messages in the DLQ. It is emitted only from the history service instance that hosts shard 1, so that a single host reports the count instead of every history host.
I also considered maintaining a gauge metric for the DLQ message count, but that count would be lost whenever the history service restarts.
How did you test it?
Unit test
Potential risks
None