
Add AlertLifeCycleObserver that allows consumers to hook into Alert life cycle

Open emanlodovice opened this issue 2 years ago • 10 comments

What this pull request does

This pull request introduces a new AlertLifeCycleObserver interface that is accepted by the API, the Dispatcher, and the notification pipeline. The interface contains methods for tracking what happens to an alert in Alertmanager.
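As a rough illustration of the observer idea (not the interface from this pull request), here is a minimal Go sketch. The names `LifeCycleObserver`, `Observe`, `recorder`, and `process`, as well as the event strings, are assumptions made up for this example; the real method set and call sites may differ.

```go
package main

import "fmt"

// LifeCycleObserver is a hypothetical observer interface. It is NOT the
// AlertLifeCycleObserver from the pull request, only a sketch of the idea.
type LifeCycleObserver interface {
	Observe(event string, fingerprint string)
}

// recorder is a toy implementation that keeps observed events in memory.
type recorder struct {
	events []string
}

func (r *recorder) Observe(event string, fingerprint string) {
	r.events = append(r.events, event+" "+fingerprint)
}

// process stands in for a component (API, dispatcher, notification
// pipeline) that accepts an optional observer and reports each stage.
func process(o LifeCycleObserver, fingerprint string) {
	notify := func(event string) {
		if o != nil { // the observer is optional
			o.Observe(event, fingerprint)
		}
	}
	notify("received")
	// ... routing, grouping, and sending would happen here ...
	notify("notified")
}

// run drives the sketch once with an observer and once without.
func run() []string {
	r := &recorder{}
	process(r, "abc123")
	process(nil, "def456") // no observer: nothing recorded, no panic
	return r.events
}

func main() {
	for _, e := range run() {
		fmt.Println(e)
	}
}
```

Keeping the observer optional means existing callers that pass nothing would see no change in behavior.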

Motivation

When a customer complains “I think my alert is delayed”, we currently have no straightforward way to troubleshoot. At minimum, we should be able to quickly identify if the problem is post-notification (we sent to the receiver on time but the receiver has some delay) or pre-notification.

By introducing a new interface that allows hooking into the alert life cycle, consumers of the alertmanager package would be able to implement whatever observability solution works best for them.

emanlodovice avatar Aug 16 '23 00:08 emanlodovice

This is great! I've been thinking about doing something similar, for the exact reasons mentioned:

when a customer complains “I think my alert is delayed”, we currently have no straightforward way to troubleshoot. At minimum, we should be able to quickly identify if the problem is post-notification (we sent to the receiver on time but the receiver has some delay) or pre-notification.

grobinson-grafana avatar Aug 29 '23 10:08 grobinson-grafana

I'm not 100% sure I understand how it would be used outside of prometheus/alertmanager. Can you share some code? Also, though it's not exactly the same, I wonder if we shouldn't implement tracing inside Alertmanager to provide this visibility about "where's my alert?".

The use that we are thinking of is just adding logs for these events. It sort of becomes an alert history that we can query when the customer comes in. We would like to have the flexibility in implementing how we collect and format the logs and how we will store them.
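To make the logging idea concrete, here is a minimal Go sketch of an observer that writes one log line per event. The `Event` struct, its fields, the `LogObserver` type, and the stage names are hypothetical and chosen only for illustration; they are not taken from this pull request or from the Cortex draft.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// Event is a hypothetical life cycle event; the field names are illustrative.
type Event struct {
	Stage       string // e.g. "received" or "notified"
	Fingerprint string // identifies the alert
	Receiver    string // empty until the alert has been routed
}

// LogObserver writes one line per event to the wrapped logger.
type LogObserver struct {
	logger *log.Logger
}

func (o *LogObserver) Observe(e Event) {
	o.logger.Printf("stage=%s alert=%s receiver=%q", e.Stage, e.Fingerprint, e.Receiver)
}

// render feeds the events through a LogObserver and returns the log text.
func render(events []Event) string {
	var buf bytes.Buffer
	// Flags are set to 0 to keep the output stable; a real setup would
	// keep timestamps (e.g. log.LstdFlags), since timing is the point.
	obs := &LogObserver{logger: log.New(&buf, "", 0)}
	for _, e := range events {
		obs.Observe(e)
	}
	return buf.String()
}

func main() {
	fmt.Print(render([]Event{
		{Stage: "received", Fingerprint: "abc123"},
		{Stage: "notified", Fingerprint: "abc123", Receiver: "team-pager"},
	}))
}
```

With timestamps enabled, the gap between the "received" and "notified" lines for one fingerprint would show whether a delay happened before or after the notification was sent.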

emanlodovice avatar Aug 30 '23 18:08 emanlodovice

Just some nits but overall looks good!

qinxx108 avatar Oct 09 '23 23:10 qinxx108

@grobinson-grafana @simonpasquier could you have a look at this PR when you have time? Thank you

emanlodovice avatar Oct 12 '23 06:10 emanlodovice

Rebased PR and fixed conflicts

emanlodovice avatar Oct 17 '23 21:10 emanlodovice

@simonpasquier this draft PR in cortex gives the general idea of our use case for this feature https://github.com/cortexproject/cortex/pull/5602/commits

emanlodovice avatar Oct 19 '23 03:10 emanlodovice

@gotjosh good day. Can you take a look at this one?

emanlodovice avatar Nov 20 '23 21:11 emanlodovice