Allow mutating alerts on pipeline stages
Currently, the Alertmanager notify stages support things like dedupe and group by, among others. What we have observed is that mutating alerts opens up a lot of interesting possibilities, such as:
- attach exemplars to each alert via annotations and paint them on the corresponding receiver
- group alerts by a certain grouping key and attach a root cause which can be fetched from a certain external API
Currently, alerts are dispatched to each receiver in parallel, which means that any stage is invoked concurrently, so unsynchronized updates to the annotations map can cause a panic.
Some of the things that would be useful:
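To make the hazard concrete: Go maps are not safe for concurrent writes, so a stage that writes into a shared annotations map from several receiver goroutines will crash. A minimal sketch of one workaround follows, in which each receiver copies the alert before mutating it. The `Alert` type and the `cloneAlert`, `annotateFor`, and `fanOut` helpers are hypothetical stand-ins for illustration, not Alertmanager's actual types or API.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a simplified, hypothetical stand-in for an Alertmanager alert.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

// cloneAlert returns a copy whose maps can be mutated without touching
// the original alert.
func cloneAlert(a *Alert) *Alert {
	c := &Alert{
		Labels:      make(map[string]string, len(a.Labels)),
		Annotations: make(map[string]string, len(a.Annotations)),
	}
	for k, v := range a.Labels {
		c.Labels[k] = v
	}
	for k, v := range a.Annotations {
		c.Annotations[k] = v
	}
	return c
}

// annotateFor simulates a per-receiver mutating stage. It writes only to
// its own copy, so concurrent calls never write to the same map.
func annotateFor(receiver string, a *Alert) *Alert {
	c := cloneAlert(a)
	c.Annotations["receiver"] = receiver
	return c
}

// fanOut runs the stage for every receiver concurrently, mirroring the
// parallel dispatch described above. Each goroutine writes to a distinct
// slice index, which is safe.
func fanOut(a *Alert, receivers []string) []*Alert {
	out := make([]*Alert, len(receivers))
	var wg sync.WaitGroup
	for i, r := range receivers {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			out[i] = annotateFor(r, a)
		}(i, r)
	}
	wg.Wait()
	return out
}

func main() {
	shared := &Alert{
		Labels:      map[string]string{"alertname": "HighLatency"},
		Annotations: map[string]string{"summary": "p99 latency above threshold"},
	}
	for _, a := range fanOut(shared, []string{"slack", "pagerduty"}) {
		fmt.Println(a.Annotations["receiver"], a.Annotations["summary"])
	}
}
```

Copying per receiver avoids the crash, but it also means every receiver ends up with its own, possibly different, mutation, which is why a shared mutation step would still be preferable.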
- support mutation stages that are common across all receivers (we would hate to see different sample traces across various receivers)
- support grouping of alerts post mutation to facilitate more complex things
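The two wishes above could be sketched roughly as follows: a mutation step that runs once, before any receiver sees the alerts, followed by a regrouping on the annotation it attached. Everything here is an assumption made for illustration: the `Alert` type, the `Lookup` signature, the `root_cause` annotation key, and the `enrichOnce`, `regroup`, and `staticLookup` helpers are hypothetical and do not exist in Alertmanager.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// Alert is a simplified, hypothetical stand-in for an Alertmanager alert.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

// Lookup is a hypothetical root-cause resolver. A real one might call an
// external API and should honour ctx cancellation.
type Lookup func(ctx context.Context, service string) (string, error)

// enrichOnce copies each alert's annotations and attaches a root_cause
// entry. It is meant to run a single time, before fan-out, so that every
// receiver sees the same mutated alerts.
func enrichOnce(ctx context.Context, alerts []Alert, lookup Lookup) ([]Alert, error) {
	out := make([]Alert, 0, len(alerts))
	for _, a := range alerts {
		cause, err := lookup(ctx, a.Labels["service"])
		if err != nil {
			return nil, fmt.Errorf("root cause lookup for %q: %w", a.Labels["service"], err)
		}
		ann := make(map[string]string, len(a.Annotations)+1)
		for k, v := range a.Annotations {
			ann[k] = v
		}
		ann["root_cause"] = cause
		out = append(out, Alert{Labels: a.Labels, Annotations: ann})
	}
	return out, nil
}

// regroup buckets alerts by the value of a single annotation.
func regroup(alerts []Alert, key string) map[string][]Alert {
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		v := a.Annotations[key]
		groups[v] = append(groups[v], a)
	}
	return groups
}

// staticLookup is a fake resolver used only for this illustration.
func staticLookup(_ context.Context, service string) (string, error) {
	causes := map[string]string{"api": "db-failover", "web": "db-failover", "batch": "disk-full"}
	return causes[service], nil
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "HighLatency", "service": "api"}},
		{Labels: map[string]string{"alertname": "ErrorRate", "service": "web"}},
		{Labels: map[string]string{"alertname": "JobFailed", "service": "batch"}},
	}
	enriched, err := enrichOnce(context.Background(), alerts, staticLookup)
	if err != nil {
		fmt.Println("enrich failed:", err)
		return
	}
	groups := regroup(enriched, "root_cause")
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

Because the enrichment happens before dispatch, no receiver goroutine needs to write to an alert, and all receivers would report the same root cause for a given group.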
Currently, things like AlertStoreCallback can be used to achieve some of this, but the callback blocks the POST/PUT API, which can slow down the rule manager.
I would just advise caution about putting blocking operations (such as API calls) in a stage of the notification pipeline, as it can upset the quite delicate failover semantics when running a cluster of Alertmanagers in high availability mode.
I wonder how much of this can be done before the alerts are sent to the Alertmanager (i.e. in the ruler) using something like annotations? In fact, just last month at our in-person dev summit, we (the Prometheus contributors) agreed to add support for a third dimension, in addition to annotations and labels, that is intended for more opaque metadata, an example of which could be exemplars.