[ENHANCE] Group alerts where the changed object is the same
Is your feature request related to a problem? Please describe.
I have a configmap that is consumed by 10-15 deployments (so far, likely more over time). The setup works perfectly for the shared config and Reloader works as expected. However, it's frustrating with the alerts setup to see an individual alert/line for every single deployment that is reloaded, as I get 10-15 messages each time. Over time, if variables change/update a few times a day, that's a large number of messages to a channel.
For example, if you had two deployments with a shared config, you'd get these alerts:
[TEST ENV]: Reloader detected changes in api of type CONFIGMAP in namespace default. Hence reloaded deployment-1 of type Deployment in namespace default
[TEST ENV]: Reloader detected changes in api of type CONFIGMAP in namespace default. Hence reloaded deployment-2 of type Deployment in namespace default
Describe the solution you'd like
The ideal solution would be a flag (something like ALERT_GROUP_SAME_OBJECT, true/false, default false) that groups alerts before they are sent so they are condensed into a single alert. For example:
[TEST ENV]: Reloader detected changes in api of type CONFIGMAP in namespace default. The following objects of type Deployment in namespace default were reloaded:
- deployment-1
- deployment-2
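To make the proposal concrete, here is a minimal sketch of how such a flag might be set on the Reloader deployment, assuming it would be exposed as an environment variable alongside the existing alerting settings. ALERT_GROUP_SAME_OBJECT is purely hypothetical, and the names/image are illustrative:

```yaml
# Sketch only: ALERT_GROUP_SAME_OBJECT is the proposed flag and does not exist today.
# ALERT_ON_RELOAD is the existing toggle for alerts on successful reloads.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reloader-reloader
  namespace: default
spec:
  selector:
    matchLabels:
      app: reloader-reloader
  template:
    metadata:
      labels:
        app: reloader-reloader
    spec:
      containers:
        - name: reloader-reloader
          image: stakater/reloader:latest
          env:
            - name: ALERT_ON_RELOAD
              value: "true"
            # Proposed (hypothetical) flag: group reload alerts that share the
            # same changed ConfigMap/Secret into one message per change.
            - name: ALERT_GROUP_SAME_OBJECT
              value: "true"
```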
Describe alternatives you've considered
If the above isn't possible, a stop-gap alternative would be the ability to ignore successful reloads by annotation, perhaps only noting errors for alerts.
E.g., if you give deployment-2 the annotation reloader.stakater.com/alert: 'false', it would simply be skipped when sending alerts. Or, vice versa, an opt-in annotation for alerting; whatever works.
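A sketch of what that opt-out might look like on a consuming workload; note that reloader.stakater.com/alert is the proposed annotation and does not exist today, while reloader.stakater.com/auto is the existing opt-in for reloads, and the names here mirror the example above:

```yaml
# Sketch only: reloader.stakater.com/alert is hypothetical; everything else is
# a standard deployment consuming the "api" ConfigMap from the example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-2
  namespace: default
  annotations:
    reloader.stakater.com/auto: "true"    # existing: reload on ConfigMap/Secret change
    reloader.stakater.com/alert: "false"  # proposed: still reload, but send no alert
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deployment-2
  template:
    metadata:
      labels:
        app: deployment-2
    spec:
      containers:
        - name: api
          image: example/api:latest
          envFrom:
            - configMapRef:
                name: api
```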
Hi! Thank you for your suggestion.
The current behavior is expected but we do see your point about spamming alerts.
The first alternative would require some refactoring to make work. PRs are welcome, but I don't see us having the capacity in-house at the moment, and the solution needs some planning to make it robust while still keeping the current behavior.
The second one seems interesting, but when investigating we noted that the alert is not sent on error at all.
If we assume that we are only interested in the failing reloads, would an option like ALERT_ON_FAILURE which enables alerts on failed reloads solve your issue?
My thinking is that one could then disable ALERT_ON_RELOAD and only have ALERT_ON_FAILURE enabled to reduce the noise.
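For illustration, roughly what I have in mind for the Reloader container's env, assuming a hypothetical ALERT_ON_FAILURE variable is added (it does not exist yet); ALERT_ON_RELOAD is the existing toggle:

```yaml
# Sketch only: ALERT_ON_FAILURE is proposed, not implemented.
env:
  - name: ALERT_ON_RELOAD
    value: "false"   # silence the per-deployment success messages
  - name: ALERT_ON_FAILURE
    value: "true"    # proposed: only alert when a reload fails
```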