Cluster check
Create a 'cluster'/rollup check.
This would allow you to group multiple checks together and expose the 'cluster' check as a single entity. Thresholds should be percentage-based.
Would be nice:
Cluster checks that support usage of 'tags', i.e. when creating the cluster check, you do not have to list specific checks; you just specify one or more tags that other checks use.
Example:
monitor:
  exec-cluster-check:
    type: cluster-tags
    description: cluster check for important execs
    interval: 10s
    monitor-tags:
      - very-important
    warning-threshold: 20% # 20 percent of the checks are failing
    critical-threshold: 50% # 50 percent of the checks are failing
    warning-alerter:
      - primary-slack
    critical-alerter:
      - primary-email
    tags:
      - our-cluster-checks
  exec-check1:
    type: exec
    description: exec check test
    timeout: 5s
    command: echo
    args:
      - hello
      - world
    interval: 10s
    return-code: 0
    expect: hello
    warning-threshold: 1
    critical-threshold: 3
    tags:
      - super-exec-checks
      - very-important
  exec-check2:
    type: exec
    description: exec check test
    timeout: 5s
    command: echo
    args:
      - hello
      - world
    interval: 10s
    return-code: 0
    expect: world
    warning-threshold: 1
    critical-threshold: 3
    warning-alerter:
      - primary-slack
    critical-alerter:
      - primary-email
    tags:
      - super-exec-checks
      - very-important
In the above example:
We create an 'exec-cluster-check' that monitors the state of the 2 checks selected through the very-important tag. If 20% of the underlying checks fail, it produces a warning alert; if 50% of the underlying checks fail, it produces a critical alert.
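To make the threshold behaviour concrete, here is a minimal sketch of how a tag-based cluster check could turn member states into a single status. It is only an illustration: the `CheckState` type, the `cluster_status` function and the "unknown" status are hypothetical names, and treating the thresholds as inclusive (">=") is an assumption, since the proposal does not say.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Hypothetical snapshot of one underlying check."""
    name: str
    tags: frozenset
    failing: bool


def cluster_status(checks, monitor_tags, warning_pct, critical_pct):
    """Return (status, failed_percentage) for checks carrying any monitor tag."""
    wanted = set(monitor_tags)
    # Select members by tag instead of by explicit check name.
    members = [c for c in checks if c.tags & wanted]
    if not members:
        # Assumption: no matching checks means the cluster state is unknown.
        return "unknown", 0.0
    failed_pct = 100.0 * sum(1 for c in members if c.failing) / len(members)
    # Assumption: thresholds are inclusive, so exactly 50% is already critical.
    if failed_pct >= critical_pct:
        return "critical", failed_pct
    if failed_pct >= warning_pct:
        return "warning", failed_pct
    return "ok", failed_pct


tags = frozenset({"super-exec-checks", "very-important"})
checks = [
    CheckState("exec-check1", tags, failing=False),
    CheckState("exec-check2", tags, failing=True),
]
status, pct = cluster_status(checks, ["very-important"], 20, 50)
```

With only two member checks, a single failure is already 50%, so under these assumptions the cluster would go straight to critical without passing through warning.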
Do you anticipate this check running those checks a second time, or re-using the existing check state from the last run?
I think this should reuse the existing check state rather than run the checks again. I'm not sure how tricky that could be, though (for example, when only partial state is available).
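One possible way to handle partial state when reusing results, sketched under assumptions: the cluster check reads the last stored result of each member and computes the failure percentage only over members that have reported, while also returning how many are still missing. The `failing_percentage_from_cache` function, the `last_results` mapping (check name to True if the last run passed) and the `exec-check3` member are hypothetical and not part of any existing API.

```python
def failing_percentage_from_cache(member_names, last_results):
    """Compute the failing percentage from stored results without re-running checks.

    Returns (percentage or None, number of members with no stored result).
    """
    names = list(member_names)
    # Only members that have already reported contribute to the percentage.
    known = [last_results[n] for n in names if n in last_results]
    missing = len(names) - len(known)
    if not known:
        # Nothing has reported yet, so there is no percentage to compute.
        return None, missing
    failed = sum(1 for passed in known if not passed)
    return 100.0 * failed / len(known), missing


# exec-check3 is a hypothetical member that has not completed a run yet.
members = ["exec-check1", "exec-check2", "exec-check3"]
last_results = {"exec-check1": True, "exec-check2": False}
pct, missing = failing_percentage_from_cache(members, last_results)
```

Whether missing members should instead count as failing, or delay the cluster evaluation until every member has reported at least once, is a design choice this sketch leaves open.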