SLO Aggregation
Problem to solve
Currently, a single user journey can be covered by multiple SLOs, with no single SLO that measures the overall user experience.
Proposal
Add the ability to aggregate SLOs and roll them up into a single SLO.
This is similar to Keptn's Quality Gates, as proposed by Andreas Grabner. More info here: https://www.youtube.com/watch?v=bMnMkOKVzdg
Further details
Key features:
- Performance Signature
  - Synthetic SLI from multiple SLOs
- Key SLOs
  - If a key SLO fails, the whole aggregate fails
- Weighted
  - Total weight is the sum of all individual weights
- Performance testing
- Regression Detection
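To make the "Key SLOs" and "Weighted" items concrete, here is a minimal Python sketch of how a roll-up could be scored. It is only one possible interpretation: the `SloResult` type, the `aggregate` function, the example SLO names, and the rule "the aggregate passes when the weighted share of passing SLOs meets the target and no key SLO failed" are assumptions for illustration, not part of any existing spec.

```python
from dataclasses import dataclass


@dataclass
class SloResult:
    """Outcome of one underlying SLO (hypothetical type, for illustration)."""
    name: str
    passed: bool
    weight: float = 1.0
    key: bool = False  # assumption: a failing key SLO fails the whole aggregate


def aggregate(results, target):
    """Return (score, passed) for a weighted roll-up of SLO results."""
    # Total weight is the sum of all individual weights.
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    # Score is the share of the total weight held by passing SLOs.
    achieved = sum(r.weight for r in results if r.passed)
    score = achieved / total_weight
    key_failed = any(r.key and not r.passed for r in results)
    return score, (score >= target and not key_failed)


results = [
    SloResult("checkout-latency", passed=True, weight=2, key=True),
    SloResult("checkout-availability", passed=True, weight=2),
    SloResult("search-latency", passed=False, weight=1),
]
score, ok = aggregate(results, target=0.75)
print(score, ok)
```

In this example the failing SLO carries one fifth of the total weight, so the aggregate still passes; marking `search-latency` as a key SLO would fail it regardless of the score.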
The simplest usable (to me) syntax for this would be:
spec:
  description: Aggregate SLO
  budgetingMethod: ?
  indicator:
    objectiveMetric:
      source: http://localhost:9090
      queryType: promql
      query: component:latency:slo_ok_5m{component="prod-comp-1"}
  objectives:
    - target: 0.95
  timeWindows:
    - count: 28
      unit: Day
      isRolling: true
The evaluation engine would then be smart enough to compute this by generating the following Prometheus query and simply storing the result:
avg_over_time(component:latency:slo_ok_5m{component="prod-comp-1"}[28d])
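As a rough sketch of what "smart enough" could mean here, the engine would only need to wrap the configured query in `avg_over_time` with a range derived from the time window. The function name `rolling_window_query` and the unit mapping below are assumptions for illustration; only the `Day` unit appears in the example above.

```python
# Assumed mapping from time window units to PromQL duration suffixes.
UNIT_SUFFIX = {"Minute": "m", "Hour": "h", "Day": "d", "Week": "w"}


def rolling_window_query(query: str, count: int, unit: str) -> str:
    """Wrap a series selector in avg_over_time over a rolling window.

    This only works when `query` is a plain series selector, as in the
    example spec. An arbitrary PromQL expression would need subquery
    syntax instead, which this sketch does not handle.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    if unit not in UNIT_SUFFIX:
        raise ValueError(f"unsupported unit: {unit}")
    return f"avg_over_time({query}[{count}{UNIT_SUFFIX[unit]}])"


generated = rolling_window_query(
    'component:latency:slo_ok_5m{component="prod-comp-1"}', 28, "Day"
)
print(generated)
```

The stored result can then be compared directly against the `target` of 0.95.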
Hmm, I think I misunderstood the goal of this issue. I'll move over to my own issue :)
@ian-bartholomew In my sloth.dev definitions I do this by calculating a raw error_query_ratio from the sum of all good events and the sum of all total events across the SLIs.
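The arithmetic behind that approach can be sketched in a few lines of Python. The function name and the sample counts are made up for illustration, and this is plain arithmetic rather than anything Sloth itself runs.

```python
def aggregate_error_ratio(slis):
    """Combine several SLIs into one error ratio.

    `slis` is a list of (good_events, total_events) pairs. Summing the
    events before dividing means busier SLIs count for more, unlike an
    average of the per-SLI ratios.
    """
    good = sum(g for g, _ in slis)
    total = sum(t for _, t in slis)
    if total == 0:
        return 0.0  # assumption: no traffic means no errors
    return (total - good) / total


# Hypothetical event counts for three SLIs in one user journey.
slis = [(990, 1000), (4950, 5000), (60, 100)]
ratio = aggregate_error_ratio(slis)
print(ratio)
```

Here the third SLI has a 40% error rate on its own, but because it carries little traffic the combined error ratio stays under 2%.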