Using metric_relabel_configs to drop metrics still results in high Prometheus memory usage
What did you do?
We have a Prometheus setup that scrapes metrics from service A. Service A has 450k time series, and Prometheus memory usage is 2.6GB.
Now we want Prometheus to scrape metrics from both service A and service B. Service B has 1300k time series, and we set the metric_relabel_configs below for service B, which keeps only the time series that carry a `myMetric` label. Currently none of our time series have a `myMetric` label, so every series from service B should be dropped.
If we check the metric scrape_samples_scraped, the number of time series is 450k + 1300k.
If we check the metric scrape_samples_post_metric_relabeling, the number of time series is 450k.
```yaml
# scrape from service B
- job_name: "serviceB"
  scrape_interval: 20s
  metrics_path: "xxx"
  static_configs:
    - targets: ["localhost:xxx"]
  metric_relabel_configs:
    - source_labels: ["myMetric"]
      regex: ".+"
      action: keep
```
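For reference, the `keep` rule above behaves as follows (a minimal Python sketch of Prometheus's documented relabelling semantics, not Prometheus code): the values of `source_labels` are joined with the separator (default `;`), a missing label contributes an empty string, and the series is kept only if the regex matches the entire concatenated value. The label sets below are hypothetical examples.

```python
import re

def keep(series_labels, source_labels, regex, separator=";"):
    # Join the source label values; a label absent from the series
    # contributes "" (this mirrors Prometheus relabelling behaviour).
    value = separator.join(series_labels.get(l, "") for l in source_labels)
    # Prometheus anchors the regex, so the whole value must match.
    return re.fullmatch(regex, value) is not None

# A service B series without a "myMetric" label: the concatenated value
# is "", which ".+" does not match, so the series is dropped.
print(keep({"__name__": "http_requests_total"}, ["myMetric"], ".+"))  # False

# A hypothetical series that does carry the label would be kept.
print(keep({"myMetric": "yes"}, ["myMetric"], ".+"))  # True
```

This is consistent with what we observe: scrape_samples_post_metric_relabeling shows only service A's 450k series surviving relabelling.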
What did you expect to see?
We expected memory usage to stay at roughly 2.6GB, because metric_relabel_configs is applied before the data is ingested into the storage system.
What did you see instead? Under which circumstances?
We saw Prometheus memory usage increase from 2.6GB to 3.3GB.
System information
Linux 5.15.0-1059-azure x86_64
Prometheus version
v2.45.0
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response
If you can supply a heap profile (curl .../debug/heap and post the output here), we might be able to see what happened.
It's likely that newer versions will show a bit of improvement.
Hi @bboreham, here is the heap profile of Prometheus, please take a look.