Using metric_relabel_configs to drop metrics still results in high Prometheus memory usage
What did you do?
We have a Prometheus setup that scrapes metrics from service A. Service A has 450k time series, and Prometheus memory usage is 2.6GB.
Now we want Prometheus to scrape metrics from both service A and service B. Service B has 1300k time series, and we set the metric_relabel_configs below for service B, which keeps only the time series that carry a `myMetric` label. Currently none of our time series have a `myMetric` label, so every series from service B should be dropped.
If we check the metric scrape_samples_scraped, the number of time series is 450k + 1300k.
If we check the metric scrape_samples_post_metric_relabeling, the number of time series is 450k.
```yaml
# scrape from service B
- job_name: "serviceB"
  scrape_interval: 20s
  metrics_path: "xxx"
  static_configs:
    - targets: ["localhost:xxx"]
  metric_relabel_configs:
    - source_labels: ["myMetric"]
      regex: ".+"
      action: keep
```
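For reference, the `keep` rule above behaves as follows (a minimal Python sketch of Prometheus's documented relabelling semantics, not Prometheus code): the values of `source_labels` are joined with the separator (default `;`), a missing label contributes an empty string, and the series is kept only if the regex matches the entire concatenated value. The label sets below are hypothetical examples.

```python
import re

def keep(series_labels, source_labels, regex, separator=";"):
    # Join the source label values; a label absent from the series
    # contributes "" (this mirrors Prometheus relabelling behaviour).
    value = separator.join(series_labels.get(l, "") for l in source_labels)
    # Prometheus anchors the regex, so the whole value must match.
    return re.fullmatch(regex, value) is not None

# A service B series without a "myMetric" label: the concatenated value
# is "", which ".+" does not match, so the series is dropped.
print(keep({"__name__": "http_requests_total"}, ["myMetric"], ".+"))  # False

# A hypothetical series that does carry the label would be kept.
print(keep({"myMetric": "yes"}, ["myMetric"], ".+"))  # True
```

This is consistent with what we observe: scrape_samples_post_metric_relabeling shows only service A's 450k series surviving relabelling.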
What did you expect to see?
We expected memory usage to stay at roughly 2.6GB, because metric_relabel_configs is applied before the data is ingested into the storage system.
What did you see instead? Under which circumstances?
We saw Prometheus memory usage increase from 2.6GB to 3.3GB.
System information
Linux 5.15.0-1059-azure x86_64
Prometheus version
v2.45.0
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response
If you can supply a heap profile (curl .../debug/heap and post the output here), we might be able to see what happened.
It's likely that newer versions will show a bit of improvement.
Hi @bboreham, here is the heap profile of Prometheus, please take a look.