Prometheus extension adding underscores to datasource names in some metrics
The Prometheus extension reports the dataSource wrapped in underscores in some metrics, whereas in others it is reported as is.
Affected Version
I noticed this in Druid 31. It must have been introduced as a regression earlier, as this used to work.
Description
I am using the Prometheus extension to source the following Grafana alert:
max_over_time(druid_ingest_kafka_lag{dataSource!~".*test.*"}[3m]) / on (dataSource) sum(rate(druid_ingest_events_processed_total{dataSource!~".*test.*"}[3m])) by (dataSource) > 60*15
In words: Alert if the ingestion lag is more than 15min based on the reported consumption speed. Like I said, this used to work just fine.
However:
druid_ingest_events_processed_total now reports dataSource with underscores, i.e. "_foo_" instead of "foo", while druid_ingest_kafka_lag still reports the dataSource as "foo". Thus they no longer match and my alert went silent.
Expected behavior
All prometheus metrics should report the dataSource as is and not wrap them in underscores. And if that's a feature, at least be consistent.
I was able to narrow it down a bit.
It seems to affect all (but only) metrics starting with "ingest" and tracing a bit through the git changes, I came to
https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/TaskRealtimeMetricsMonitorBuilder.java#L45
being added 7 months ago, just in time to qualify as the source of my issue.
@clintropolis any thoughts?
@sixtus,
- For the `logging` emitter, I am able to see the `dataSource` name correctly emitted, without any underscore (`_`) decorations.
- For the Prometheus emitter, a quick callout on the reformatting of metric names and labels: https://druid.apache.org/docs/latest/development/extensions-contrib/prometheus/#metric-names
  Code references:
  - https://github.com/apache/druid/blob/master/extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/Metrics.java#L90
  - https://github.com/apache/druid/blob/master/extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java#L154
- That being said, I am also curious why the `dataSource` name would be decorated with underscores only for a subset of metrics. Maybe try running the unit tests (https://github.com/apache/druid/blob/master/extensions-contrib/prometheus-emitter/src/test/java/org/apache/druid/emitter/prometheus/PrometheusEmitterTest.java) with your data source configurations, and see if the same behavior shows up.
Thanks for helping in reproducing the issue, @sixtus! I see that the dataSource values are decorated in all emitters.
- Prometheus Emitter: seen as `_dataSource_` (`_` decoration)
- Logging Emitter: seen as `[dataSource]` (`[]` decoration)
- Kafka Emitter: seen as `[dataSource]` (`[]` decoration)
Root Cause
- As you pointed out above, the issue does stem from https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/TaskRealtimeMetricsMonitorBuilder.java#L45, in which we are creating a map of `<String, String[]>`.
- This map is then used for all the `ingest/*` metrics in https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/stats/TaskRealtimeMetricsMonitor.java, and because the value type in the map is an array, all emitters show the string value decorated with `[]`.
- In fact, all the dimensions from that map are decorated. For example, in this log line from the Kafka Emitter, notice that the four keys of the map (`taskType`, `groupId`, `dataSource`, `taskId`) are all decorated with `[]`:
{"feed":"metrics","taskType":["index_kafka"],"metric":"ingest/events/thrownAway","service":"druid/middleManager","groupId":["index_kafka_kttm"],"host":"localhost:8100","version":"31.0.0","value":0,"dataSource":["kttm"],"taskId":["index_kafka_kttm_625540c86347b10_ahfoelmb"],"timestamp":"2025-02-20T01:53:20.928Z"}
- The Prometheus Emitter specifically performs a replaceAll operation (https://github.com/apache/druid/blob/master/extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java#L154), replacing all non-alphanumeric characters with `_`.
- Hence, we see the Prometheus Emitter decorating the `dataSource` values with `_`, but only for the metrics starting with `ingest/*`.
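The chain described above can be reproduced in isolation. The following is a minimal sketch, not the emitter's actual code: the class name, the `toLabelValue` helper, and the regex `[^a-zA-Z0-9_]` are assumptions made for illustration, and it assumes the `String[]` dimension ends up as a `List` somewhere before the emitter stringifies it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DimensionDecorationDemo {
    // Assumed sanitizing pattern: anything outside [a-zA-Z0-9_] becomes "_".
    // The emitter's real pattern may differ; this one is for illustration only.
    private static final Pattern UNSAFE = Pattern.compile("[^a-zA-Z0-9_]");

    // Hypothetical stand-in for the emitter's label formatting step:
    // stringify the dimension value, then replace unsafe characters.
    static String toLabelValue(Object userDim) {
        return UNSAFE.matcher(String.valueOf(userDim)).replaceAll("_");
    }

    public static void main(String[] args) {
        String[] asArray = {"kttm"};
        // Assumption: the array is wrapped in a List before being emitted.
        List<String> asList = Arrays.asList(asArray);

        System.out.println(asList);               // [kttm]
        System.out.println(toLabelValue(asList)); // _kttm_
        System.out.println(toLabelValue("kttm")); // kttm
    }
}
```

A plain `String` dimension passes through untouched, which would match the observation that only the metrics fed from the `<String, String[]>` map are affected.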
Similar Issue for SQL Query Metrics
- While debugging this issue, I came across a very similar issue with the `sqlQuery*` metrics, which are also getting decorated. I logged a separate issue (https://github.com/apache/druid/issues/17743) for this, because its root cause is at a different location and the code changes for it will differ from whatever we choose for this issue.
Discussion Points and Solution Approaches
- Do we need the Map to be of type `<String, String[]>` here (https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/TaskRealtimeMetricsMonitorBuilder.java#L45)?
  - Can't it be of type `<String, String>`? From the looks of it, all four dimensions are single-valued, right?
  - If we make this change, the dimensions will contain plain string values, no String array to String conversion will happen, and hence no dimension will be decorated.
- Another option is to handle it in the Prometheus Emitter.
  - Explicitly remove the leading `[` and trailing `]` characters from `userDim.toString()` before the replace operation, https://github.com/apache/druid/blob/master/extensions-contrib/prometheus-emitter/src/main/java/org/apache/druid/emitter/prometheus/PrometheusEmitter.java#L154
  - But this approach is not a great one, because it caters only to the Prometheus emitter, and I think the issue should be solved for all emitters.
- @gianm , @clintropolis , any thoughts to share? (Tagging because git history shows you recently worked on these code pieces)
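To make the second option concrete, here is a rough sketch of what stripping the brackets before the replace could look like. Everything in it is hypothetical: `stripBrackets`, `toLabelValue`, and the regex are illustrative names and assumptions, not the emitter's real code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BracketStripSketch {
    // Assumed sanitizing pattern, for illustration only.
    private static final Pattern UNSAFE = Pattern.compile("[^a-zA-Z0-9_]");

    // Hypothetical helper: drop one leading "[" and one trailing "]" if both are present.
    static String stripBrackets(String s) {
        if (s.length() >= 2 && s.startsWith("[") && s.endsWith("]")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Hypothetical label formatting: strip brackets first, then sanitize.
    static String toLabelValue(Object userDim) {
        return UNSAFE.matcher(stripBrackets(String.valueOf(userDim))).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(toLabelValue(Arrays.asList("kttm")));   // kttm
        System.out.println(toLabelValue("kttm"));                  // kttm
        System.out.println(toLabelValue(Arrays.asList("a", "b"))); // a__b
    }
}
```

Besides fixing only one emitter, this string-level trimming would also alter a value that legitimately starts with `[` and ends with `]`, and a multi-valued dimension still comes out mangled (`a__b`). That is another argument for fixing the value type at the source, as in the first option.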
Hi @gianm , @clintropolis , any thoughts on the above solutions? Thanks!
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.