Task metrics not getting published to prometheus

Open vipul-mykaarma opened this issue 1 year ago • 3 comments

Please provide a detailed title (e.g. "Broker crashes when using TopN query with Bound filter" instead of just "Broker crashes").

Affected Version

The Druid version where the problem was encountered: 29.0.1

Description

Task metrics such as task_run_time show up only as lines commented with # when the metrics page is looked up, with the note "This metric is only available if the TaskCountStatsMonitor module is included."

Please include as much detailed information about the problem as possible.

  • Running it on the single-server small configuration
  • Configurations in use

Common.runtime.properties:

druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=8088
druid.prometheus.emitter.monitored-processes=["broker", "historical", "realtime", "overlord", "middleManager", "coordinator"]
druid.server.http.healthCheck=true

Coordinator runtime.properties:

druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor"]

druid.emitter.prometheus.port=8089
druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.http.type=multi
druid.emitter.prometheus.http.multi.feed=[metrics, task, ingest]

On looking up ip:8088 and ip:8089:

# HELP druid_task_failed_count_total Number of failed tasks per emission period. This metric is only available if the TaskCountStatsMonitor module is included.
# TYPE druid_task_failed_count_total counter
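The symptom described here (a metric that is declared in comment lines but never gets a value) can be checked mechanically. The following is a minimal sketch, not part of the original report: the function name `metrics_without_samples` and the shortened sample text are illustrative assumptions, and the prefix match used to pair histogram samples (`_bucket`, `_sum`, `_count`) with their declared name is a heuristic.

```python
def metrics_without_samples(exposition_text):
    """Return metric names declared via '# HELP' / '# TYPE' that have no sample lines.

    Hypothetical helper for illustration; it reads Prometheus text exposition output.
    """
    declared = []   # metric names in the order they are first declared
    sampled = set() # names that appear on non-comment (sample) lines
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines look like "# HELP <name> <text>" or "# TYPE <name> <type>".
            parts = line.split(None, 3)
            if len(parts) >= 3 and parts[1] in ("HELP", "TYPE"):
                if parts[2] not in declared:
                    declared.append(parts[2])
            continue
        # Sample lines look like "name{labels} value" or "name value".
        sampled.add(line.split("{", 1)[0].split()[0])
    # Heuristic: a sample belongs to a declared metric if the names are equal
    # or the sample name extends the declared name with a "_suffix".
    return [
        m for m in declared
        if not any(s == m or s.startswith(m + "_") for s in sampled)
    ]


# Shortened, partly made-up input: the second HELP text is a placeholder.
sample = """\
# HELP druid_task_failed_count_total Number of failed tasks per emission period.
# TYPE druid_task_failed_count_total counter
# HELP druid_task_run_time Placeholder help text for this example.
# TYPE druid_task_run_time histogram
druid_task_run_time_count{dataSource="wikipedia",} 1.0
druid_task_run_time_sum{dataSource="wikipedia",} 6.722
"""
print(metrics_without_samples(sample))
```

In practice the input would be the body returned by the emitter's /metrics endpoint on each port. A metric listed by this check has been registered by the emitter but has not yet received any events.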

vipul-mykaarma, May 21 '24 08:05

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot], Feb 26 '25 00:02

@vipul-mykaarma, a few of the params you have specified are not available for the prometheus emitter. Can you check whether the following basic options work for you? With the configuration below, I was able to see the task_run_time metrics on the Prometheus endpoint.

  • In conf/druid/single-server/small/_common/common.runtime.properties:
druid.emitter=prometheus
druid.emitter.prometheus.addServiceAsLabel=true
druid.emitter.prometheus.addHostAsLabel=true
druid.emitter.prometheus.flushPeriod=5
druid.emitter.prometheus.port=8088
druid.emitter.logging.logLevel=debug
  • In conf/druid/single-server/small/coordinator-overlord/runtime.properties:
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor"]

druid.emitter=prometheus
druid.emitter.prometheus.addServiceAsLabel=true
druid.emitter.prometheus.addHostAsLabel=true
druid.emitter.prometheus.flushPeriod=5
druid.emitter.prometheus.port=8089
druid.emitter.logging.logLevel=debug
  • Prometheus Metrics Endpoint Result:
 % curl http://localhost:8089/metrics | grep task  | grep -v "#"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27032  100 27032    0     0  13.1M      0 --:--:-- --:--:-- --:--:-- 25.7M
druid_segment_added_bytes_total{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 6525055.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.1",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.25",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.5",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.75",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="1.0",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="2.5",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="5.0",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="7.5",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="10.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="30.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="60.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="120.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="300.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="+Inf",} 1.0
druid_task_run_time_count{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.0
druid_task_run_time_sum{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 6.722
druid_task_success_count_total{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",} 1.0
druid_segment_added_bytes_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.740615766243E9
druid_task_run_time_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.740615766916E9
druid_task_success_count_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",} 1.740615776999E9
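As a quick sanity check on output like the above, the `_sum` and `_count` series of a histogram can be combined into a mean per label set. This is a minimal sketch under stated assumptions: `parse_sample` and `mean_by_series` are hypothetical helper names, the regular expressions cover only the simple "name{labels} value" form seen in this output (no trailing timestamps), and the result is in whatever unit the emitter reports.

```python
import re

# Matches "name{labels} value" or "name value"; timestamps are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# Matches key="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into (name, labels, value), or None if it does not match."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        return None
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


def mean_by_series(lines, metric):
    """Return {label set: sum / count} for the histogram named `metric`."""
    sums, counts = {}, {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        key = tuple(sorted(labels.items()))
        if name == metric + "_sum":
            sums[key] = value
        elif name == metric + "_count":
            counts[key] = value
    # Skip series with a missing or zero count to avoid division by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


# Two lines copied from the endpoint output above.
lines = [
    'druid_task_run_time_count{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.0',
    'druid_task_run_time_sum{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 6.722',
]
means = mean_by_series(lines, "druid_task_run_time")
for key, mean in means.items():
    print(dict(key)["taskType"], mean)
```

With a single completed task, the mean equals the `_sum` value, which matches the bucket counts switching from 0.0 to 1.0 between `le="5.0"` and `le="7.5"`.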

ashwintumma23, Feb 27 '25 18:02

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot], Dec 05 '25 00:12