Task metrics not getting published to prometheus
Affected Version
The Druid version where the problem was encountered : 29.0.1
Description
Task metrics such as task_run_time show up on the metrics page only as lines commented out with #, with no values published. The description on those lines says: "This metric is only available if the TaskCountStatsMonitor module is included."
- Running on the single-server small configuration
- Configurations in use
common.runtime.properties:
druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=8088
druid.prometheus.emitter.monitored-processes=["broker", "historical", "realtime", "overlord", "middleManager", "coordinator"]
druid.server.http.healthCheck=true
Coordinator runtime.properties:
druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor"]
druid.emitter.prometheus.port=8089
druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.http.type=multi
druid.emitter.prometheus.http.multi.feed=[metrics, task, ingest]
Output when looking up ip:8088 and ip:8089:
# HELP druid_task_failed_count_total Number of failed tasks per emission period. This metric is only available if the TaskCountStatsMonitor module is included.
# TYPE druid_task_failed_count_total counter
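A metric that appears only as commented HELP/TYPE lines has been registered with the exporter but has no recorded samples yet. The sketch below is one way to list such metrics from a scraped page. It is a minimal illustration, not part of Druid or the Prometheus client: the function name `families_without_samples` and the `SAMPLE` text are hypothetical, and it assumes the plain Prometheus text exposition format.

```python
import re

# Suffixes under which a histogram or summary family publishes its samples.
HISTOGRAM_SUFFIXES = ("_bucket", "_count", "_sum")
# A sample line starts with the metric name, followed by "{" or a space.
SAMPLE_NAME = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)")


def families_without_samples(exposition_text):
    """Return metric families declared via '# TYPE' that have no sample lines."""
    declared = []
    sample_names = set()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # '# TYPE <name> <type>' declares a metric family.
            if len(parts) >= 3 and parts[1] == "TYPE":
                declared.append(parts[2])
            continue
        match = SAMPLE_NAME.match(line)
        if match:
            sample_names.add(match.group(1))
    missing = []
    for family in declared:
        candidates = {family} | {family + s for s in HISTOGRAM_SUFFIXES}
        if not candidates & sample_names:
            missing.append(family)
    return missing


# Illustrative page content (hypothetical, shaped like the output in this issue).
SAMPLE = """\
# HELP druid_task_failed_count_total Number of failed tasks per emission period.
# TYPE druid_task_failed_count_total counter
# HELP druid_task_success_count_total Number of successful tasks per emission period.
# TYPE druid_task_success_count_total counter
druid_task_success_count_total{dataSource="wikipedia",} 1.0
"""

if __name__ == "__main__":
    print(families_without_samples(SAMPLE))
```

Running it against the text saved from ip:8088 or ip:8089 should show which task metrics are declared but empty, which may help separate "no task has emitted this metric yet" from "the emitter is not configured for it".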
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.
@vipul-mykaarma,
A few of the parameters you specified are not available for the Prometheus emitter. Can you check whether the following basic options work for you? With the configuration below, I was able to see the task_run_time metrics on the Prometheus endpoint.
- In conf/druid/single-server/small/_common/common.runtime.properties:
druid.emitter=prometheus
druid.emitter.prometheus.addServiceAsLabel=true
druid.emitter.prometheus.addHostAsLabel=true
druid.emitter.prometheus.flushPeriod=5
druid.emitter.prometheus.port=8088
druid.emitter.logging.logLevel=debug
- In conf/druid/single-server/small/coordinator-overlord/runtime.properties:
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor"]
druid.emitter=prometheus
druid.emitter.prometheus.addServiceAsLabel=true
druid.emitter.prometheus.addHostAsLabel=true
druid.emitter.prometheus.flushPeriod=5
druid.emitter.prometheus.port=8089
druid.emitter.logging.logLevel=debug
- Prometheus Metrics Endpoint Result:
% curl http://localhost:8089/metrics | grep task | grep -v "#"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27032 100 27032 0 0 13.1M 0 --:--:-- --:--:-- --:--:-- 25.7M
druid_segment_added_bytes_total{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 6525055.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.1",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.25",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.5",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="0.75",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="1.0",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="2.5",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="5.0",} 0.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="7.5",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="10.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="30.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="60.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="120.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="300.0",} 1.0
druid_task_run_time_bucket{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",le="+Inf",} 1.0
druid_task_run_time_count{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.0
druid_task_run_time_sum{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 6.722
druid_task_success_count_total{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",} 1.0
druid_segment_added_bytes_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.740615766243E9
druid_task_run_time_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",taskType="index_parallel",} 1.740615766916E9
druid_task_success_count_created{dataSource="wikipedia",druid_service="druid/coordinator",host_name="localhost:8081",} 1.740615776999E9
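The histogram lines above can be read back programmatically: dividing `druid_task_run_time_sum` by `druid_task_run_time_count` gives the mean task run time. The sketch below shows that arithmetic on two lines copied from the output. The helper `sample_value` is a hypothetical name, and the parsing assumes sample lines carry no trailing timestamp.

```python
def sample_value(exposition_text, metric_name):
    """Return the value of the first sample named exactly metric_name, or None."""
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the label block "{" or, without labels, at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name == metric_name:
            # Assumes the value is the last space-separated field (no timestamp).
            return float(line.rsplit(" ", 1)[1])
    return None


# Two lines taken from the endpoint output shown in this comment.
PAGE = (
    'druid_task_run_time_count{dataSource="wikipedia",druid_service="druid/coordinator",'
    'host_name="localhost:8081",taskType="index_parallel",} 1.0\n'
    'druid_task_run_time_sum{dataSource="wikipedia",druid_service="druid/coordinator",'
    'host_name="localhost:8081",taskType="index_parallel",} 6.722\n'
)

total = sample_value(PAGE, "druid_task_run_time_sum")
count = sample_value(PAGE, "druid_task_run_time_count")
mean_run_time = total / count if count else None

if __name__ == "__main__":
    print(mean_run_time)
```

With one completed index_parallel task, the mean equals the sum, 6.722, which is consistent with the first non-zero bucket being le="7.5".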