druid icon indicating copy to clipboard operation
druid copied to clipboard

Throughput by volume from #10407

Open somu-imply opened this issue 3 years ago • 4 comments

Throughput by volume from #10407

Currently there is no way to know how much data is processed by task during ingestion. This PR adds ingest/events/processedBytes metric to emit number of bytes read since last emission time.

This PR adds InputStats class which is present in all task types and acts as holder for task level counts like processed bytes in this case. Thus standardized metrics throughout the task types can be added in future and emitted using InputStatsMonitor which is automatically initialized for all tasks

This PR provides convenient wrapper class named CountableInputEntity which can warp any InputEntity to count number of bytes processed through that InputEntity, thus its easier for new implementations to emit this metric just by wrapping the base input entity in this while creating InputEntityIteratingReader

Since Kafka and Kinesis does not use InputEntity, therefore processed bytes is increment directly in SeekableStreamIndexTaskRunner as it has access to InputStats

With this is place, druid will be able to report throughput by volume

This is a work in progress. have updated the PR to have the monitor emitted correctly now. I am using the following with wikipedia data

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.InputStatsMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info
druid.monitoring.emissionPeriod=PT1S

There would be an output of the amount of bytes processed per emission period. With the PR, the task report is having outputs such as

2022-07-07T17:58:51.547Z","service":"druid/middleManager","host":"localhost:8100","version":"0.24.0-SNAPSHOT","metric":"ingest/events/processedBytes","value":17122640,"dataSource":["wikipedia"],"taskId":["index_parallel_wikipedia_hnpcdnad_2022-07-07T17:58:25.391Z"],"taskType":["index"]}
2022-07-07T17:58:52.550Z","service":"druid/middleManager","host":"localhost:8100","version":"0.24.0-SNAPSHOT","metric":"ingest/events/processedBytes","value":16928464,"dataSource":["wikipedia"],"taskId":["index_parallel_wikipedia_hnpcdnad_2022-07-07T17:58:25.391Z"],"taskType":["index"]}
2022-07-07T17:58:53.555Z","service":"druid/middleManager","host":"localhost:8100","version":"0.24.0-SNAPSHOT","metric":"ingest/events/processedBytes","value":161408,"dataSource":["wikipedia"],"taskId":["index_parallel_wikipedia_hnpcdnad_2022-07-07T17:58:25.391Z"],"taskType":["index"]}
2022-07-07T17:58:54.558Z","service":"druid/middleManager","host":"localhost:8100","version":"0.24.0-SNAPSHOT","metric":"ingest/events/processedBytes","value":0,"dataSource":["wikipedia"],"taskId":["index_parallel_wikipedia_hnpcdnad_2022-07-07T17:58:25.391Z"],"taskType":["index"]}

This PR has:

  • [ ] been self-reviewed.
    • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [ ] been tested in a test Druid cluster.

somu-imply avatar Jul 06 '22 14:07 somu-imply

Don't see documentation changes in this PR like this - https://github.com/apache/druid/pull/10407/files#diff-4153c0d20e748deebfabfa5f90696f686d98ff83362e7fea73937a025ed42db5

pjain1 avatar Jul 11 '22 11:07 pjain1

I'll add in the documentation shortly after I orchestrate the test changes and get past some of the travis failures

somu-imply avatar Jul 13 '22 19:07 somu-imply

@somu-imply , thanks for adding this new monitoring. One suggestion; would you add more details to the git message with more explanation of what was implemented here and why, and how things are implemented. This information will help with the review.

zachjsh avatar Jul 20 '22 06:07 zachjsh

@zachjsh this PR is just a refactoring of #10407. I have updated the description from there in this as well so to keep all info at one place. I'll add the javadocs and the docs shortly

somu-imply avatar Jul 21 '22 03:07 somu-imply

Closing this as #13520 is already merged.

kfaraz avatar Jan 14 '23 05:01 kfaraz