(WIP) Wait for initialization during KubernetesTaskRunner startup

Open georgew5656 opened this issue 2 years ago • 1 comments

This fix attempts to bring the KubernetesTaskRunner more in line with the HttpRemoteTaskRunner (https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/hrtr/HttpRemoteTaskRunner.java#L560) with respect to startup initialization.

Right now, when the overlord becomes the leader while using the KubernetesTaskRunner, it adds all of the running tasks to its mapping but doesn't wait for the underlying thread pool to finish syncing state from Kubernetes. This change attempts to wait for that sync, although startup does not fail if the sync cannot finish completely.

Description

Best-effort attempt to sync state from Kubernetes completely before becoming the overlord leader when running mm-less ingestion.

In the start() method, after adding all the jobs in Kubernetes to the tasks map, try to wait for the underlying thread pool to finish syncing state from Kubernetes.
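
A minimal sketch of the best-effort wait described above, not the code in this PR. The class name `StartupSyncSketch`, the method `awaitBestEffort`, and the single shared timeout are all hypothetical; the sketch assumes each job's state sync is represented by a `Future` submitted to the thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; none of these names come from Druid.
public class StartupSyncSketch {

  /**
   * Waits until every sync future finishes or one shared deadline passes.
   * Never throws on timeout or sync failure, so startup can always continue.
   * Returns how many futures finished (successfully or not) before the deadline.
   */
  static int awaitBestEffort(List<Future<?>> syncs, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    int completed = 0;
    for (Future<?> sync : syncs) {
      // All futures share the same deadline, so the total wait is bounded.
      long remaining = Math.max(0L, deadline - System.nanoTime());
      try {
        sync.get(remaining, TimeUnit.NANOSECONDS);
        completed++;
      } catch (ExecutionException e) {
        // The sync ran to completion but failed; count it and keep going.
        completed++;
      } catch (TimeoutException e) {
        // This sync did not finish in time; stop waiting on it.
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting altogether.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    ExecutorService exec = Executors.newFixedThreadPool(2);
    List<Future<?>> syncs = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      // Stand-in for syncing the state of one Kubernetes job.
      syncs.add(exec.submit(() -> { }));
    }
    int done = awaitBestEffort(syncs, 5_000);
    System.out.println(done + " of " + syncs.size() + " syncs finished before startup continued");
    exec.shutdown();
  }
}
```

Sharing one deadline across all futures bounds the total delay before the overlord takes leadership, which matches the "best-effort" intent: a slow or failed sync is tolerated rather than treated as a startup error.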

Release note

Improvements to the overlord lifecycle when running mm-less ingestion.

Key changed/added classes in this PR
  • KubernetesTaskRunner

This PR has:

  • [X] been self-reviewed.
    • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] a release note entry in the PR description.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [ ] been tested in a test Druid cluster.

georgew5656 avatar Sep 26 '23 17:09 georgew5656

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Mar 06 '24 00:03 github-actions[bot]

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Apr 05 '24 00:04 github-actions[bot]