[HUDI-1881]: draft implementation for trigger based on data availability
What is the purpose of the pull request
HoodieMultiTableDeltaStreamer needs to change its execution model and use thread pools so that it can ingest tables in parallel. This is a draft implementation of a trigger based on data availability, as suggested by @vinothchandar here: https://github.com/apache/hudi/pull/3929#pullrequestreview-800810611.
Brief change log
Added a new method, isDataAvailableForIngestion(), to DeltaSync. It is called from HoodieMultiTableDeltaStreamer's sync() method.
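To make the intended flow concrete, here is a minimal sketch of an availability-triggered round over several tables. It is not the code in this patch: `TableSync`, `InMemoryTable`, and `runOnce` are hypothetical names invented for illustration, and only the method name `isDataAvailableForIngestion()` is taken from the change log above. The sketch assumes each table can cheaply report whether new data is pending, and that tables with pending data can be synced independently on a shared thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class AvailabilityTriggeredIngestion {

    /** Hypothetical stand-in for a per-table sync object; this is NOT Hudi's DeltaSync. */
    interface TableSync {
        String tableName();
        boolean isDataAvailableForIngestion();
        void sync() throws Exception;
    }

    /** Toy table backed by a counter of pending records, used only for the demo. */
    static final class InMemoryTable implements TableSync {
        private final String name;
        private final AtomicInteger pending;
        final AtomicInteger synced = new AtomicInteger();

        InMemoryTable(String name, int pendingRecords) {
            this.name = name;
            this.pending = new AtomicInteger(pendingRecords);
        }

        @Override public String tableName() { return name; }

        @Override public boolean isDataAvailableForIngestion() { return pending.get() > 0; }

        // Moves all pending records to the synced counter.
        @Override public void sync() { synced.addAndGet(pending.getAndSet(0)); }
    }

    /**
     * Runs one scheduling round: every table that reports pending data is synced
     * on the pool, and every other table is skipped. Returns the names of the
     * tables that were submitted, in input order.
     */
    static List<String> runOnce(List<TableSync> tables, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<String> submitted = new ArrayList<>();
        List<Future<?>> futures = new ArrayList<>();
        for (TableSync table : tables) {
            if (table.isDataAvailableForIngestion()) {
                submitted.add(table.tableName());
                // Callable form, so a checked exception from sync() reaches the Future.
                futures.add(pool.submit(() -> { table.sync(); return null; }));
            }
        }
        for (Future<?> future : futures) {
            future.get(); // wait for the round; a failed sync surfaces as ExecutionException
        }
        return submitted;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<TableSync> tables = List.of(
                new InMemoryTable("orders", 3),
                new InMemoryTable("users", 0));
            System.out.println("Synced this round: " + runOnce(tables, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a continuous mode, a caller would presumably repeat `runOnce` in a loop with a short pause between rounds, so that no single table can block the others. How the real check inspects the source for new data is left open here.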
Verify this pull request
Will add tests once design is approved.
Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
@pratyakshsharma: once the patch is ready, do ping me here. I can review.
@pratyakshsharma are you still working on this PR?
@yihua yes, I plan to complete it this week.
@hudi-bot run azure
@nsivabalan please take a pass, this should be good to review.
@nsivabalan @vinothchandar ping!
I need this PR to be merged for one of my projects. Can you please merge it as soon as possible?
@nsivabalan @yihua @codope ping!
Looking forward to this PR being merged.
@nsivabalan @vinothchandar @yihua users are asking for this PR. Can we review this anytime soon? :)
I am also looking forward to this PR being merged 😄
I'm waiting for this PR (or any possible solution to the continuous mode for MultiTableDeltaStreamer) as well
CI report:
- b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: SUCCESS
- 518758403252fd03ca77eb8977dda217575efecc UNKNOWN
Bot commands
@hudi-bot supports the following commands:
- @hudi-bot run azure: re-run the last Azure build
Hi @pratyakshsharma @nsivabalan, can anyone share the latest update on this limitation of HoodieMultiTableStreamer, where running it in --continuous mode seems to ingest only the first table?
@nsivabalan should be able to share the latest on this I believe.
Nope, I don't think we have a fix for that yet. Let us know if you are interested. If you would like to contribute, we can guide you on the solution, review the patch, and assist with landing it. If not, let us know if this is something you want to see in the next release, and we will try our best to allot resources to work on it.
@nsivabalan I am willing to take it up. I have already given it a shot in this PR. Let me know if you wish to discuss the solution on Slack or here.
@nsivabalan Siva, it seems @pratyakshsharma has already made an attempt at addressing the issue in the PR. Given the importance of the use-case and the progress made so far, it would be beneficial if @pratyakshsharma continues working on it. Having this feature would greatly contribute to resolving our use-case and making our lives easy. Let's encourage Pratyaksh to proceed.
I cherry-picked this commit into Hudi 0.14.1 with some minor changes, and it seems to be working fine for me.