[HUDI-1881]: draft implementation for trigger based on data availability

Open pratyakshsharma opened this issue 3 years ago • 19 comments

Tips

  • Thank you very much for contributing to Apache Hudi.
  • Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

HoodieMultiTableDeltaStreamer needs to change how it executes and use thread pools so that it can ingest tables in parallel. This is a draft implementation of a trigger based on data availability, as suggested by @vinothchandar here: https://github.com/apache/hudi/pull/3929#pullrequestreview-800810611.

Brief change log

Added new method isDataAvailableForIngestion() in DeltaSync to be called from HoodieMultiTableDeltaStreamer's sync() method.
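
For readers following along, here is a rough sketch of the idea, not the code in this patch. It assumes a hypothetical `TableSync` interface standing in for the per-table DeltaSync, and a hypothetical `MultiTableTrigger.runRound` helper. Only the method name `isDataAvailableForIngestion()` comes from the change log above; `tableName()`, `syncOnce()`, and the pool size parameter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-table DeltaSync; this is not Hudi's actual class.
interface TableSync {
  String tableName();                     // hypothetical accessor
  boolean isDataAvailableForIngestion();  // method name taken from the change log
  void syncOnce();                        // hypothetical: run one ingestion round
}

// Hypothetical helper showing a data-availability trigger on top of a thread pool.
final class MultiTableTrigger {
  private MultiTableTrigger() {}

  // Runs one round: each table that reports available data is synced on the pool.
  // Returns the names of the tables synced in this round, in submission order.
  static List<String> runRound(List<TableSync> tables, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (TableSync table : tables) {
        if (!table.isDataAvailableForIngestion()) {
          continue; // nothing new for this table, so do not occupy a thread
        }
        futures.add(pool.submit(() -> {
          table.syncOnce();
          return table.tableName();
        }));
      }
      List<String> synced = new ArrayList<>();
      for (Future<String> future : futures) {
        synced.add(future.get()); // wait for every submitted sync to finish
      }
      return synced;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while syncing tables", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A table sync failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

In a continuous setup, a caller would presumably invoke `runRound` in a loop, so no single table can hold up the others. How the real availability check works for each source is left to the actual implementation.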

Verify this pull request

Will add tests once design is approved.

Committer checklist

  • [ ] Has a corresponding JIRA in PR title & commit

  • [ ] Commit message is descriptive of the change

  • [ ] CI is green

  • [ ] Necessary doc changes done or have another open PR

  • [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

pratyakshsharma avatar Mar 19 '22 10:03 pratyakshsharma

@pratyakshsharma : once the patch is ready, do ping me here. I can review

nsivabalan avatar Apr 15 '22 20:04 nsivabalan

@pratyakshsharma are you still working on this PR?

yihua avatar Sep 13 '22 05:09 yihua

@yihua yes, I plan to complete it this week.

pratyakshsharma avatar Sep 15 '22 08:09 pratyakshsharma

@hudi-bot run azure

pratyakshsharma avatar Oct 08 '22 18:10 pratyakshsharma

@nsivabalan please take a pass, this should be good to review.

pratyakshsharma avatar Oct 08 '22 21:10 pratyakshsharma

@nsivabalan @vinothchandar ping!

pratyakshsharma avatar Oct 27 '22 17:10 pratyakshsharma

I need this PR to be merged for one of my projects. Can you please merge it as soon as possible?

Priyanka128 avatar Oct 31 '22 12:10 Priyanka128

@nsivabalan @yihua @codope ping!

pratyakshsharma avatar Nov 15 '22 18:11 pratyakshsharma

Looking forward to this PR being merged.

sharfarazbaari-wk avatar Dec 21 '22 09:12 sharfarazbaari-wk

@nsivabalan @vinothchandar @yihua users are asking for this PR. Can we review this anytime soon? :)

pratyakshsharma avatar Dec 21 '22 10:12 pratyakshsharma

I am also looking forward to this PR being merged 😄

sydneyhoran avatar Apr 13 '23 17:04 sydneyhoran

I'm waiting for this PR (or any possible solution to the continuous mode for MultiTableDeltaStreamer) as well

joe-shad avatar Jun 26 '23 11:06 joe-shad

CI report:

  • b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: SUCCESS
  • 518758403252fd03ca77eb8977dda217575efecc UNKNOWN

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Jun 26 '23 14:06 hudi-bot

Hi @pratyakshsharma @nsivabalan, can anyone share the latest update on this limitation of HoodieMultiTableStreamer, where running it in --continuous mode seems to ingest only the first table?

Sarfaraz-214 avatar Dec 26 '23 10:12 Sarfaraz-214

@nsivabalan should be able to share the latest on this I believe.

pratyakshsharma avatar Dec 26 '23 13:12 pratyakshsharma

Nope, I don't think we have a fix for that yet. Let us know if you are interested. If you would like to contribute, we can guide you on the solution, review the patch, and assist with landing it. If not, let us know if this is something you want to see in the next release; we can try our best to allot resources to work on this.

nsivabalan avatar Jan 03 '24 19:01 nsivabalan

@nsivabalan I am willing to take it up. I have already given it a shot in this PR. Let me know if you wish to discuss the solution on Slack or here.

pratyakshsharma avatar Jan 04 '24 06:01 pratyakshsharma

@nsivabalan Siva, it seems @pratyakshsharma has already made an attempt at addressing the issue in the PR. Given the importance of the use-case and the progress made so far, it would be beneficial if @pratyakshsharma continues working on it. Having this feature would greatly contribute to resolving our use-case and making our lives easy. Let's encourage Pratyaksh to proceed.

Sarfaraz-214 avatar Jan 08 '24 07:01 Sarfaraz-214

I cherry-picked this commit into Hudi 0.14.1 with some minor changes, and it seems to be working fine for me.

Sarfaraz-214 avatar Jan 09 '24 13:01 Sarfaraz-214