[HUDI-7507] Adding timestamp ordering validation before creating requested timeli…

Open nsivabalan opened this issue 1 year ago • 4 comments

Change Logs

When multiple writers trigger table services, there is a chance that one of them creates a requested instant in an order that differs from the actual timestamp order. The linked Jira (HUDI-7507) has more details on the scenario, with an illustration. This patch ensures that, before a requested entry is created in the timeline, no other instant is greater than the current instant time.
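
The check described above can be sketched roughly as follows. This is only an illustration, not the code in this patch: `RequestedInstantValidator` and its methods are hypothetical names, the timeline is modeled as a plain list of instant-time strings, and it assumes instant times are fixed-width timestamp strings so that lexicographic comparison matches chronological order. Locking around the check is left out.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the Hudi API.
final class RequestedInstantValidator {

    private RequestedInstantValidator() {}

    // Returns the first existing instant time that is strictly greater than
    // the candidate, if any. Assumes fixed-width timestamp strings.
    public static Optional<String> findConflictingInstant(List<String> existingInstantTimes,
                                                          String candidateInstantTime) {
        return existingInstantTimes.stream()
            .filter(t -> t.compareTo(candidateInstantTime) > 0)
            .findFirst();
    }

    // Fails fast instead of creating a requested entry behind a newer instant.
    public static void validateBeforeRequested(List<String> existingInstantTimes,
                                               String candidateInstantTime) {
        findConflictingInstant(existingInstantTimes, candidateInstantTime).ifPresent(t -> {
            throw new IllegalStateException("Instant " + t + " is newer than " + candidateInstantTime
                + "; refusing to create the requested entry");
        });
    }
}
```

With this shape, a writer would call `validateBeforeRequested` just before writing the requested file and pick a fresh instant time if the call throws.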

Impact

No unintended gaps with respect to multi-writer and timestamp-ordering intricacies.

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

nsivabalan avatar May 27 '24 23:05 nsivabalan

This patch, ensure that before creating a requested entry in the timeline, there is no other instant greater than the current instant time.

The instant time is just a snapshot id, I don't think we should introduce this limitation.

I agree that ideally this wouldn't be necessary, and all operations would be able to create a plan with a desired timestamp and then abort if, during the pre-commit validation transaction, the timeline is found to be in a conflicting state. The issue, though, is that (by default) once a table service plan (compact/clean/cluster) is scheduled, it can't be aborted and removed later. This means that if one of the scenarios in HUDI-7507 arises, then by the time the compact/clean/cluster write is about to commit, if it sees that earlier instants have now appeared in the timeline (and conflict with the plan), it will not be able to complete (which is desired) but will also not be able to abort.

kbuci avatar Jun 26 '24 18:06 kbuci

@nsivabalan Will your changes here also allow clean to perform the timestamp order validation? Or will that be handled by a later PR?

kbuci avatar Jun 26 '24 18:06 kbuci

timeline is found to be in a conflicting state. The issue though is (by default) once a table service plan (compact/clean/cluster) is scheduled it can't be aborted and removed later. This means that if one of the scenarios in HUDI-7507 arises, then by the time the compact/clean/cluster write is about to commit, if it sees that earliest instants have now appeared in the timeline (and conflict with the plan), it will not be able to complete (which is desired) but also not be able to abort.

If it aims to improve the conflict resolution, since 1.x we have introduced the instant completion time notion, which is always monotonically increasing.

danny0405 avatar Jun 28 '24 00:06 danny0405

timeline is found to be in a conflicting state. The issue though is (by default) once a table service plan (compact/clean/cluster) is scheduled it can't be aborted and removed later. This means that if one of the scenarios in HUDI-7507 arises, then by the time the compact/clean/cluster write is about to commit, if it sees that earliest instants have now appeared in the timeline (and conflict with the plan), it will not be able to complete (which is desired) but also not be able to abort.

If it aims to improve the conflict resolution, since 1.x we have introduced the instant completion time notion, which is always monotonically increasing.

Yes, my understanding is that HUDI-7507 doesn't apply and isn't needed for 1.x: https://issues.apache.org/jira/browse/HUDI-7507?focusedCommentId=17828142&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17828142. Though if we want to resolve HUDI-7507 in 0.15, then I think we would need to apply this PR (or an alternate solution) to that version.

kbuci avatar Jul 01 '24 21:07 kbuci

Hey @danny0405: yes, I will raise a patch against the 0.x branch. We may not need it for 1.x, or we can debate whether it's required for 1.x. But for the 0.x branch, we definitely need it.

nsivabalan avatar Jul 05 '24 19:07 nsivabalan

Here is the patch for the 0.x branch: https://github.com/apache/hudi/pull/11580

nsivabalan avatar Jul 05 '24 19:07 nsivabalan

CI report:

  • e9c58f48cb1d142f18f362632af28dcf651b51a4 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Jul 05 '24 21:07 hudi-bot

Closing this as Hudi 1.0 / master does not necessarily need the start time ordering.

yihua avatar Sep 19 '24 05:09 yihua