[SUPPORT] Rollback failed clustering 0.12.2
Tips before filing an issue

- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
Hello, this is a follow-up to https://github.com/apache/hudi/issues/10878. We managed to run clustering, but I want to nail down a recovery plan for when it fails.

Here is the behavior I know: when `.commit.requested` and `.commit.inflight` are created, but not `.commit`, a subsequent write will trigger a rollback. This works for normal commits.

However, with clustering, if the job stops before the `.inflight` file is created, a subsequent write will fail if it affects a partition listed in `.replacecommit.requested` (controlled by `hoodie.clustering.updates.strategy`). So here I can only either run clustering from the CLI or just delete the instant (can you confirm? Per the code, it looks safe as long as there is no `.inflight`).

But if it fails after it has started writing files (after `.replacecommit.inflight` is created, but before `.replacecommit` is created), what choices do I have? From my reading of the code, there is no automatic rollback for a replacecommit, and hudi-cli only supports rollback for completed instants.
Given this, can you answer 2 questions:

- If clustering failed after `.replacecommit.requested` but before `.replacecommit.inflight`, is it safe to just delete the instant file itself? You recently merged a PR that looks to be doing exactly this: https://github.com/apache/hudi/pull/10645/files
- If clustering failed after `.replacecommit.inflight` but before `.replacecommit`, what are the recovery steps? If I understand correctly there is no automatic rollback for it, but I may be wrong (working on a reproduction).
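The two failure points differ only in which instant files exist under the `.hoodie/` timeline directory. Here is a minimal sketch of that distinction in plain Python (this is illustrative only, not Hudi code; the helper name is made up, and real Hudi reads the timeline via its own meta client):

```python
import os
import tempfile

def pending_replacecommit_state(hoodie_dir, instant_time):
    """Classify a clustering instant by which timeline files exist.

    Hypothetical helper for illustration; Hudi's own timeline classes
    perform this check internally.
    """
    files = set(os.listdir(hoodie_dir))
    if f"{instant_time}.replacecommit" in files:
        return "COMPLETED"
    if f"{instant_time}.replacecommit.inflight" in files:
        return "INFLIGHT"   # data files may already exist; needs a rollback
    if f"{instant_time}.replacecommit.requested" in files:
        return "REQUESTED"  # clustering planned, but no data files written yet
    return "ABSENT"

# Simulate a timeline where clustering stopped right after .requested:
with tempfile.TemporaryDirectory() as hoodie:
    open(os.path.join(hoodie, "20240401120000.replacecommit.requested"), "w").close()
    print(pending_replacecommit_state(hoodie, "20240401120000"))  # prints REQUESTED
```

The two questions above then map onto the `REQUESTED` and `INFLIGHT` states respectively.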
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Environment Description

- Hudi version : 0.12.2
- Spark version : 3.3.0
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) :
- Running on Docker? (yes/no) :
Additional context
Stacktrace
Update: I managed to reproduce it. After stopping the job during clustering, a subsequent write fails with the exception:
Not allowed to update the clustering file group HoodieFileGroupId{partitionPath='partition1=1', fileId='ff2d1ed7-ff77-4a9f-95c6-1b9deeccf105-0'}. For pending clustering operations, we are not going to support update for now.
I see the property `hoodie.clustering.rollback.pending.replacecommit.on.conflict` - is it generally safe to use if we have a single writer with inline table services only?

I don't really understand how updates can be accepted into a pending clustering, since those file groups are not even produced yet, so I would probably not change the strategy from reject to allow. Is this essentially a way to roll back clustering only when there are conflicts, and skip it otherwise?

What problems might this cause on Hudi 0.12.2 / Spark 3.3.0? Our setup: no metadata table, timeline server enabled, COW, partitioned, inline clustering, inline eager clean, single writer.
Update - I dug into the code around `clusteringHandleUpdate` and see that:

- Updates rejected: the write fails.
- Updates accepted and `hoodie.clustering.rollback.pending.replacecommit.on.conflict` is true: the pending clustering instants that conflict with the update records are rolled back.
- Updates accepted and `hoodie.clustering.rollback.pending.replacecommit.on.conflict` is false: the pending clustering instants are left on the timeline, and the updates are written to the previous base files.
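That branching can be condensed into a small decision function. This is a sketch of my reading of the behavior, not Hudi's actual `clusteringHandleUpdate` code; the function and parameter names are made up:

```python
def handle_update_against_pending_clustering(allow_updates, rollback_on_conflict, has_conflict):
    """Outcome of a write that touches file groups under pending clustering.

    allow_updates        ~ hoodie.clustering.updates.strategy (allow vs. reject)
    rollback_on_conflict ~ hoodie.clustering.rollback.pending.replacecommit.on.conflict
    has_conflict         ~ whether the write hits file groups in a pending plan
    """
    if not has_conflict:
        # Write does not touch pending clustering file groups at all.
        return "write proceeds; pending clustering untouched"
    if not allow_updates:
        # SparkRejectUpdateStrategy (the default): fail fast.
        return "write fails"
    if rollback_on_conflict:
        # SparkAllowUpdateStrategy + rollback flag: clean up, then write.
        return "conflicting pending clustering rolled back; write proceeds"
    # SparkAllowUpdateStrategy without the rollback flag.
    return "pending clustering left on timeline; updates go to old base files"
```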
So it looks like switching these two:

- `hoodie.clustering.updates.strategy` -> `org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy` (non-default)
- `hoodie.clustering.rollback.pending.replacecommit.on.conflict` -> `true` (non-default)

is generally safe for all operations with inline services and a single writer. That is, if the job fails in the middle of clustering, the next commit will synchronously roll back the pending clustering instants and write the updates into the old files.
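As a concrete sketch, assuming a PySpark writer (the table name, operation, and path are placeholders, not from this thread), the workaround amounts to adding these two options to the write:

```python
# Writer option map for the workaround. The two clustering keys carry the
# non-default values discussed in this thread; everything else is an
# illustrative placeholder for whatever the job already uses.
hudi_options = {
    "hoodie.table.name": "my_table",                # placeholder
    "hoodie.datasource.write.operation": "upsert",  # placeholder
    # Accept updates to file groups under a pending clustering instead of failing:
    "hoodie.clustering.updates.strategy":
        "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy",
    # On conflict, roll back the pending clustering rather than leave it pending:
    "hoodie.clustering.rollback.pending.replacecommit.on.conflict": "true",
}

# Typical usage (df and base_path defined elsewhere):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```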
Can someone confirm? @nsivabalan And what is the motivation behind Reject being the default, and how is one supposed to recover then? I'm afraid I'm missing something, since the fail-fast behavior was consciously chosen as the default.
Hey @suryaprasanna, can you take this up and offer some suggestions?
I managed to do it with:

- `hoodie.clustering.updates.strategy` -> `org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy` (non-default)
- `hoodie.clustering.rollback.pending.replacecommit.on.conflict` -> `true` (non-default)

The precondition is that your write must affect the clustered partitions; otherwise nothing will happen.

Unfortunately, I don't see any other way to do it (without copy-pasting some Hudi internals, which looks risky for most users).
Thanks @VitoMakarevich. We were also able to resolve the same error using only these two configs, as you suggested.

There is a discussion around fixing this in the long term as part of this JIRA: https://issues.apache.org/jira/browse/HUDI-1045