[SUPPORT] Hudi Error rolling back using marker files
Describe the problem you faced
We recently upgraded Hudi to 0.11.0, and the application ran fine for a few days. Then, for an unknown reason, it started failing to roll back a failed commit using marker files. We use timeline-server-based markers, and we observed that the MARKERS.type file is not created in the /.hoodie/.temp/{commitId}/ directory. During the rollback, the AsyncCleanerService reads the markers as if they were direct markers and fails because the file names do not contain .marker.
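The failure mode described above can be reproduced with a small simulation of the marker directory. This is a minimal Python sketch, not Hudi's Java code: the helper names (`read_marker_type`, `list_direct_markers`, `resolve_markers`) are hypothetical, and the layout assumes timeline-server-based markers are stored as `MARKERS0`, `MARKERS1`, ... files under `.hoodie/.temp/<instant>/`, while direct markers are individual files whose names contain `.marker`.

```python
import os
import tempfile

MARKER_TYPE_FILENAME = "MARKERS.type"  # assumed name of the marker-type file
MARKER_EXTN = ".marker"                 # substring every direct marker name must contain


def read_marker_type(marker_dir):
    """Return the marker type recorded in MARKERS.type, or None if the file is absent."""
    path = os.path.join(marker_dir, MARKER_TYPE_FILENAME)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


def list_direct_markers(marker_dir):
    """Walk the marker directory and treat every file as a direct marker.

    Approximates the argument check that fails in the stack trace: a file
    whose name lacks ".marker" is rejected with an argument error.
    """
    markers = []
    for root, _dirs, files in os.walk(marker_dir):
        for name in sorted(files):
            if MARKER_EXTN not in name:
                raise ValueError(f"not a direct marker file: {os.path.join(root, name)}")
            markers.append(os.path.relpath(os.path.join(root, name), marker_dir))
    return markers


def resolve_markers(marker_dir):
    """Behaviour described in this issue: a missing MARKERS.type is treated as DIRECT."""
    marker_type = read_marker_type(marker_dir)
    if marker_type is None or marker_type == "DIRECT":
        return list_direct_markers(marker_dir)
    raise NotImplementedError(f"marker type {marker_type} is not handled in this sketch")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A timeline-server-based marker file exists, but MARKERS.type is missing.
        with open(os.path.join(tmp, "MARKERS0"), "w") as f:
            f.write("2022/05/14/file1_1-0-1_20220514180424825.parquet.marker.CREATE\n")
        try:
            resolve_markers(tmp)
        except ValueError as e:
            print("rollback would fail:", e)
```

Because the type file is missing, the `MARKERS0` file itself is treated as a direct marker, and its name fails the `.marker` check, which matches the `IllegalArgumentException` in the stack trace.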
Steps to reproduce the behavior:
- Start the application and keep it running.
- We don't know exactly what stopped the application, but it then tried to roll back the failed commit and failed to do so.
Expected behavior
The application keeps running and, if it fails, the failed commits are rolled back successfully.
Environment Description
- Hudi version : 0.11.0
- Spark version : 3.2.1
- Hive version : 2.1.3
- Hadoop version : 3.x.x
- Storage (HDFS/S3/GCS..) : GCS
- Running on Docker? (yes/no) : yes (k8s)
Additional context
Stacktrace
```
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback gs://xxx/hudi/xxxx/ commits 20220514180424825
at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
... 11 more
Caused by: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback gs://xxx/hudi/xxx/ commits 20220514180424825
at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:783)
at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1193)
at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1176)
at org.apache.hudi.client.BaseHoodieWriteClient.lambda$clean$33796fd2$1(BaseHoodieWriteClient.java:856)
at org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:142)
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:855)
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:825)
at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
... 4 more
Caused by: org.apache.hudi.exception.HoodieRollbackException: Error rolling back using marker files written for [==>20220514180424825__commit__INFLIGHT]
at org.apache.hudi.table.action.rollback.MarkerBasedRollbackStrategy.getRollbackRequests(MarkerBasedRollbackStrategy.java:103)
at org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor.requestRollback(BaseRollbackPlanActionExecutor.java:109)
at org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor.execute(BaseRollbackPlanActionExecutor.java:132)
at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleRollback(HoodieSparkCopyOnWriteTable.java:212)
at org.apache.hudi.client.BaseHoodieWriteClient.lambda$rollback$6(BaseHoodieWriteClient.java:757)
at org.apache.hudi.common.util.Option.orElseGet(Option.java:142)
at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:757)
... 11 more
Caused by: java.lang.IllegalArgumentException
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:67)
at org.apache.hudi.table.marker.DirectWriteMarkers.lambda$allMarkerFilePaths$0(DirectWriteMarkers.java:136)
at org.apache.hudi.common.fs.FSUtils.processFiles(FSUtils.java:277)
at org.apache.hudi.table.marker.DirectWriteMarkers.allMarkerFilePaths(DirectWriteMarkers.java:135)
at org.apache.hudi.table.marker.MarkerBasedRollbackUtils.getAllMarkerPaths(MarkerBasedRollbackUtils.java:62)
at org.apache.hudi.table.action.rollback.MarkerBasedRollbackStrategy.getRollbackRequests(MarkerBasedRollbackStrategy.java:76)
... 17 more
```
@yihua - please do take a look at it.
On checking, I found that the MARKER_TYPE_FILENAME file is not created, even though the marker files are being created through the timeline server. Since MARKER_TYPE_FILENAME is not found, Hudi assumes direct markers and validates that each MARKERSx file name has the .marker extension. Instead, it should read the contents of the MARKERSx files and validate the file names listed in them. I am not certain where this mismatch happens.
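The suggestion above (read the contents of the `MARKERSx` files instead of validating their names) could look roughly like the following. This is a hypothetical Python sketch of one possible fallback, not the change made in Hudi; the helper names, the `MARKERS<digits>` naming pattern, and the one-marker-per-line file format are assumptions.

```python
import os
import re
import tempfile

MARKER_TYPE_FILENAME = "MARKERS.type"
# Assumed naming of timeline-server-based marker files: MARKERS0, MARKERS1, ...
TIMELINE_MARKER_FILE = re.compile(r"^MARKERS\d+$")


def read_timeline_markers(marker_dir):
    """Collect marker names listed, one per line, inside MARKERS<n> files."""
    markers = set()
    for name in os.listdir(marker_dir):
        if TIMELINE_MARKER_FILE.match(name):
            with open(os.path.join(marker_dir, name)) as f:
                markers.update(line.strip() for line in f if line.strip())
    return markers


def list_direct_markers(marker_dir):
    """Collect direct marker files (names containing '.marker') relative to marker_dir."""
    markers = set()
    for root, _dirs, files in os.walk(marker_dir):
        for name in files:
            if ".marker" in name:
                markers.add(os.path.relpath(os.path.join(root, name), marker_dir))
    return markers


def resolve_markers_with_fallback(marker_dir):
    """If MARKERS.type is missing, infer the marker type from the directory contents."""
    type_path = os.path.join(marker_dir, MARKER_TYPE_FILENAME)
    if os.path.exists(type_path):
        with open(type_path) as f:
            marker_type = f.read().strip()
    elif any(TIMELINE_MARKER_FILE.match(n) for n in os.listdir(marker_dir)):
        marker_type = "TIMELINE_SERVER_BASED"  # inferred, not recorded on storage
    else:
        marker_type = "DIRECT"
    if marker_type == "TIMELINE_SERVER_BASED":
        return read_timeline_markers(marker_dir)
    return list_direct_markers(marker_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "MARKERS0"), "w") as f:
            f.write("2022/05/14/f1.parquet.marker.CREATE\n2022/05/14/f2.parquet.marker.MERGE\n")
        print(sorted(resolve_markers_with_fallback(tmp)))
```

The design choice here is to infer the type only when the type file is absent, so directories that do record their marker type are handled exactly as before.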
@BalaMahesh sorry for getting to this issue late. Have you resolved this issue by any chance? If MARKERS.type is not present, the logic assumes that the direct markers are stored, which causes the read failure in your case. I'll check if we can improve the error handling in this case.
@BalaMahesh Do you have application logs? Do you see any of the following lines in the log?
Failed to create marker type file
Failed to write marker type file
Hudi will always throw an exception if the marker type file is not created.
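To check for the two messages quoted above, the driver logs can be scanned with a short script. A minimal Python sketch: the `find_marker_type_errors` helper and the log file argument are hypothetical, and only the two message strings come from this thread.

```python
import sys

# Messages quoted in this thread as signs that MARKERS.type could not be written.
MARKER_TYPE_ERRORS = (
    "Failed to create marker type file",
    "Failed to write marker type file",
)


def find_marker_type_errors(lines):
    """Return (line_number, line) pairs for lines containing either message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(msg in line for msg in MARKER_TYPE_ERRORS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_marker_errors.py driver.log
    with open(sys.argv[1], errors="replace") as log:
        for number, line in find_marker_type_errors(log):
            print(f"{number}: {line}")
```

An empty result suggests the type file was never attempted, which would point at a different cause than a storage write failure.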
Also, you could try out patch #6266.
@BalaMahesh, any updates, please?
Closing, as the patch is available and should be released in 0.12.1.