[SUPPORT] Hudi Error rolling back using marker files

Open BalaMahesh opened this issue 3 years ago • 5 comments

Describe the problem you faced

We recently upgraded Hudi to 0.11.0, and the application ran fine for a few days. Then, for a reason we have not identified, it began failing to roll back a failed commit using marker files. We use timeline-server-based marker creation and observed that the MARKERS.type file is not created in the /.hoodie/.temp/{commitId}/ directory. During rollback, the AsyncCleaner tries to read the markers as directly created marker files and fails because the file names do not contain .marker.

Steps to reproduce the behavior:

  1. Start the application and keep it running.
  2. We don't know exactly what stopped the application, but it then started trying to roll back the failed commit and keeps failing to do so.

Expected behavior

The application should keep running and, if it fails, roll back the failed commits successfully.

Environment Description

  • Hudi version : 0.11.0

  • Spark version : 3.2.1

  • Hive version : 2.1.3

  • Hadoop version : 3.x.x

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) : yes (k8s).

Additional context

Stacktrace

Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback gs://xxx/hudi/xxxx/ commits 20220514180424825
	at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
	... 11 more
Caused by: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback gs://xxx/hudi/xxx/ commits 20220514180424825
	at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:783)
	at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1193)
	at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1176)
	at org.apache.hudi.client.BaseHoodieWriteClient.lambda$clean$33796fd2$1(BaseHoodieWriteClient.java:856)
	at org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:142)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:855)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:825)
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
	... 4 more
Caused by: org.apache.hudi.exception.HoodieRollbackException: Error rolling back using marker files written for [==>20220514180424825__commit__INFLIGHT]
	at org.apache.hudi.table.action.rollback.MarkerBasedRollbackStrategy.getRollbackRequests(MarkerBasedRollbackStrategy.java:103)
	at org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor.requestRollback(BaseRollbackPlanActionExecutor.java:109)
	at org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor.execute(BaseRollbackPlanActionExecutor.java:132)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleRollback(HoodieSparkCopyOnWriteTable.java:212)
	at org.apache.hudi.client.BaseHoodieWriteClient.lambda$rollback$6(BaseHoodieWriteClient.java:757)
	at org.apache.hudi.common.util.Option.orElseGet(Option.java:142)
	at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:757)
	... 11 more
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
	at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:67)
	at org.apache.hudi.table.marker.DirectWriteMarkers.lambda$allMarkerFilePaths$0(DirectWriteMarkers.java:136)
	at org.apache.hudi.common.fs.FSUtils.processFiles(FSUtils.java:277)
	at org.apache.hudi.table.marker.DirectWriteMarkers.allMarkerFilePaths(DirectWriteMarkers.java:135)
	at org.apache.hudi.table.marker.MarkerBasedRollbackUtils.getAllMarkerPaths(MarkerBasedRollbackUtils.java:62)
	at org.apache.hudi.table.action.rollback.MarkerBasedRollbackStrategy.getRollbackRequests(MarkerBasedRollbackStrategy.java:76)
	... 17 more

BalaMahesh, May 16 '22 05:05

@yihua, please take a look at this.

BalaMahesh, May 16 '22 06:05

On checking, I found that the MARKER_TYPE_FILENAME file is not created, even though the marker files are being created through the timeline server. Since MARKER_TYPE_FILENAME is not found, Hudi assumes direct marker file creation and tries to validate that the MARKERSX files have the .marker extension. Instead, it should read the content of each MARKERSX file and validate the entries in it as marker file names. I am not sure where this mismatch arises.

BalaMahesh, May 16 '22 07:05

@BalaMahesh Sorry for getting to this issue late. Have you resolved it by any chance? If MARKERS.type is not present, the logic assumes that direct markers are stored, which causes the read failure in your case. I'll check whether we can improve the error handling here.

yihua, Jun 29 '22 08:06
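
To illustrate the fallback described above, here is a minimal, self-contained Java sketch. It is not Hudi's actual code: the class `MarkerTypeSketch` and its methods are hypothetical, and the `.marker` check is only an assumption modelled on the `ValidationUtils.checkArgument` frame in the stack trace. It uses a local temporary directory instead of GCS, and the content written to `MARKERS0` is made up. The point is only to show why a directory holding `MARKERSX` files but no `MARKERS.type` ends in an `IllegalArgumentException` when it is read as direct markers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class MarkerTypeSketch {
    // File names come from the discussion; the logic around them is a simplified assumption.
    static final String MARKER_TYPE_FILENAME = "MARKERS.type";
    static final String MARKER_EXTN = ".marker";

    /** Returns the type recorded in MARKERS.type, or "DIRECT" when the file is absent. */
    static String detectMarkerType(Path markerDir) throws IOException {
        Path typeFile = markerDir.resolve(MARKER_TYPE_FILENAME);
        if (!Files.exists(typeFile)) {
            return "DIRECT"; // the fallback described in the thread
        }
        return Files.readString(typeFile).trim();
    }

    /** Direct-marker style read: every file in the directory must carry the .marker extension. */
    static List<String> readDirectMarkers(Path markerDir) throws IOException {
        List<String> markers = new ArrayList<>();
        try (Stream<Path> files = Files.list(markerDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                if (!name.contains(MARKER_EXTN)) {
                    throw new IllegalArgumentException("Not a direct marker file: " + name);
                }
                markers.add(name);
            }
        }
        return markers;
    }

    /** Builds a throwaway directory shaped like the one in the report and tries to read it. */
    static String simulate() throws IOException {
        Path dir = Files.createTempDirectory("marker-sketch");
        // A timeline-server style file, but no MARKERS.type next to it (illustrative content).
        Files.writeString(dir.resolve("MARKERS0"), "partition/file1.parquet.marker.CREATE\n");
        String type = detectMarkerType(dir);
        try {
            readDirectMarkers(dir);
            return type + ": read ok";
        } catch (IllegalArgumentException e) {
            return type + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(simulate());
    }
}
```

Running `main` prints `DIRECT: Not a direct marker file: MARKERS0`. The type falls back to direct because `MARKERS.type` is missing, and the read then rejects the `MARKERS0` file, which matches the shape of the failure reported in this issue.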

@BalaMahesh Do you have application logs? Do you see any of the following lines in the log?

  Failed to create marker type file
  Failed to write marker type file

Hudi will always throw an exception if the marker-type file is not created.

codope, Aug 06 '22 14:08

Also, you could try out the following patch #6266

codope, Aug 08 '22 11:08

@BalaMahesh: any updates, please?

nsivabalan, Sep 12 '22 15:09

Closing, as a patch is available and should be released in 0.12.1.

codope, Sep 19 '22 13:09