
[SUPPORT] IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33)

Open michael1991 opened this issue 1 year ago • 17 comments

Describe the problem you faced

I have two jobs:

  • JobA uses Spark 3.3.2 + Hudi 0.14.1 to append batch log data;
  • JobB uses Spark 3.3.2 + Hudi 0.12.3 to read the log data produced by JobA. Then the error in the title occurs in JobA.

Expected behavior

We don't write concurrently, so we should not get this error.

Environment Description

  • Hudi version : 0.14.1 (single Spark writer), 0.12.3 (multiple Spark readers with hoodie.metadata.enable set to improve read performance)

  • Spark version : 3.3.2

  • Hive version : no

  • Hadoop version : 3.3.6

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) : no, Dataproc

Stacktrace

org.apache.hudi.exception.HoodieException: Failed to apply clean commit to metadata
	at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:91) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:227) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:193) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:263) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.clean(HoodieSparkCopyOnWriteTable.java:291) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:763) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:862) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:835) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:618) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:599) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:224) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.commit(BaseSparkCommitActionExecutor.java:311) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:201) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:183) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseSparkCommitActionExecutor.java:279) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:184) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:126) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:88) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:156) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:63) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1099) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:117) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:810) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:899) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:86) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	... 12 more
2024-03-22 02:54:03 ERROR HoodieAsyncService:105 - Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Failed to apply clean commit to metadata
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.asyncClean(BaseHoodieTableServiceClient.java:133) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:595) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:579) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:248) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1081) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:520) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:204) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:121) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560) ~[spark-catalyst_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:116) ~[spark-sql_2.12-3.3.2.jar:0.14.1]
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:973) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1061) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1070) ~[spark-core_2.12-3.3.2.jar:3.3.2]
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.2.jar:3.3.2]
Caused by: org.apache.hudi.exception.HoodieException: Failed to apply clean commit to metadata
	at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:91) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:227) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:193) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:263) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.clean(HoodieSparkCopyOnWriteTable.java:291) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:763) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:862) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:835) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:618) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:599) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:224) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.commit(BaseSparkCommitActionExecutor.java:311) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:201) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:183) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseSparkCommitActionExecutor.java:279) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:184) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:126) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:88) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:156) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:63) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1099) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:117) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:810) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:899) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:86) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:227) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:193) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:263) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.clean(HoodieSparkCopyOnWriteTable.java:291) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:763) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:862) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:835) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55) ~[hudi-spark3.3-bundle_2.12-0.14.1.jar:0.14.1]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
2024-03-22 02:54:03 WARN  HoodieSparkSqlWriterInternal:534 - Closing write client

michael1991 avatar Mar 22 '24 03:03 michael1991

How many writers are there? There might be multiple cleaners.

danny0405 avatar Mar 22 '24 03:03 danny0405

We have only one writer here, with multiple readers. So we have one cleaner within the writer, right?

michael1991 avatar Mar 22 '24 03:03 michael1991

Yeah, can you check the metadata file state under the .hoodie dir?

danny0405 avatar Mar 22 '24 03:03 danny0405

Yeah, can you check the metadata file state under the .hoodie dir?

Oh, I found some empty files there (screenshot attached).

And if I enable metadata on the reader side, warnings are thrown as below:

24/03/22 06:30:09 WARN AbstractHoodieLogRecordReader: TargetInstantTime 20240322043545698 invalid or extra rollback command block in gs://bucket/tables_prod/ods/hudi_log_events/.hoodie/metadata/files/.files-0000-0_20240322033640958001.log.5_1-0-1
24/03/22 06:30:10 WARN BaseTableMetadata: Metadata record for log_date=2024-03-20 encountered some files to be deleted which was not added before. Ignoring the spurious deletes as the `_hoodie.metadata.ignore.spurious.deletes` config is set to true
java.io.FileNotFoundException: 
File not found: gs://bucket/tables_prod/ods/hudi_log_events/log_date=2024-03-20/62c503f0-0fca-463a-8643-c1fa688412e5-0_0-8-211_20240320203557276.parquet

How can we avoid this error?

michael1991 avatar Mar 22 '24 06:03 michael1991

@michael1991 Why are we trying to read data written with a newer Hudi version using an older Hudi version? 0.12.3 and 0.14.1 use different table versions anyway and have different log formats.

Also, can you post the writer configurations too?

ad1happy2go avatar Mar 22 '24 09:03 ad1happy2go

  • Because we are ingesting event log data without a record key, and before 0.14 we had to set a record key for the writer. Meanwhile, on GCP Dataproc, Hudi 0.12.3 is the default version, so we use the 0.12.3 bundle to read the data generated by the 0.14.1 bundle.
  • Our writer configurations:
lazy val EVENT_HUDI_CONF_MAP: Map[String, String] = Map(
    "hoodie.database.name" -> "database",
    "hoodie.table.name" -> "log_events",
    "hoodie.schema.on.read.enable" -> "true",
    "hoodie.combine.before.upsert" -> "false",
    "hoodie.datasource.write.partitionpath.field" -> "log_date",
    "hoodie.datasource.write.operation" -> "insert_overwrite",
    "hoodie.clean.async" -> "true",
    "hoodie.cleaner.commits.retained" -> "5",
    "hoodie.parquet.compression.codec" -> "snappy",
    "hoodie.copyonwrite.record.size.estimate" -> "64",
    "hoodie.datasource.write.hive_style_partitioning" -> "true"
  )
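For context, a minimal sketch of how a config map like this is typically applied on the write path. This is not from the thread: the DataFrame `df`, the `basePath`, and the `writeEvents` wrapper are assumptions for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical write call; df and basePath are assumed, not taken from the issue.
def writeEvents(df: DataFrame, basePath: String): Unit = {
  df.write
    .format("hudi")
    .options(EVENT_HUDI_CONF_MAP) // the writer config map above
    .mode(SaveMode.Append)        // partition replacement is driven by insert_overwrite
    .save(basePath)
}
```

With `hoodie.datasource.write.operation` set to `insert_overwrite`, each hourly run replaces the data in the touched `log_date` partitions rather than appending new file slices.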

michael1991 avatar Mar 22 '24 09:03 michael1991

@michael1991 I don't think that's the right way. We must use 0.14.1 to read the data as well. You can use the OSS Hudi bundle on Dataproc to read the data back and see if the same issue occurs there too. Thanks.

ad1happy2go avatar Mar 22 '24 09:03 ad1happy2go

OK, let me have a try. We also have a Spark job that needs to read both base data and event data, then join them to generate a new dataset. The base data have been generated by the 0.12.3 bundle for a year, and the event data by the 0.14.1 bundle since last week. So can we use the 0.14.1 bundle to read the two datasets with different table versions, i.e., is the reader side backward compatible?

michael1991 avatar Mar 22 '24 09:03 michael1991

I disabled metadata on the reader side, so the reader job just treats the files as normal parquet files and skips reading metadata. Meanwhile, I disabled metadata on the writer side too; after a few commits, I re-enabled it. But the exception is still thrown at the beginning of every day. The writer job is scheduled hourly and insert-overwrites the whole day's partition, since the event data has no record key. So this does not seem to be a writer/reader version compatibility issue.

michael1991 avatar Mar 25 '24 01:03 michael1991

spark-hudi.log I got more details on this error: it actually occurs after a rollback error, but from the log I could not see any error before the rollback happens. Hope it could be helpful. I'm not sure whether async cleaning should be enabled when a metadata rollback happens.

michael1991 avatar Mar 26 '24 06:03 michael1991

Root cause: rollback and async clean both need to delete the same file, and then a "deadlock" happens. Disabling async cleaning works.
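As a sketch, the workaround described here is a single-key override of the writer config shown earlier in the thread (the variable name is illustrative):

```scala
// Workaround from this thread: run the cleaner inline instead of async,
// so cleaning cannot race with a rollback that targets the same files.
val syncCleanOverride: Map[String, String] = Map(
  "hoodie.clean.async" -> "false" // was "true" in EVENT_HUDI_CONF_MAP
)
```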

michael1991 avatar Mar 27 '24 02:03 michael1991

Root cause: rollback and async clean both need to delete the same file, and then a "deadlock" happens. Disabling async cleaning works.

Nice finding; this seems like a bug. Can you file a fix for it?

danny0405 avatar Mar 30 '24 00:03 danny0405

@michael1991 Thanks for identifying the root cause. Do you have a fix in mind? Created a tracking JIRA for it: https://issues.apache.org/jira/browse/HUDI-7560

Are you using Spark Structured Streaming to write, or HudiStreamer?

ad1happy2go avatar Apr 01 '24 11:04 ad1happy2go

@michael1991 Thanks for identifying the root cause. Do you have a fix in mind? Created a tracking JIRA for it: https://issues.apache.org/jira/browse/HUDI-7560

Are you using Spark Structured Streaming to write, or HudiStreamer?

I'm using Spark core and SQL to write, not Structured Streaming or HudiStreamer. In the future, we will try Flink for streaming writes.

michael1991 avatar Apr 02 '24 08:04 michael1991

@michael1991 Async services do not make sense with Spark core and SQL writes anyway, so you should use inline table services only.

ad1happy2go avatar Apr 02 '24 08:04 ad1happy2go

CC @linliu-code

nsivabalan avatar Apr 09 '24 01:04 nsivabalan

https://hudi.apache.org/docs/metadata#deployment-model-b-single-writer-with-async-table-services

A lock is needed for async table services with the MDT.
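Per that deployment model (single writer with async table services), a minimal sketch of the extra writer options. Key names follow the Hudi docs; treat the exact values as assumptions to verify against your Hudi version:

```scala
// Single writer + async table services with the metadata table needs a lock
// provider; in-process locking suffices when everything runs in one JVM.
val asyncServiceLockConf: Map[String, String] = Map(
  "hoodie.write.concurrency.mode" -> "optimistic_concurrency_control",
  "hoodie.cleaner.policy.failed.writes" -> "LAZY",
  "hoodie.write.lock.provider" ->
    "org.apache.hudi.client.transaction.lock.InProcessLockProvider"
)
```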

xicm avatar May 11 '24 09:05 xicm

As @xicm mentioned, a lock is needed for async table services, so closing the issue.

michael1991 avatar Jun 07 '24 09:06 michael1991