[SUPPORT] org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220418194506064
```
22/04/18 19:49:02 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/hudi//_/.hoodie/20220418194506064.deltacommit.requested
Exception in thread "pool-24-thread-1" org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220418194506064
	at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62)
	at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:46)
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:90)
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:77)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:159)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:275)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
	at com.zhijingling.hudihivesync.service.SparkHudiMysql$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anon$1.run(SparkHudiMysql.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActive
```
Environment: Hudi 0.10.0, Spark 2.4.4, Hadoop 3.0.0. Write mode: MOR_TABLE_TYPE_OPT_VAL. The job runs fine at first and fails after about one hour. Data volume: around 1 million records; partitions: around 100.
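Since the actual write configs were not shared, here is a minimal sketch of what a MOR upsert write like the one described might look like (Hudi 0.10 / Spark 2.4; the table name, base path, and record-key/precombine/partition fields are placeholders, not the real job's values):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object HudiMorUpsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-mor-upsert-sketch")
      // Hudi writes require Kryo serialization
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()
    import spark.implicits._

    // Placeholder schema and data; the actual job's source and schema were not shared.
    val df = Seq((1, "a", 1000L, "2022-04-18"), (2, "b", 1001L, "2022-04-18"))
      .toDF("id", "name", "ts", "dt")

    df.write.format("hudi")
      .option("hoodie.table.name", "my_table")                        // placeholder table name
      .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")  // i.e. MOR_TABLE_TYPE_OPT_VAL
      .option("hoodie.datasource.write.operation", "upsert")
      .option("hoodie.datasource.write.recordkey.field", "id")        // placeholder record key
      .option("hoodie.datasource.write.precombine.field", "ts")       // placeholder precombine field
      .option("hoodie.datasource.write.partitionpath.field", "dt")    // placeholder partition field
      .mode(SaveMode.Append)
      .save("/hudi/my_table")                                         // placeholder base path
  }
}
```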
@lanyu1hao Can you share more details to reproduce the scenario? Is this happening with a single writer? What were the write configs? What did the timeline under the .hoodie folder look like when this crash happened? Were there other operations running on the table, such as cleaning or compaction (the timeline would give an idea)?
From the stacktrace it looks like the file 20220418194506064.deltacommit.requested was not found.
@lanyu1hao It looks like the stacktrace is missing some content after the following line. Could you provide more information to help diagnose the issue? Have you retried the job to see if the writes succeed? If you have already gotten past it, feel free to close the issue.
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActive
@lanyu1hao : hey, can you provide us with more logs/stacktrace as requested? Without them, it would be tough for us to debug further. Feel free to close out the issue if you got it resolved. Thanks.
Please share the write configs used, the full stacktrace and logs, and an "ls" of the ".hoodie" folder at the time this exception was hit.
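For capturing that timeline state, a rough sketch using the Hadoop FileSystem API could be run right after the failure and its output attached here (basePath below is a placeholder for the actual table base path):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Rough sketch: list the instant files under .hoodie so the timeline state
// at the time of the failure can be shared on the issue.
val basePath = "/hudi/my_table" // placeholder for the actual table base path
val hoodieDir = new Path(basePath, ".hoodie")
val fs = hoodieDir.getFileSystem(new Configuration())
fs.listStatus(hoodieDir)
  .filter(_.isFile)
  .sortBy(_.getPath.getName)
  .foreach(s => println(s"${s.getPath.getName}\t${s.getLen} bytes"))
```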
@lanyu1hao : gentle ping. If the issue is resolved, feel free to close out the issue.
Closing due to no activity.
I am also getting the same error. I am using Glue to read the CSV file and write it into a Hudi table.
py4j.protocol.Py4JJavaError: An error occurred while calling o326.save. : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20230809204110303