[SUPPORT] HoodieException: Failed to apply clean commit to metadata - HUDI 0.15
Hello Community,
We are facing an issue with our streaming pipeline on Hudi 0.15 (hoodie.metadata.enable=true) while migrating from 0.12 to 0.15. The pipeline runs successfully with hoodie.metadata.enable=false but fails with hoodie.metadata.enable=true.
We need your help debugging this issue. If you have faced similar issues in the past, please let us know.
ISSUE 1:
[2024-08-11 21:22:35,835] ERROR org.apache.hudi.exception.HoodieException: Error waiting for async clean service to finish (com.pe.skull.titan.utils.SparkUtils)
[2024-08-11 21:22:35,836] ERROR writing data for pe_oms_product_oms.product_order_eligible_payment_modes_snapshot_nrt attempt: 3 (com.pe.skull.titan.utils.SparkUtils)
[2024-08-11 21:22:40,290] ERROR Service shutdown with error (org.apache.hudi.async.HoodieAsyncService)
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Failed to apply clean commit to metadata
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
Caused by: org.apache.hudi.exception.HoodieException: Failed to apply clean commit to metadata
    at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:93) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:235) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:200) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
Caused by: java.lang.IllegalArgumentException: File gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_oms_product_oms/product_order_eligible_payment_modes_snapshot_nrt/.hoodie/metadata/.hoodie/20240811212153175.deltacommit.requested does not exist!
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:617) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:683) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
Caused by: java.lang.IllegalArgumentException: File gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_oms_product_oms/product_order_eligible_payment_modes_snapshot_nrt/.hoodie/metadata/.hoodie/20240811212153175.deltacommit.requested does not exist!
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:617) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
ISSUE 2:
[2024-08-10 01:21:09,735] ERROR Failed to perform previous clean operation, instant: [==>20240810012008022__clean__INFLIGHT__20240810012011211] (org.apache.hudi.table.action.clean.CleanActionExecutor)
org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20240810012008022 due to finalize errors.
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.finalizeWrite(BaseCommitActionExecutor.java:223) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.commit(BaseSparkCommitActionExecutor.java:303) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:202) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:184) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
Caused by: org.apache.hudi.exception.HoodieIOException: listStatus(hadoopPath: gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_mercury_vault/bk_vendor_snapshot_nrt/.hoodie/metadata/.hoodie/.temp/20240810012008022): 'gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_mercury_vault/bk_vendor_snapshot_nrt/.hoodie/metadata/.hoodie/.temp/20240810012008022' does not exist.
    at org.apache.hudi.table.HoodieTable.reconcileAgainstMarkers(HoodieTable.java:784) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:696) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.finalizeWrite(HoodieSparkMergeOnReadTable.java:230) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.finalizeWrite(BaseCommitActionExecutor.java:220) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
    ... 27 more
Caused by: java.io.FileNotFoundException: listStatus(hadoopPath: gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_mercury_vault/bk_vendor_snapshot_nrt/.hoodie/metadata/.hoodie/.temp/20240810012008022): 'gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_mercury_vault/bk_vendor_snapshot_nrt/.hoodie/metadata/.hoodie/.temp/20240810012008022' does not exist.
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.lambda$listStatus$11(GoogleHadoopFileSystem.java:801) ~[gcs-connector-3.0.1.jar:?]
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) ~[hadoop-client-api-3.3.6.jar:?]
Hoodie Options Used:
hoodie.cleaner.commits.retained: 10
hoodie.metadata.keep.max.commits: 30
hoodie.metadata.clean.async: false
hoodie.keep.max.commits: 30
hoodie.metadata.keep.min.commits: 20
hoodie.archive.async: false
hoodie.clean.automatic: true
hoodie.finalize.write.parallelism: 200
hoodie.fail.on.timeline.archiving: false
hoodie.clean.async: true
hoodie.parquet.max.file.size: 128000000
hoodie.datasource.hive_sync.support_timestamp: true
#DISABLING METADATA TO REDUCE FREQUENT CALLS TO GCS
hoodie.metadata.enable: false
hoodie.datasource.write.hive_style_partitioning: true
hoodie.parquet.small.file.limit: 100000000
hoodie.datasource.hive_sync.enable: true
hoodie.bulkinsert.shuffle.parallelism: 200
hoodie.keep.min.commits: 11
hoodie.datasource.meta.sync.enable: true
hoodie.metadata.cleaner.commits.retained: 3
hoodie.cleaner.incremental.mode: true
hoodie.commits.archival.batch: 12
hoodie.upsert.shuffle.parallelism: 200
hive_sync.support_timestamp: true
hoodie.insert.shuffle.parallelism: 200
hoodie.metadata.compact.max.delta.commits: 10
compaction.delta_commits: 5
metadata.compaction.delta_commits: 10
hoodie.compact.inline.max.delta.commits: 5
hoodie.archive.automatic: true
hoodie.cleaner.parallelism: 200
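For reference, here is a minimal sketch of how we pass such an options map into each write. The record key, precombine field, and the exact call site are simplified placeholders, not our real code (that lives in our internal SparkUtils class):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Minimal sketch of the per-batch Hudi write; key fields below are placeholders.
// `hudiOptions` is the key/value map listed above, plus the table-specific keys.
def writeBatch(df: DataFrame, hudiOptions: Map[String, String]): Unit = {
  df.write
    .format("hudi")
    .options(hudiOptions)
    .option("hoodie.table.name", "product_order_eligible_payment_modes_snapshot_nrt")
    .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder record key
    .option("hoodie.datasource.write.precombine.field", "ts")  // placeholder precombine field
    .option("hoodie.datasource.write.operation", "upsert")
    .mode(SaveMode.Append)
    .save("gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_oms_product_oms/product_order_eligible_payment_modes_snapshot_nrt")
}
```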
cc @nsivabalan for visibility.
@sushant-searce It looks like your metadata table is corrupted somehow. Can you provide the timeline for both the data table and the metadata table? If that's fine with you, you can zip the .hoodie directory and attach it here.
The root cause may be the async clean. You can try inline cleaning and see if you still hit this issue.
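If sharing the directory is not possible, even a plain listing of the instant file names (no data) would help. Below is a minimal sketch of how you could dump both timelines with the Hadoop FileSystem API; the base path is just the one from your logs, and the Configuration must carry your GCS connector settings:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: print the instant file names of the data-table and
// metadata-table timelines. Only file names are listed, no table data is read.
object DumpTimeline {
  def main(args: Array[String]): Unit = {
    val basePath =
      "gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_oms_product_oms/product_order_eligible_payment_modes_snapshot_nrt"
    val conf = new Configuration() // assumed to include the GCS connector configuration
    Seq(s"$basePath/.hoodie", s"$basePath/.hoodie/metadata/.hoodie").foreach { dir =>
      println(s"--- $dir ---")
      val path = new Path(dir)
      val fs: FileSystem = path.getFileSystem(conf)
      fs.listStatus(path)
        .filter(_.isFile)
        .map(_.getPath.getName)
        .sorted
        .foreach(println)
    }
  }
}
```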
@ad1happy2go
I won't be able to share the .hoodie directory due to confidentiality.
Made the below changes to the Hoodie options:
hoodie.clean.async: false
hoodie.cleaner.policy: KEEP_LATEST_FILE_VERSIONS
It's a streaming pipeline, so it's running right now, but I will let you know if the pipeline fails with any issue in a couple of hours.
Please keep this issue open for a couple of days; we will monitor the streaming pipeline in the meantime.
Thanks @sushant-searce for the update.
@sushant-searce Are we good to close this issue?
Hello @ad1happy2go
The pipeline was running in production for the past 2 months, but all of a sudden the pipelines are failing.
Sharing error traces with you for one of the pipelines. Can anyone help here? It's very urgent and causing a production outage.
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81) at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013) at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112) at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:473) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:449) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83) at 
org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240) at com.pe.skull.titan.utils.SparkUtils.writeToTable(SparkUtils.java:137) at com.pe.skull.titan.tasks.PipelineRunner.writeData(PipelineRunner.java:154) at com.pe.skull.titan.tasks.PipelineRunner.processBatch(PipelineRunner.java:118) at com.pe.skull.titan.tasks.PipelineRunner.lambda$startPipelines$51830645$1(PipelineRunner.java:77) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1(DataStreamWriter.scala:505) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1$adapted(DataStreamWriter.scala:505) at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:34) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:732) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:286) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:249) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:239) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:311) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:289) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$1(StreamExecution.scala:211) at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211) Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing product_versions_snapshot_nrt at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170) at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79) ... 71 more Caused by: java.lang.NullPointerException at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110) at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289) at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167) ... 72 more org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81) at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013) at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112) at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107) at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:473) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:473) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:449) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83) at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388) at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240) at com.pe.skull.titan.utils.SparkUtils.writeToTable(SparkUtils.java:137) at com.pe.skull.titan.tasks.PipelineRunner.writeData(PipelineRunner.java:154) at com.pe.skull.titan.tasks.PipelineRunner.processBatch(PipelineRunner.java:118) at com.pe.skull.titan.tasks.PipelineRunner.lambda$startPipelines$51830645$1(PipelineRunner.java:77) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1(DataStreamWriter.scala:505) at org.apache.spark.sql.streaming.DataStreamWriter.$anonfun$foreachBatch$1$adapted(DataStreamWriter.scala:505) at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:34) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:732) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:729) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:729) at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:286) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:249) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:239) at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:311) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:289) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$1(StreamExecution.scala:211) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211) Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing i_recommendations_widget_shown_snapshot_nrt at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170) at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79) ... 71 more Caused by: java.lang.NullPointerException at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658) at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110) at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289) at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167) ... 72 more
Hello Team,
Hudi 0.15: we have performed some troubleshooting steps and tried different Hoodie properties. Sharing the test cases with you.
Test Case 1 - Pipeline Run Status: Success
hoodie.clean.automatic: false
hoodie.clean.async: false
hoodie.datasource.hive_sync.enable: false
Test Case 2 - Pipeline Run Status: Fail [table is different from case #1]
hoodie.clean.automatic: false
hoodie.clean.async: false
hoodie.datasource.hive_sync.enable: true
Test Case 3 - Pipeline Run Status: Success [table is different from case #1]
hoodie.clean.automatic: true
hoodie.clean.async: false
hoodie.datasource.hive_sync.enable: false
Test Case 4 - Pipeline Run Status: Success [same table as case #1]
hoodie.clean.automatic: true
hoodie.clean.async: false
hoodie.datasource.hive_sync.enable: true
As you can see from the test cases shared above, the pipeline was working yesterday after disabling and then re-enabling hive_sync.
Yesterday we disabled hive_sync in the pipeline and it ran successfully; we enabled it again for the next run and that also ran successfully.
But in today's run it failed with the same error.
This is very concerning. Is there anything we are missing here?
Sharing the Hoodie options as well.
hudiOptions:
hoodie.cleaner.commits.retained: 10
hoodie.metadata.keep.max.commits: 30
hoodie.metadata.clean.async: false
hoodie.keep.max.commits: 30
hoodie.metadata.keep.min.commits: 20
hoodie.archive.async: false
hoodie.clean.automatic: true
hoodie.finalize.write.parallelism: 200
hoodie.fail.on.timeline.archiving: false
hoodie.clean.async: false
hoodie.parquet.max.file.size: 128000000
hoodie.datasource.hive_sync.support_timestamp: true
#DISABLING METADATA TO REDUCE FREQUENT CALLS TO GCS
hoodie.metadata.enable: false
hoodie.datasource.write.hive_style_partitioning: true
hoodie.parquet.small.file.limit: 100000000
hoodie.datasource.hive_sync.enable: true
hoodie.bulkinsert.shuffle.parallelism: 200
hoodie.keep.min.commits: 11
hoodie.datasource.meta.sync.enable: true
hoodie.metadata.cleaner.commits.retained: 3
hoodie.cleaner.incremental.mode: true
hoodie.commits.archival.batch: 12
hoodie.upsert.shuffle.parallelism: 200
hive_sync.support_timestamp: true
hoodie.insert.shuffle.parallelism: 200
hoodie.metadata.compact.max.delta.commits: 10
compaction.delta_commits: 5
metadata.compaction.delta_commits: 10
hoodie.compact.inline.max.delta.commits: 5
hoodie.archive.automatic: true
hoodie.cleaner.parallelism: 200
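For completeness, here is a minimal sketch of how the streaming pipeline wires these options into a foreachBatch write. The source, checkpoint location, and table identifiers below are placeholders; the real logic lives in our internal PipelineRunner/SparkUtils classes:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Minimal sketch of the streaming write path; identifiers are placeholders.
object StreamingPipelineSketch {
  def run(spark: SparkSession, hudiOptions: Map[String, String]): Unit = {
    val source = spark.readStream
      .format("kafka")                                  // placeholder source
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "example_topic")             // placeholder
      .load()

    source.writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        // Same per-batch Hudi write that fails in the traces above.
        batchDf.write
          .format("hudi")
          .options(hudiOptions)
          .option("hoodie.table.name", "example_snapshot_nrt")       // placeholder
          .mode(SaveMode.Append)
          .save("gs://example-bucket/path/to/example_snapshot_nrt")  // placeholder
      }
      .option("checkpointLocation", "gs://example-bucket/checkpoints/example_snapshot_nrt") // placeholder
      .start()
      .awaitTermination()
  }
}
```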
@sushant-searce Is there any reason why you are setting hoodie.fail.on.timeline.archiving to false?
Also, disabling the metadata table will actually increase the number of GCS calls, as the writer then has to rely on listing operations.
It still looks concerning though, as it is a NullPointerException in org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions.
Are you using only upsert on this table, or other operations as well?
Issue #2 is a cleaner failure while finalizing the write. The scenario: async cleaning is enabled and the clean went inflight, and when we come to finalizing the write and reconciling against markers, the cleaner fails because the reconciliation is looking up the clean instant itself, whose temporary marker directory does not exist.
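To make that concrete, here is a minimal sketch (plain Hadoop FileSystem API, with the paths taken from the Issue 2 trace) of the listing that reconcileAgainstMarkers performs and that fails because the marker directory for the clean instant does not exist:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: reproduce the listing that fails inside reconcileAgainstMarkers.
// The finalize step lists .hoodie/.temp/<instantTime> (the marker directory);
// per the trace above, that directory does not exist for the inflight clean instant,
// so the listing throws FileNotFoundException, surfaced as HoodieIOException.
object CheckMarkerDir {
  def main(args: Array[String]): Unit = {
    val metadataTableBase =
      "gs://pe-skull-data/pe/processed/nrt_snapshot/data/pe_mercury_vault/bk_vendor_snapshot_nrt/.hoodie/metadata"
    val instantTime = "20240810012008022"
    val markerDir = new Path(s"$metadataTableBase/.hoodie/.temp/$instantTime")

    val fs: FileSystem = markerDir.getFileSystem(new Configuration()) // needs GCS connector config
    if (fs.exists(markerDir)) {
      fs.listStatus(markerDir).foreach(status => println(status.getPath))
    } else {
      println(s"Marker directory $markerDir does not exist (this is what the cleaner trips over).")
    }
  }
}
```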