[SUPPORT] Hudi write to COW table hangs on Preparing compaction metadata job
Describe the problem you faced
My Hudi job runs fine for the first 9-10 executions, and each execution takes about 9-10 minutes. The run after the 10th/11th execution hangs and neither succeeds nor fails. I am running this on Glue 4.0 with Hudi 0.14. I have gone through the Spark UI, and it looks like the job is hanging on the Preparing compaction metadata: gft_fact_consol_hudi_metadata step.
To Reproduce
Steps to reproduce the behavior:
Below are the Hudi options used:
{
'hoodie.table.cdc.enabled':'true',
'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after',
'hoodie.datasource.write.recordkey.field': 'bazaar_uuid',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
'hoodie.table.name': "gft_fact_consol_hudi",
'hoodie.datasource.write.table.name': "gft_fact_consol_hudi",
'hoodie.datasource.hive_sync.table': "gft_fact_consol_hudi",
'hoodie.datasource.hive_sync.database': "default",
'hoodie.datasource.write.partitionpath.field': 'a,b,c',
'hoodie.datasource.hive_sync.partition_fields': 'a,b,c',
'hoodie.datasource.write.hive_style_partitioning': 'true',
'hoodie.datasource.hive_sync.enable': 'true',
'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
'hoodie.metadata.enable': 'true',
'hoodie.metadata.record.index.enable':'true',
'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
# 'hoodie.parquet.small.file.limit':104857600,
# 'hoodie.parquet.max.file.size':125829120,
'hoodie.clustering.inline':'true',
'hoodie.clustering.inline.max.commits': '4',
'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.datasource.write.precombine.field': 'record_uuid',
'hoodie.datasource.hive_sync.use_jdbc': 'false',
'hoodie.datasource.hive_sync.mode': 'hms',
'hoodie.datasource.hive_sync.support_timestamp': 'true',
# 'hoodie.write.concurrency.mode': 'OPTIMISTIC_CONCURRENCY_CONTROL',
# 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
# 'hoodie.cleaner.policy.failed.writes': 'LAZY',
# 'hoodie.write.lock.dynamodb.table': 'fri_hudi_locks_table',
# 'hoodie.embed.timeline.server': 'false',
# 'hoodie.write.lock.client.wait_time_ms_between_retry': 50000,
# 'hoodie.write.lock.wait_time_ms_between_retry': 20000,
# 'hoodie.write.lock.wait_time_ms': 60000,
# 'hoodie.write.lock.client.num_retries': 15,
# 'hoodie.keep.max.commits':'7',
# 'hoodie.keep.min.commits':'6',
# 'hoodie.write.lock.dynamodb.region': 'us-west-2',
# 'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-west-2.amazonaws.com'
}
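For reference, these options are passed to a standard Spark DataFrame write in the Glue job. Roughly (a sketch only; the dict above is referred to as hudi_options here, df is the incoming DataFrame, and the S3 path is a placeholder):
# Sketch of the write call in the Glue/PySpark job.
# hudi_options is the dict above; df is the incoming DataFrame; the path is a placeholder.
(
    df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://<bucket>/<prefix>/gft_fact_consol_hudi/")
)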
Expected behavior
As per https://hudi.apache.org/docs/compaction#background, compaction should only occur for MOR tables. Any idea why it is happening for a COW table?
Environment Description
- Hudi version : 0.14
- Spark version : 3.3.0
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) : s3
- Running on Docker? (yes/no) :
@keerthiskating Do you have a lot of small files?
Can you run the file sizing tool - https://medium.com/@simpsons/monitoring-table-stats-22684eb70ee1
@ad1happy2go I do not have access to an EMR cluster, so I am unable to run the spark-submit. However, I explicitly set the following parameters to allow for aggressive cleaning. Even then, the first 10-12 writes to the table run fine and then the next write job hangs on Preparing compaction metadata: gft_fact_consol_hudi_metadata
'hoodie.keep.max.commits':'2', 'hoodie.keep.min.commits':'1',
How do I ensure compaction happens after each write? Also, is this log file compaction? Is there a way I can set parallelism for compaction?
The compaction step is on the metadata table for your table, not on the table itself. The metadata table, AFAIK, is always a MOR table.
To verify, you can disable the metadata table (hoodie.metadata.enable: false) and these steps should go away.
(Not sure why this is happening; it definitely seems strange that it would just be stuck, because the metadata table is quite small.)
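Something like this, as a sketch (assuming the hudi_options dict and write call from the original post; everything else stays the same):
# Sketch only: temporarily disable the metadata table to confirm the hang comes
# from metadata-table compaction. Assumes hudi_options is the dict from the post above.
debug_options = dict(hudi_options)
debug_options['hoodie.metadata.enable'] = 'false'
debug_options.pop('hoodie.metadata.record.index.enable', None)  # the record index lives inside the metadata table
(
    df.write.format("hudi")
      .options(**debug_options)
      .mode("append")
      .save("s3://<bucket>/<prefix>/gft_fact_consol_hudi/")
)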
I tried reading the metadata table alone in a separate Spark job, following the FAQ entry "Can the Hudi Metadata table be queried?", and it hangs. @nsivabalan thoughts?
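The read looked roughly like this (a sketch following that FAQ entry; the base path is a placeholder, and /.hoodie/metadata is the documented location of the metadata table):
# Sketch only: read the metadata table directly, per the Hudi FAQ.
base_path = "s3://<bucket>/<prefix>/gft_fact_consol_hudi"  # placeholder
mdt = spark.read.format("hudi").load(base_path + "/.hoodie/metadata")
mdt.printSchema()
print(mdt.count())  # forcing an action on the read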
@keerthiskating What logs do you see when it hangs? Did you check in the Spark UI which stage is getting stuck?
No logs are being sent to CloudWatch; it is getting stuck at Preparing compaction metadata: gft_fact_consol_hudi_metadata
@keerthiskating Sorry for the delay on this. Do you know how many file groups there are in your table? Do you have too many partitions? If possible, can you try running this tool and share the output: https://medium.com/@simpsons/monitoring-table-stats-22684eb70ee1
Also, can you check the size of your .hoodie directory? If possible, can you zip it and share it with the community to look into this further?
@ad1happy2go I do not have an EMR cluster to run spark-submit. I am using Glue.
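A rough in-session alternative to the spark-submit tool, as a sketch only (it lists Parquet base files under the table path via the Hadoop FileSystem API available in the Glue Spark session; the base path is a placeholder):
# Sketch only: approximate the table size stats from inside Glue, without spark-submit.
base_path = "s3://<bucket>/<prefix>/gft_fact_consol_hudi/"  # placeholder
hadoop_conf = spark._jsc.hadoopConfiguration()
jpath = spark._jvm.org.apache.hadoop.fs.Path(base_path)
fs = jpath.getFileSystem(hadoop_conf)
it = fs.listFiles(jpath, True)  # recursive listing
sizes = []
while it.hasNext():
    status = it.next()
    if status.getPath().getName().endswith(".parquet"):
        sizes.append(status.getLen())
if sizes:
    mb = 1024 * 1024
    print(f"files={len(sizes)}, total={sum(sizes) / 1024 ** 3:.2f} GB, "
          f"min={min(sizes) / mb:.1f} MB, max={max(sizes) / mb:.1f} MB, "
          f"avg={sum(sizes) / len(sizes) / mb:.1f} MB")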
@keerthiskating One thing you can check is the size of your log files and how many file groups are affected, by looking at the compaction commit metadata.
@ad1happy2go Hi, I don't know much about the log files and the file groups. Why do the size and the number of log files and file groups matter for this issue?
@keerthiskating @Gatsby-Lee As we can see, the stage has 11 tasks; normally one task is created per file group. They are running in parallel and taking more than 1 hour, which means these tasks have a lot to merge. This normally happens when the log files are very big.
@ad1happy2go Thank you, I see. Each task is for one file group, and in the shared issue each task takes more than 1 hour, so you are thinking there might be a lot of files to merge. Is there a way to know how many files exist in each file group (like stats)?
Yes, that's correct @Gatsby-Lee.
@keerthiskating Were you able to check this out? Please update us on the same. Thanks.
Hi everybody!
This exact thing is happening to me, in the exact same setup: Glue 4.0, Hudi 0.14.1. After about 9 "fast" upserts it hangs on that "Preparing compaction metadata" step for about 3 hours, timing out my Glue job that usually takes about 6 to 9 minutes.
It is indeed a big table with lots of files (roughly 10 MB to 130 MB each).
As for @ad1happy2go's suggestion, my .hoodie directory is about 11 GB uncompressed, and the HFiles as well as some .log files in the .hoodie/metadata/record_index/ directory range from 80 B to 400 MB+.
And here are the results of running the table stats utility. I don't think it is that bad for it to take 3+ hours with 5 available workers, or is it? (They are Parquet files with Snappy compression.)
24/11/07 01:34:45 INFO TableSizeStats: Number of files: 157
24/11/07 01:34:45 INFO TableSizeStats: Total size: 18.38 GB
24/11/07 01:34:45 INFO TableSizeStats: Minimum file size: 59.46 MB
24/11/07 01:34:45 INFO TableSizeStats: Maximum file size: 121.33 MB
24/11/07 01:34:45 INFO TableSizeStats: Average file size: 119.86 MB
24/11/07 01:34:45 INFO TableSizeStats: Median file size: 120.40 MB
24/11/07 01:34:45 INFO TableSizeStats: P50 file size: 120.40 MB
24/11/07 01:34:45 INFO TableSizeStats: P90 file size: 120.98 MB
24/11/07 01:34:45 INFO TableSizeStats: P95 file size: 121.11 MB
24/11/07 01:34:45 INFO TableSizeStats: P99 file size: 121.32 MB
Any help will be greatly appreciated
@juanAmayaRamirez The issue here is that it created 10 file groups in the record index, and each log file is also very large (~465 MB), so it has to merge those big log files with the base file during compaction. For one file group it will only create one task, so there is no parallelism within a file group. You can disable the metadata table once and then enable it back to recreate it, and increase the value of hoodie.metadata.record.index.min.filegroup.count to a higher number so it creates more file groups.
Although we still need to check why it is creating such large log files.
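In config form, that suggestion looks roughly like this (a sketch; hudi_options is the dict from the original post, hoodie.metadata.record.index.min.filegroup.count is the config named above, and 40 is only an example value):
# Sketch only, per the suggestion above: one write with the metadata table disabled
# (so it gets rebuilt), then subsequent writes with it re-enabled and more
# record-index file groups. The value 40 is an example, not a recommendation.
drop_mdt = dict(hudi_options, **{
    'hoodie.metadata.enable': 'false',
    'hoodie.metadata.record.index.enable': 'false',
})
rebuild_mdt = dict(hudi_options, **{
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.record.index.enable': 'true',
    'hoodie.metadata.record.index.min.filegroup.count': '40',
})
# First run:  df.write.format("hudi").options(**drop_mdt).mode("append").save(base_path)
# Later runs: df.write.format("hudi").options(**rebuild_mdt).mode("append").save(base_path)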
Hi @ad1happy2go, I know it's been a while, but I finally verified once again with the file sizing tool, and it appears that just after recreating the table (insert_overwrite_table) there are indeed too many small files in the table itself:
24/11/13 17:14:09 INFO TableSizeStats: Number of files: 1767
24/11/13 17:14:09 INFO TableSizeStats: Total size: 18.52 GB
24/11/13 17:14:09 INFO TableSizeStats: Minimum file size: 3.54 MB
24/11/13 17:14:09 INFO TableSizeStats: Maximum file size: 17.70 MB
24/11/13 17:14:09 INFO TableSizeStats: Average file size: 10.73 MB
24/11/13 17:14:09 INFO TableSizeStats: Median file size: 10.72 MB
24/11/13 17:14:09 INFO TableSizeStats: P50 file size: 10.72 MB
24/11/13 17:14:09 INFO TableSizeStats: P90 file size: 10.76 MB
24/11/13 17:14:09 INFO TableSizeStats: P95 file size: 10.77 MB
24/11/13 17:14:09 INFO TableSizeStats: P99 file size: 10.80 MB
Any suggestions on how to group those files from the first write? I tried using coalesce but with no success.
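One thing worth trying, as a sketch only: the config at the top of this issue already has file sizing options commented out, and re-enabling hoodie.parquet.small.file.limit and hoodie.parquet.max.file.size is what lets Hudi bin-pack small files on subsequent writes (values below are the ones from that config, not a recommendation):
# Sketch only: re-enable the file sizing options commented out in the original config.
file_sizing = {
    'hoodie.parquet.small.file.limit': 104857600,  # ~100 MB: files below this are topped up by later writes
    'hoodie.parquet.max.file.size': 125829120,     # ~120 MB target max base file size
}
hudi_options.update(file_sizing)  # hudi_options is the dict from the original post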