[SUPPORT] Hudi write to COW table hangs on Preparing compaction metadata job
Describe the problem you faced
My Hudi job runs fine for the first 9-10 executions, and each execution takes about 9-10 minutes. The run after the 10th/11th execution hangs and neither succeeds nor fails. I am running this on Glue 4.0 with Hudi 0.14. I have gone through the Spark UI, and it looks like the job is hanging on the Preparing compaction metadata: gft_fact_consol_hudi_metadata step.
To Reproduce
Steps to reproduce the behavior:
Below are the Hudi options used:
{
'hoodie.table.cdc.enabled':'true',
'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after',
'hoodie.datasource.write.recordkey.field': 'bazaar_uuid',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
'hoodie.table.name': "gft_fact_consol_hudi",
'hoodie.datasource.write.table.name': "gft_fact_consol_hudi",
'hoodie.datasource.hive_sync.table': "gft_fact_consol_hudi",
'hoodie.datasource.hive_sync.database': "default",
'hoodie.datasource.write.partitionpath.field': 'a,b,c',
'hoodie.datasource.hive_sync.partition_fields': 'a,b,c',
'hoodie.datasource.write.hive_style_partitioning': 'true',
'hoodie.datasource.hive_sync.enable': 'true',
'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
'hoodie.metadata.enable': 'true',
'hoodie.metadata.record.index.enable':'true',
'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
# 'hoodie.parquet.small.file.limit':104857600,
# 'hoodie.parquet.max.file.size':125829120,
'hoodie.clustering.inline':'true',
'hoodie.clustering.inline.max.commits': '4',
'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.datasource.write.precombine.field': 'record_uuid',
'hoodie.datasource.hive_sync.use_jdbc': 'false',
'hoodie.datasource.hive_sync.mode': 'hms',
'hoodie.datasource.hive_sync.support_timestamp': 'true',
# 'hoodie.write.concurrency.mode': 'OPTIMISTIC_CONCURRENCY_CONTROL',
# 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
# 'hoodie.cleaner.policy.failed.writes': 'LAZY',
# 'hoodie.write.lock.dynamodb.table': 'fri_hudi_locks_table',
# 'hoodie.embed.timeline.server': 'false',
# 'hoodie.write.lock.client.wait_time_ms_between_retry': 50000,
# 'hoodie.write.lock.wait_time_ms_between_retry': 20000,
# 'hoodie.write.lock.wait_time_ms': 60000,
# 'hoodie.write.lock.client.num_retries': 15,
# 'hoodie.keep.max.commits':'7',
# 'hoodie.keep.min.commits':'6',
# 'hoodie.write.lock.dynamodb.region': 'us-west-2',
# 'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-west-2.amazonaws.com'
}
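For reference, these options are passed to a standard Spark DataFrame write in the Glue job. Roughly (a sketch only; the dict above is referred to as hudi_options here, df is the incoming DataFrame, and the S3 path is a placeholder):
# Sketch of the write call in the Glue/PySpark job.
# hudi_options is the dict above; df is the incoming DataFrame; the path is a placeholder.
(
    df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://<bucket>/<prefix>/gft_fact_consol_hudi/")
)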
Expected behavior
As per https://hudi.apache.org/docs/compaction#background, compaction should only occur for MOR tables. Any idea why it is happening for a COW table?
Environment Description
- Hudi version : 0.14
- Spark version : 3.3.0
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) : s3
- Running on Docker? (yes/no) :
@keerthiskating Do you have a lot of small files?
Can you run the file sizing tool - https://medium.com/@simpsons/monitoring-table-stats-22684eb70ee1
@ad1happy2go I do not have access to an EMR cluster, so I am unable to run the spark-submit. However, I explicitly set the following parameters to allow for aggressive cleaning. Even then, the first 10-12 writes to the table run fine and then the next write job hangs on Preparing compaction metadata: gft_fact_consol_hudi_metadata
'hoodie.keep.max.commits':'2', 'hoodie.keep.min.commits':'1',
How do I ensure compaction happens after each write? Also, is this log file compaction? Is there a way I can set parallelism for compaction?
The compaction step is on the metadata table for your table, not on the table itself. The metadata table, AFAIK, is always a MOR table.
To verify, you can disable the metadata table (hoodie.metadata.enable: false) and these steps should go away.
(Not sure why this is happening; it definitely seems strange that it would just be stuck, because the metadata table is quite small.)
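Something like this, as a sketch (assuming the hudi_options dict and write call from the original post; everything else stays the same):
# Sketch only: temporarily disable the metadata table to confirm the hang comes
# from metadata-table compaction. Assumes hudi_options is the dict from the post above.
debug_options = dict(hudi_options)
debug_options['hoodie.metadata.enable'] = 'false'
debug_options.pop('hoodie.metadata.record.index.enable', None)  # the record index lives inside the metadata table
(
    df.write.format("hudi")
      .options(**debug_options)
      .mode("append")
      .save("s3://<bucket>/<prefix>/gft_fact_consol_hudi/")
)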
I tried reading the metadata table alone in a separate Spark job, following the FAQ entry "Can the Hudi Metadata table be queried?", and it hangs. @nsivabalan thoughts?
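The read looked roughly like this (a sketch following that FAQ entry; the base path is a placeholder, and /.hoodie/metadata is the documented location of the metadata table):
# Sketch only: read the metadata table directly, per the Hudi FAQ.
base_path = "s3://<bucket>/<prefix>/gft_fact_consol_hudi"  # placeholder
mdt = spark.read.format("hudi").load(base_path + "/.hoodie/metadata")
mdt.printSchema()
print(mdt.count())  # forcing an action on the read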
@keerthiskating What logs do you see when it hangs? Did you check in the Spark UI which stage is getting stuck?
No logs are being sent to CloudWatch; it is getting stuck at Preparing compaction metadata: gft_fact_consol_hudi_metadata
@keerthiskating Sorry for the delay on this. Do you know how many file groups there are in your table? Do you have too many partitions? If possible, can you try running this tool and share the output: https://medium.com/@simpsons/monitoring-table-stats-22684eb70ee1
Also, can you check the size of your .hoodie directory? If possible, can you zip it and share it with the community to look into this further?
@ad1happy2go I do not have an EMR cluster to run spark-submit. I am using Glue.
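A rough in-session alternative to the spark-submit tool, as a sketch only (it lists Parquet base files under the table path via the Hadoop FileSystem API available in the Glue Spark session; the base path is a placeholder):
# Sketch only: approximate the table size stats from inside Glue, without spark-submit.
base_path = "s3://<bucket>/<prefix>/gft_fact_consol_hudi/"  # placeholder
hadoop_conf = spark._jsc.hadoopConfiguration()
jpath = spark._jvm.org.apache.hadoop.fs.Path(base_path)
fs = jpath.getFileSystem(hadoop_conf)
it = fs.listFiles(jpath, True)  # recursive listing
sizes = []
while it.hasNext():
    status = it.next()
    if status.getPath().getName().endswith(".parquet"):
        sizes.append(status.getLen())
if sizes:
    mb = 1024 * 1024
    print(f"files={len(sizes)}, total={sum(sizes) / 1024 ** 3:.2f} GB, "
          f"min={min(sizes) / mb:.1f} MB, max={max(sizes) / mb:.1f} MB, "
          f"avg={sum(sizes) / len(sizes) / mb:.1f} MB")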
@keerthiskating One thing you can check is the size of your log files and how many file groups are affected, by looking at the compaction commit metadata.
@ad1happy2go Hi, I don't know much about the log files and the file groups. Why do the size and the number of log files and file groups matter for this issue?
@keerthiskating @Gatsby-Lee As we can see, the stage has 11 tasks; normally one task is created per file group. They are running in parallel and taking more than 1 hour, which means these tasks have a lot to merge. This normally happens when the log files are very big.
@ad1happy2go Thank you, I see. Each task is for one file group, and in the shared issue each task takes more than 1 hour, so you are thinking there might be a lot of files to merge. Is there a way to know how many files exist in each file group (like stats)?
Yes, that's correct @Gatsby-Lee.
@keerthiskating Were you able to check this out? Please update us on the same. Thanks.
Hi everybody!
This exact thing is happening to me, in the exact same setup: Glue 4.0, Hudi 0.14.1. After about 9 "fast" upserts it hangs on that "Preparing compaction metadata" step for about 3 hours, timing out my Glue job that usually takes about 6 to 9 minutes.
It is indeed a big table with lots of files (roughly 10 MB to 130 MB each).
As for @ad1happy2go's suggestion, my .hoodie directory is about 11 GB uncompressed, and the HFiles as well as some .log files in the .hoodie/metadata/record_index/ directory range from 80 B to 400 MB+.
And here are the results of running the table stats utility. I don't think it is that bad for it to take 3+ hours with 5 available workers, or is it? (They are Parquet files with Snappy compression.)
24/11/07 01:34:45 INFO TableSizeStats: Number of files: 157
24/11/07 01:34:45 INFO TableSizeStats: Total size: 18.38 GB
24/11/07 01:34:45 INFO TableSizeStats: Minimum file size: 59.46 MB
24/11/07 01:34:45 INFO TableSizeStats: Maximum file size: 121.33 MB
24/11/07 01:34:45 INFO TableSizeStats: Average file size: 119.86 MB
24/11/07 01:34:45 INFO TableSizeStats: Median file size: 120.40 MB
24/11/07 01:34:45 INFO TableSizeStats: P50 file size: 120.40 MB
24/11/07 01:34:45 INFO TableSizeStats: P90 file size: 120.98 MB
24/11/07 01:34:45 INFO TableSizeStats: P95 file size: 121.11 MB
24/11/07 01:34:45 INFO TableSizeStats: P99 file size: 121.32 MB
Any help will be greatly appreciated
@juanAmayaRamirez The issue here is that it created 10 file groups in the record index, and each log file is also very large (~465 MB), so it has to merge those big log files with the base file during compaction. For one file group it will only create one task, so there is no parallelism within a file group. You can disable the metadata table once and then enable it back to recreate it, and increase the value of hoodie.metadata.record.index.min.filegroup.count to a higher number so it creates more file groups.
Although we still need to check why it is creating such large log files.
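In config form, that suggestion looks roughly like this (a sketch; hudi_options is the dict from the original post, hoodie.metadata.record.index.min.filegroup.count is the config named above, and 40 is only an example value):
# Sketch only, per the suggestion above: one write with the metadata table disabled
# (so it gets rebuilt), then subsequent writes with it re-enabled and more
# record-index file groups. The value 40 is an example, not a recommendation.
drop_mdt = dict(hudi_options, **{
    'hoodie.metadata.enable': 'false',
    'hoodie.metadata.record.index.enable': 'false',
})
rebuild_mdt = dict(hudi_options, **{
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.record.index.enable': 'true',
    'hoodie.metadata.record.index.min.filegroup.count': '40',
})
# First run:  df.write.format("hudi").options(**drop_mdt).mode("append").save(base_path)
# Later runs: df.write.format("hudi").options(**rebuild_mdt).mode("append").save(base_path)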
Hi @ad1happy2go, I know it's been a while, but I finally verified once again with the file sizing tool, and it appears that just after recreating the table (insert_overwrite_table) there are indeed too many small files in the table itself:
24/11/13 17:14:09 INFO TableSizeStats: Number of files: 1767
24/11/13 17:14:09 INFO TableSizeStats: Total size: 18.52 GB
24/11/13 17:14:09 INFO TableSizeStats: Minimum file size: 3.54 MB
24/11/13 17:14:09 INFO TableSizeStats: Maximum file size: 17.70 MB
24/11/13 17:14:09 INFO TableSizeStats: Average file size: 10.73 MB
24/11/13 17:14:09 INFO TableSizeStats: Median file size: 10.72 MB
24/11/13 17:14:09 INFO TableSizeStats: P50 file size: 10.72 MB
24/11/13 17:14:09 INFO TableSizeStats: P90 file size: 10.76 MB
24/11/13 17:14:09 INFO TableSizeStats: P95 file size: 10.77 MB
24/11/13 17:14:09 INFO TableSizeStats: P99 file size: 10.80 MB
Any suggestions on how to group those files from the first write? I tried using coalesce but with no success.
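One thing worth trying, as a sketch only: the config at the top of this issue already has file sizing options commented out, and re-enabling hoodie.parquet.small.file.limit and hoodie.parquet.max.file.size is what lets Hudi bin-pack small files on subsequent writes (values below are the ones from that config, not a recommendation):
# Sketch only: re-enable the file sizing options commented out in the original config.
file_sizing = {
    'hoodie.parquet.small.file.limit': 104857600,  # ~100 MB: files below this are topped up by later writes
    'hoodie.parquet.max.file.size': 125829120,     # ~120 MB target max base file size
}
hudi_options.update(file_sizing)  # hudi_options is the dict from the original post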