[SUPPORT] No way to clean `archived/` folder
Describe the problem you faced
There's no way to control archived/ folder size and no way to trigger its cleaning.
We have a long-running table that has accumulated a lot of archives (~100 GB), which now degrades cleaner performance and the overall performance of the ingestion process.
To Reproduce
Steps to reproduce the behavior:
- Go to All Configurations on the Hudi site
- Look for any setting that controls the archived/ folder of Hudi
- Confirm there is none
Expected behavior
The documentation should describe how to maintain the archived/ folder.
Upkeep of the archived/ folder should be delegated to the cleaner.
Environment Description
- Hudi version : 0.14.0
- Spark version : 3.4.1
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) :
Additional context
Related slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1711531654297129
Stacktrace
Add the stacktrace of the error.
Just a side note: we were re-creating the metadata for that table.
We have some options such as hoodie.archive.merge.enable for archival log merging; the cleaning is introduced only after the 1.0 release.
Pointing to the Slack thread here as well - https://apache-hudi.slack.com/archives/C4D716NPQ/p1711531654297129
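For anyone landing here, a minimal sketch of how the merge option mentioned above could be set. The helper `archival_options` is hypothetical (not part of Hudi); only the key `hoodie.archive.merge.enable` comes from this thread. As far as I understand, merging reduces the number of small archive files but does not delete archived history.

```python
# Hypothetical helper (not a Hudi API): builds writer options as a plain dict.
def archival_options(merge_enabled: bool = True) -> dict:
    return {
        # Option named in this thread: merge small archive log files.
        # Hudi options are passed as strings, hence "true"/"false".
        "hoodie.archive.merge.enable": str(merge_enabled).lower(),
    }


opts = archival_options()
# With PySpark, these would typically be passed along with the other
# table options, e.g. df.write.format("hudi").options(**opts)
```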
Maybe we should introduce an ArchivalClean table service to auto-clean files older than, say, 2 months. Not many users are going to inspect the archival timeline after 2+ months, and it would avoid accumulating the entire history. Interested users can still choose not to clean it up.
Hi, any news on archived commit purge? We had 10k archived commits in the metadata table. This leads to long locking, along with 10k log lines like this one during the metadata step:
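To make the proposal concrete, here is a rough sketch of the selection step such a service might perform. This is not an existing Hudi API: `select_expired_archives`, the 60-day default, and the file names are all assumptions for illustration, and the sketch only picks candidates without deleting anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def select_expired_archives(
    files: Iterable[Tuple[str, datetime]],
    now: datetime,
    retention: timedelta = timedelta(days=60),  # assumed "about 2 months"
) -> List[str]:
    """Return paths of archive files last modified before now - retention."""
    cutoff = now - retention
    return [path for path, modified in files if modified < cutoff]


# Illustrative listing of (path, last-modified) pairs; the names are made up.
now = datetime(2024, 4, 1, tzinfo=timezone.utc)
listing = [
    ("archived/old_file", now - timedelta(days=90)),
    ("archived/recent_file", now - timedelta(days=10)),
]
expired = select_expired_archives(listing, now)
```

Users who want to keep the full history could simply not schedule the service, or pass a very large `retention`.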
Closing Log file reader
Somehow (Hudi 0.14.1):
- the metadata table (MDT) archived commits are not merged
- the MDT archived commits are read
We have some option such as hoodie.archive.merge.enable for archival log merging, the cleaning is introduced only after 1.0 release.
@danny0405 Hi, does Hudi 1.0.1 support cleaning up the archived directory? I don't see any parameters for cleaning up archived directories on the official website.
Yes, it is supported.