[SUPPORT] No way to clean `archived/` folder

Open bk-mz opened this issue 1 year ago • 5 comments

Describe the problem you faced

There's no way to control archived/ folder size and no way to trigger its cleaning.

We have a long-running table that has accumulated a lot of archives (~100 GB), which now degrades cleaner performance and the overall performance of the ingestion process.

To Reproduce

Steps to reproduce the behavior:

  1. Go to All Configuration in Hudi Site
  2. Check for all settings that control archived/ folder of hudi
  3. Ensure there is none

Expected behavior

There should be a description somewhere in the documentation stating how to maintain the archived/ folder.

Upkeep of the archived/ folder should be delegated to the cleaner.

Environment Description

  • Hudi version : 0.14.0

  • Spark version : 3.4.1

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) :

Additional context

Related slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1711531654297129

Stacktrace

Add the stacktrace of the error.

bk-mz avatar Mar 27 '24 15:03 bk-mz

Just a side note: we were re-creating the metadata for that table.

bk-mz avatar Mar 27 '24 15:03 bk-mz

We have some options, such as hoodie.archive.merge.enable for archival log merging; cleaning is introduced only after the 1.0 release.
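As a minimal sketch of how the merge option mentioned above might be passed to a writer, the helper below builds a plain dictionary of write options. Only hoodie.archive.merge.enable comes from this thread; the two companion keys and their default values are recalled from the Hudi configuration reference and should be checked against the Hudi version in use. The function name is hypothetical.

```python
def archive_merge_options(batch_size: int = 10,
                          small_file_limit_bytes: int = 20 * 1024 * 1024) -> dict:
    """Build Hudi write options that turn on archive file merging.

    Assumption: the two tuning keys below exist in the Hudi release in use;
    only "hoodie.archive.merge.enable" is named in this thread.
    """
    return {
        # Named in the thread: merge small archive files into larger ones.
        "hoodie.archive.merge.enable": "true",
        # Assumed key: how many small archive files are merged in one batch.
        "hoodie.archive.merge.files.batch.size": str(batch_size),
        # Assumed key: files below this size count as merge candidates.
        "hoodie.archive.merge.small.file.limit.bytes": str(small_file_limit_bytes),
    }


opts = archive_merge_options()
# With PySpark these would typically be passed along with the other
# Hudi write options, e.g. df.write.format("hudi").options(**opts)
```

Note that merging only reduces the number of archive files; it does not delete archived history, so by itself it would not cap the total size of the folder.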

danny0405 avatar Mar 28 '24 00:03 danny0405

Linking the Slack thread here as well: https://apache-hudi.slack.com/archives/C4D716NPQ/p1711531654297129

ad1happy2go avatar Mar 28 '24 11:03 ad1happy2go

Maybe we should introduce an ArchivalClean table service to automatically clean files older than, say, 2 months. Not many users are going to inspect the archived timeline after 2+ months, and it would avoid accumulating the entire history. Interested users could still choose not to clean it up.
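The retention rule proposed above could be sketched roughly as follows. This is not an existing Hudi API: the function name, the (path, modified time) input shape, and the 60-day default are all assumptions made for illustration, and the sketch only selects candidates rather than deleting anything.

```python
from datetime import datetime, timedelta, timezone


def select_expired_archives(files, now, retention=timedelta(days=60)):
    """Return paths of archive files last modified before the retention cutoff.

    `files` is an iterable of (path, modified) pairs, where `modified` is a
    timezone-aware datetime. Hypothetical helper, not part of Hudi.
    """
    cutoff = now - retention
    return [path for path, modified in files if modified < cutoff]


# Illustrative listing; the file names are made up for the example.
now = datetime(2024, 4, 9, tzinfo=timezone.utc)
listing = [
    ("archived/file-a", datetime(2023, 12, 1, tzinfo=timezone.utc)),
    ("archived/file-b", datetime(2024, 3, 20, tzinfo=timezone.utc)),
]
expired = select_expired_archives(listing, now)
```

Keeping selection separate from deletion leaves room for the opt-out mentioned above: a user who wants the full history would simply never act on the returned list.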

nsivabalan avatar Apr 09 '24 01:04 nsivabalan

Hi, any news on purging archived commits? We had 10k archived commits in the metadata table. This leads to long locking, along with 10k log lines like the following during the metadata step:

Closing Log file reader

Somehow (Hudi 0.14.1):

  • the MDT archived commits are not merged
  • the MDT archived commits are read

parisni avatar Oct 02 '24 09:10 parisni

We have some options, such as hoodie.archive.merge.enable for archival log merging; cleaning is introduced only after the 1.0 release.

@danny0405 Hi, does Hudi 1.0.1 support cleaning up the archived directory? I don't see any parameters for cleaning up archived directories on the official website.

SGITLOGIN avatar Mar 11 '25 09:03 SGITLOGIN

yes, it is supported.

danny0405 avatar Mar 11 '25 10:03 danny0405