[SUPPORT] How can I clean up archived metadata?

Open BruceKellan opened this issue 1 year ago • 5 comments

Describe the problem you faced

In our production environment, hundreds of Flink streaming applications write to Hudi tables. Because they are streaming jobs, they checkpoint at minute-level intervals (for example, one commit every 3 minutes), which generates a large amount of Hudi timeline metadata.

Some of these applications have now been running for two years. Even though the archived metadata files are merged, the /.hoodie/archived directory has grown quite large, close to 20GB.

To avoid unnecessary risk, I would like to take some measures in advance. I have two questions:

  1. What will happen if I directly delete some archived files that have not been changed for a long time?

  2. Is there any way for me to clean up these archived files?

Environment Description

  • Hudi version : 0.13.1

  • Flink version: flink-1.13

  • Spark version : 3.2.0

  • Storage (HDFS/S3/GCS..) : Aliyun-OSS

  • Running on Docker? (yes/no) : no

BruceKellan avatar Jul 31 '24 02:07 BruceKellan

It is safe to delete older archived files. Currently there is no built-in way to clean them up, but you can probably write a simple script to do so.

ad1happy2go avatar Jul 31 '24 05:07 ad1happy2go

Thanks for your reply.

Do we have some pull-requests or plans to automatically clean up old archive files?

BruceKellan avatar Aug 01 '24 02:08 BruceKellan

Do we have some pull-requests or plans to automatically clean up old archive files?

I think we can put it in the upgrade handler maybe.

danny0405 avatar Aug 01 '24 04:08 danny0405

Thanks for your replies, @danny0405 @ad1happy2go. I'm going to try writing a simple script to clean up merged archived metadata files that haven't been modified for a long time. If any other questions come up, I will raise them here. I will close this issue after the cleanup is done.

BruceKellan avatar Aug 02 '24 01:08 BruceKellan

@BruceKellan Can you also share the script with the community? Let us know if you face any issues.

ad1happy2go avatar Aug 22 '24 09:08 ad1happy2go

Sorry for the long delay in replying. We have implemented this cleanup script.

Please note that this cleanup script applies to version 0.13.1; as the timeline structure evolves, it is not guaranteed to remain applicable.

Closing this issue.

BruceKellan avatar Mar 12 '25 03:03 BruceKellan

@BruceKellan Can you also share the script with the community? Let us know if you face any issues.

It is just a few delete actions. Although it is rough, it can help solve the problem.
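
For readers who want a starting point, here is a minimal Python sketch of this kind of delete action. It is not the original script. It assumes the archive directory is reachable as a local or mounted filesystem path (an OSS bucket would need its own client or a mount), and that archived files match the glob `.commits_.archive.*`, which is an assumption about the 0.13.x naming and should be checked against your own table. The function names and the `max_age_days` parameter are hypothetical.

```python
import time
from pathlib import Path


def find_stale_archive_files(archive_dir, max_age_days,
                             pattern=".commits_.archive.*", now=None):
    """Return files under archive_dir that match pattern and whose
    modification time is older than max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = [
        p for p in Path(archive_dir).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    ]
    return sorted(stale, key=lambda p: p.stat().st_mtime)


def clean_archive(archive_dir, max_age_days, dry_run=True, **kwargs):
    """Delete stale archive files. With dry_run=True (the default),
    only print what would be deleted and leave the files in place."""
    stale = find_stale_archive_files(archive_dir, max_age_days, **kwargs)
    for path in stale:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return stale
```

Running `clean_archive("/path/to/table/.hoodie/archived", max_age_days=365)` first in dry-run mode and reviewing the printed list before passing `dry_run=False` seems prudent, since deleted archive files cannot be recovered.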

BruceKellan avatar Mar 12 '25 03:03 BruceKellan