[SUPPORT] How can I clean up archived metadata?
Describe the problem you faced
In our production environment, there are hundreds of Flink streaming applications writing to Hudi tables. Because these are streaming jobs, each application checkpoints at minute-level intervals and produces a commit roughly every 3 minutes. As a result, a large amount of Hudi timeline metadata is generated.
Some of these applications have now been running for two years. Even though the archived metadata files are merged, the /.hoodie/archived directory is quite large, close to 20 GB.
To avoid unnecessary risks, I want to take some measures in advance. I have a few questions:
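For a sense of scale, here is a rough estimate of how many commits a single application produces, assuming (for illustration only) one commit every 3 minutes, running continuously for two years with no downtime:

$$
\frac{24 \times 60}{3} = 480 \ \text{commits/day}, \qquad 480 \times 365 \times 2 = 350{,}400 \ \text{commits}
$$

Multiplied across hundreds of applications, this explains why the archived timeline keeps growing even after archival.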
- What will happen if I directly delete archived files that have not been modified for a long time?
- Is there any way for me to clean up these archived files?
Environment Description
- Hudi version: 0.13.1
- Flink version: 1.13
- Spark version: 3.2.0
- Storage (HDFS/S3/GCS..): Aliyun OSS
- Running on Docker? (yes/no): no
It is safe to delete older archived files. Currently there is no built-in way to do this, but you can probably write a simple script to do so.
Thanks for your reply.
Are there any pull requests or plans to clean up old archive files automatically?
Maybe we can put it in the upgrade handler.
Thanks for your reply, @danny0405 @ad1happy2go. I'm going to try writing a simple script to clean up merged archived metadata files that haven't been modified for a long time. If any other questions arise, I will raise them here, and I will close this issue once the cleanup is done.
@BruceKellan Can you also share the script with the community? Let us know if you face any issues.
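A cleanup script along these lines could look like the following minimal sketch. It is not the script referenced later in this thread; it assumes the archived files can be reached through a local or mounted filesystem (for Aliyun OSS you would swap the listing and deletion calls for your storage client), and it treats every regular file under the archived directory as a candidate based only on modification time. The names `ArchivedFile`, `select_stale`, `list_archived`, `clean_archived`, and `max_age_days` are hypothetical.

```python
import os
import time
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArchivedFile:
    path: str
    mtime: float  # last modification time, seconds since the epoch


def select_stale(files: Iterable[ArchivedFile], now: float, max_age_days: float) -> List[ArchivedFile]:
    """Return files not modified within the last max_age_days, oldest first."""
    cutoff = now - max_age_days * 86400
    return sorted((f for f in files if f.mtime < cutoff), key=lambda f: f.mtime)


def list_archived(archived_dir: str) -> List[ArchivedFile]:
    """List regular files directly under archived_dir (local filesystem assumption)."""
    with os.scandir(archived_dir) as entries:
        return [ArchivedFile(e.path, e.stat().st_mtime) for e in entries if e.is_file()]


def clean_archived(archived_dir: str, max_age_days: float, dry_run: bool = True) -> List[str]:
    """Delete stale archived files; in dry-run mode only print what would be deleted."""
    stale = select_stale(list_archived(archived_dir), time.time(), max_age_days)
    for f in stale:
        if dry_run:
            print(f"[dry-run] would delete {f.path}")
        else:
            os.remove(f.path)
    return [f.path for f in stale]
```

Usage, with a hypothetical table path: call `clean_archived("/path/to/table/.hoodie/archived", max_age_days=180)` first, review the printed list, then rerun with `dry_run=False`. Defaulting to a dry run and keeping the selection logic in a pure function makes it easy to check which files would go before anything is actually removed.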
Sorry for the long delay in replying. We have implemented this cleanup script.
Please note that this cleanup script targets version 0.13.1; as the timeline structure evolves, it is not guaranteed to remain applicable.
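Given that caveat, one hedged way to keep such a script from running against a newer timeline layout is to check the table version recorded in `.hoodie/hoodie.properties` before deleting anything. The sketch below assumes that tables written by Hudi 0.13.x report `hoodie.table.version=5`; verify this against your own `hoodie.properties` before relying on it. `parse_properties`, `check_table_version`, and `ASSUMED_SUPPORTED_VERSIONS` are hypothetical names, and the parser is deliberately simplified (plain `key=value` lines, no escape handling).

```python
from typing import Dict, FrozenSet, Iterable

# Assumption: tables written by Hudi 0.13.x report hoodie.table.version=5.
ASSUMED_SUPPORTED_VERSIONS: FrozenSet[str] = frozenset({"5"})


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal key=value parser: skips comments and blank lines, ignores escapes."""
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_table_version(
    lines: Iterable[str], allowed: FrozenSet[str] = ASSUMED_SUPPORTED_VERSIONS
) -> str:
    """Return the table version, or raise ValueError if cleanup should not run."""
    version = parse_properties(lines).get("hoodie.table.version")
    if version is None:
        raise ValueError("hoodie.table.version not found in hoodie.properties")
    if version not in allowed:
        raise ValueError(f"table version {version} not in {sorted(allowed)}; refusing to clean up")
    return version
```

For example, open the table's `.hoodie/hoodie.properties` and pass the file handle to `check_table_version` before running any deletion; if it raises, review the newer timeline layout rather than deleting files blindly.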
closed this issue.
It is just a series of delete actions. Although it is rough, it can help solve the problem.