[SUPPORT] Incremental cleaning never used during insert
Hudi 0.11.1
I am working on tables with a huge number of partitions (> 100k) that are almost append-only: no updates to past data, and deletes are rare.
Previously I had an issue with cleaning combined with bulk_insert: auto-clean was very slow because it never found a previous clean commit, so it always did a full clean of all partitions.
Now I am using the insert operation and did not expect to hit this issue. But I see the same behavior: auto-clean always processes every partition in the table.
Moreover, cleaning is much slower with the metadata table enabled (from 5 minutes without metadata to 4 hours with it), and it gets slower when metadata compaction has not run recently. As a result, auto-clean together with metadata enabled is not feasible in my case.
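
For reference, the setup described above can be sketched as a set of Spark datasource writer options. This is only an illustration, not the exact job configuration: the helper `build_hudi_write_options`, the table name, the path, and the retained-commits value are assumptions, and the remaining keys are standard Hudi options as far as I know.

```python
def build_hudi_write_options(table_name, operation="insert", metadata_enabled=True):
    """Build a dict of Hudi writer options for the scenario in this issue.

    The helper itself is hypothetical; values are illustrative placeholders.
    """
    return {
        "hoodie.table.name": table_name,
        # "insert" is the current operation; "bulk_insert" was used previously.
        "hoodie.datasource.write.operation": operation,
        # Auto-clean runs inline after each commit.
        "hoodie.clean.automatic": "true",
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        # Assumed value, not taken from the actual job.
        "hoodie.cleaner.commits.retained": "10",
        # Incremental mode is expected to limit cleaning to partitions
        # touched since the last clean, instead of listing all of them.
        "hoodie.cleaner.incremental.mode": "true",
        # Toggle the metadata table to compare cleaning duration.
        "hoodie.metadata.enable": str(metadata_enabled).lower(),
    }


opts = build_hudi_write_options("my_table")

# Typical usage with a Spark DataFrame `df` (path is a placeholder):
# df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")
```

Comparing a run with `metadata_enabled=True` against one with `metadata_enabled=False` is how the 5 minutes versus 4 hours difference above shows up.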
By the way, cleaning has multiple functions: removing old files, but also repairing the timeline (e.g. timed-out commits).
- Is incremental cleaning supposed to work that way?
- Can the performance of full cleaning with metadata enabled be improved somehow (for example, by using file listing, which is faster)?