HIVE-28341: Iceberg: Change Major QB Full Table Compaction to compact partition by partition

Open difin opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

Change Major QB Full Table Compaction to compact partition by partition

Why are the changes needed?

Currently, Iceberg Major compaction compacts the whole table in one step. If the table is partitioned and holds a lot of data, this operation can take a long time and risks write conflicts at the commit stage. This PR changes it to compact partition by partition. In addition, each partition's compaction creates one snapshot, instead of the 2 snapshots (truncate + IOW, i.e. insert overwrite) created today when the whole table is compacted in one step.
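
To make the difference concrete, here is a toy Python model of the two strategies. It is only an illustrative sketch, not Hive or Iceberg code: the function names, the `(partition, file_name)` representation of data files, and the snapshot labels are all assumptions made up for this example. It only counts the commits (snapshots) each approach would produce and which partitions each commit touches.

```python
from collections import defaultdict


def group_by_partition(data_files):
    """Group (partition, file_name) pairs into {partition: [file_name, ...]}."""
    groups = defaultdict(list)
    for partition, file_name in data_files:
        groups[partition].append(file_name)
    return dict(groups)


def compact_whole_table(data_files):
    """Hypothetical model of the one-step approach: truncate, then insert overwrite."""
    groups = group_by_partition(data_files)
    # Every partition ends up with a single merged file.
    compacted = {p: [f"{p}/compacted"] for p in groups}
    # Two snapshots, each spanning every partition of the table.
    snapshots = [
        ("truncate", sorted(groups)),
        ("insert overwrite", sorted(groups)),
    ]
    return compacted, snapshots


def compact_by_partition(data_files):
    """Hypothetical model of the proposed approach: one commit per partition."""
    groups = group_by_partition(data_files)
    compacted, snapshots = {}, []
    for partition in sorted(groups):
        compacted[partition] = [f"{partition}/compacted"]
        # One snapshot, touching only this partition.
        snapshots.append(("rewrite", [partition]))
    return compacted, snapshots


if __name__ == "__main__":
    files = [("dept=1", "a"), ("dept=1", "b"), ("dept=2", "c")]
    for strategy in (compact_whole_table, compact_by_partition):
        _, snaps = strategy(files)
        print(strategy.__name__, snaps)
```

In this model both strategies leave the table in the same compacted state. What differs is the scope of each commit: the one-step version produces two snapshots that each cover every partition, while the per-partition version produces one snapshot per partition, so each commit covers less data.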

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Added a new query test and updated the existing Iceberg compaction query tests with the new, correct expected results.

difin, Jun 27 '24 23:06