Implement Sorted Writes

Open vinjai opened this issue 1 year ago • 2 comments

Implements: https://github.com/apache/iceberg-python/issues/271

vinjai avatar Jun 29 '24 14:06 vinjai

This PR covers:

  1. Writing sorted datasets to partitioned and non-partitioned Iceberg tables.
  2. Generating manifests with the correct sort-order-id.
  3. Integration tests verifying that the sorted datasets match the ordering Spark produces.
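To make the first point concrete, here is a minimal pure-Python sketch of a multi-key sort that honours per-field direction and null placement, the two properties an Iceberg sort field carries besides its transform. It is not the code in this PR: `sort_rows` and the `SortKey` tuple are hypothetical names, and rows are plain dicts rather than Arrow data.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# (column, ascending?, nulls_first?): a hypothetical, simplified sort field
SortKey = Tuple[str, bool, bool]


def sort_rows(rows: List[Row], keys: List[SortKey]) -> List[Row]:
    """Sort rows by several keys, honouring direction and null placement."""
    result = list(rows)
    # Python's sort is stable, so applying the least significant key first
    # and the most significant key last yields a correct multi-key order.
    for column, ascending, nulls_first in reversed(keys):
        nulls = [r for r in result if r[column] is None]
        values = [r for r in result if r[column] is not None]
        values.sort(key=lambda r: r[column], reverse=not ascending)
        result = nulls + values if nulls_first else values + nulls
    return result


rows = [
    {"city": "b", "n": 2},
    {"city": "a", "n": None},
    {"city": "a", "n": 1},
    {"city": "b", "n": 3},
]
# city ascending (nulls last), then n descending (nulls first)
sorted_rows = sort_rows(rows, [("city", True, False), ("n", False, True)])
```

An integration test in the spirit of point 3 would compare `sorted_rows` against the row order produced by the reference engine for the same sort order.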

Decisions taken:

  • If a sort transform is not supported in PyIceberg, we emit a warning and fall back to writing the data unsorted, recorded with the unsorted sort-order-id.
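The fallback described above could look roughly like the sketch below. It is an illustration only: `resolve_sort_order` and `SUPPORTED_TRANSFORMS` are hypothetical names, and the set of supported transforms is an assumption, not what the PR actually implements. The one fixed fact is that the Iceberg spec reserves sort order id 0 for the unsorted order.

```python
import warnings
from typing import List, Tuple

# The Iceberg spec reserves sort order id 0 for "unsorted".
UNSORTED_SORT_ORDER_ID = 0
# Hypothetical: the transforms this writer is assumed to be able to sort by.
SUPPORTED_TRANSFORMS = {"identity"}


def resolve_sort_order(sort_order_id: int, transforms: List[str]) -> Tuple[int, bool]:
    """Return (sort-order-id to record, whether the data should be sorted)."""
    if not transforms:
        # No sort fields means the table is effectively unsorted.
        return UNSORTED_SORT_ORDER_ID, False
    unsupported = [t for t in transforms if t not in SUPPORTED_TRANSFORMS]
    if unsupported:
        warnings.warn(
            f"Unsupported sort transforms {unsupported}; writing unsorted data"
        )
        return UNSORTED_SORT_ORDER_ID, False
    return sort_order_id, True
```

Recording id 0 in the fallback case keeps the metadata honest: readers never assume an ordering that the writer did not actually apply.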

What is not in the scope of this PR?

  • Performance improvements to the new sort function. (We will open a separate issue for this.)

vinjai avatar Jul 05 '24 00:07 vinjai

@Fokko This PR is ready for review

vinjai avatar Jul 05 '24 00:07 vinjai