iceberg-python
Implement Sorted Writes
Implements: https://github.com/apache/iceberg-python/issues/271
This PR covers:
- Writing sorted datasets to partitioned and non-partitioned Iceberg tables.
- Generating manifests with the correct sort-order-id.
- Integration tests that verify the sorted output matches what Spark produces for the same sort order.
Decisions taken:
- If a sort transform is not supported in PyIceberg, we emit a warning and fall back to writing the data unsorted, tagged with the unsorted sort-order-id.
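To make the fallback decision above concrete for reviewers, here is a minimal, standard-library-only sketch of the intended behavior. It is not the code in this PR: `SortField`, `SUPPORTED_TRANSFORMS`, and `sort_rows` are hypothetical names invented for illustration, rows are plain dicts rather than Arrow data, and the assumption that only the identity transform is supported is made up for the example. The value 0 for the unsorted order follows the Iceberg spec.

```python
import warnings
from dataclasses import dataclass
from typing import Any

# The Iceberg spec reserves sort order id 0 for "unsorted".
UNSORTED_SORT_ORDER_ID = 0

# Assumption for illustration only: pretend just the identity transform is sortable.
SUPPORTED_TRANSFORMS = {"identity"}


@dataclass(frozen=True)
class SortField:
    """Hypothetical stand-in for one field of a table sort order."""
    column: str
    transform: str = "identity"
    descending: bool = False


def sort_rows(
    rows: list[dict[str, Any]],
    fields: list[SortField],
    sort_order_id: int,
) -> tuple[list[dict[str, Any]], int]:
    """Return (rows to write, sort-order-id to record in the manifest)."""
    unsupported = [f.transform for f in fields if f.transform not in SUPPORTED_TRANSFORMS]
    if unsupported:
        # Fallback: warn, keep the input order, and tag the data as unsorted.
        warnings.warn(
            f"Unsupported sort transform(s) {unsupported}; writing unsorted data"
        )
        return list(rows), UNSORTED_SORT_ORDER_ID

    result = list(rows)
    # list.sort is stable, so applying keys from least to most significant
    # yields a correct multi-column sort.
    for f in reversed(fields):
        result.sort(key=lambda r, c=f.column: r[c], reverse=f.descending)
    return result, sort_order_id
```

The key point is that the returned id and the data always agree: either both reflect the table's sort order, or both reflect "unsorted". Null ordering is deliberately left out of this sketch.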
What is not in the scope of this PR?
- Performance improvements to the new sort function. (We will raise a separate issue for this.)
@Fokko This PR is ready for review