API,Core: Support Conditional Commits
Context
Adds support for committing changes to an iceberg table based on whether or not a condition is true at commit time. Not before the commit. Not after the commit. AT commit time.
This is useful in scenarios where users need a robust guard against potential concurrent commits. For example, some use cases require maintaining a monotonically increasing watermark in the snapshot properties. Our recently released iceberg-kafka-connect connector does this however it can only do this on a best-effort basis because Iceberg does not offer any API for expressing conditional commits. As a result, there is a risk of duplicate-file-appends there. This PR would enable closing that loophole.
For more history/context/usecases, please see the discussion in https://github.com/apache/iceberg/issues/6514.
Incidentally, Delta (the competing table format) offers a similar feature albeit through a much more narrow API: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#transaction-identifiers
API design
We need to introduce a new API to allow users to declare the conditions under which a commit is allowed to proceed or not. There are two main options here:
- Add a new
void commitIf(List<Validation> validations)method to thePendingUpdateinterface.- See latest commit for implementation
- This offers a fluent-style API
- Add a new
void validate(List<Validation> validations)method to thePendingUpdateinterface.- See first commit for implementation
- I feel this is simpler but the API is not as fluent as option 1 which some reviewers raised concerns about
Note
- Please don't be put off by the size of the PR; 90% of it is purely tests.
@fqaiser94 I added a comment to the issue regarding the motivation use cases: https://github.com/apache/iceberg/issues/6514#issuecomment-1444388008.
@stevenzwu sorry, I've responded in the issue now, let's continue the conversation there.
All comments have been addressed. This is now ready for a second round of reviews.
Sorry @nastra @jackye1995, I didn't mean to remove you both as reviewers. For some reason, the UI won't even let me add you folks back as reviewers 😵💫 Please feel free to add yourselves back as reviewers.
I took a little break from this PR because reviews were moving a little slowly and it seemed like this feature wasn't considered a high priority. I have since had/seen a couple of conversations with people interested in this feature and affirmed it's value so I'm thinking now might be a good time to try reviving this PR.
I've rebased the changes on top of latest master and addressed all of the existing comments. Please take a look :)
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.