Kafka Connect: Add delta writer support
resolves #10842
@bryanck copied over the code as is.
Im planning to refactor upsert mode (delta writer) code, planning to add few improvements to it, potentially changing existing behavior. should we merge this first and add the changes with separate PR. or combine them? what do you think?
@bryanck copied over the code as is.
Im planning to refactor upsert mode (delta writer) code, planning to add few improvements to it, potentially changing existing behavior. should we merge this first and add the changes with separate PR. or combine them? what do you think?
There are a couple of discussions on why we didn't originally add the delta writer functionality. I think we will need to resolve those discussions before we add this.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
I consider this functionality as a MUST, what's preventing it from been merged? At the end people are force to move to tabular version if they want support for upserts
@ismailsimsek @bryanck can we please revive this PR?
It would be great to have this functionality, even if it needs to be used with some attention points. The current alternative is to use Tabular version, which has same CAVEATS and introduce further limitations in the environment due to the usage of old 1.5.2 Iceberg version.
the failure doesn't seem related.
@ajantha-bhat @bryanck @jbonofre could you please review?
@danielcweeks Do you feel our stance on this evolved or should we hold off on adding this until there is more clarity on the future of equality deletes?
I feel we should close this PR until we discuss this with the community. If the community feels we can move forward, I can handling porting over my code.
@bryanck while I can fully relate to your concerns, I strongly advocate for moving forward with this PR, performance considerations notwithstanding. I agree with @ajantha-bhat and their comment here that the performance limitations of equality deletes should be addressed separately. For our team, having an append-only connector is virtually useless and we would rather deal with the related performance issues.
I'm following up on this topic—are there any updates or decisions so far? Has this discussion already been escalated to the community? If not, @bryanck could you please advise on the best way to do so?
Thanks in advance for your help!
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Still relevant
Can this be merged?
Can this be merged?
There are conflicts with the main branch, so we can't merge this.
For the record: There is an ongoing effort to move away from the FileAppenderFactory and start using the FileWriterFactory (https://github.com/apache/iceberg/pull/14328), which will conflict with the current PR.
Can this be merged?
There are conflicts with the main branch, so we can't merge this.
For the record: There is an ongoing effort to move away from the FileAppenderFactory and start using the FileWriterFactory (#14328), which will conflict with the current PR.
Will the FileWriterFactory allow for upserts?
@cbuckle1: It only changes the interface which uses for the files to write. We still need a PR like this. Mostly only a change where FileAppenderFactory<Record> appenderFactory is replaced by FileWriterFactory<Record> writerFactory
Hey @pvary I just saw this PR you were mentioning is already merged. Are we good to go now?
We should resolve concerns around relying on equality deletes before going down that road, or open a new PR for a solution that does not rely on equality deletes.
Here is one thread on the mailing list from a few months ago: https://lists.apache.org/thread/96dhf3sj5pc4ql0l8yk8sxgtr78bchrd.
@bryanck - for us, we are using this for streaming data, so equality deletes are needed. Based on another comment from April: https://github.com/apache/iceberg/pull/12070#issuecomment-2801973306 it looks like others are in the same boat.
@bryanck - in the linked thread, you mentioned that the flink sink is looking at a similar issue, do you have the Github issue for us to track?
Whoever interested to implement it : Umbrella issue is https://github.com/apache/iceberg/issues/11122 the main Flink PR is: https://github.com/apache/iceberg/pull/14197 note that it has followup PRs and references to spark implementation as well (followup changes to check).
I can give it a shot, ill base it on your PR and the flink sink implementation.