iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Kafka Connect: Add delta writer support

Open ismailsimsek opened this issue 1 year ago • 24 comments

resolves #10842

ismailsimsek avatar Jan 23 '25 15:01 ismailsimsek

@bryanck copied over the code as is.

Im planning to refactor upsert mode (delta writer) code, planning to add few improvements to it, potentially changing existing behavior. should we merge this first and add the changes with separate PR. or combine them? what do you think?

ismailsimsek avatar Jan 24 '25 11:01 ismailsimsek

@bryanck copied over the code as is.

Im planning to refactor upsert mode (delta writer) code, planning to add few improvements to it, potentially changing existing behavior. should we merge this first and add the changes with separate PR. or combine them? what do you think?

There are a couple of discussions on why we didn't originally add the delta writer functionality. I think we will need to resolve those discussions before we add this.

bryanck avatar Jan 24 '25 15:01 bryanck

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Feb 24 '25 00:02 github-actions[bot]

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Mar 03 '25 00:03 github-actions[bot]

I consider this functionality as a MUST, what's preventing it from been merged? At the end people are force to move to tabular version if they want support for upserts

juanluhidalgo avatar Mar 04 '25 10:03 juanluhidalgo

@ismailsimsek @bryanck can we please revive this PR?

olarcherc24 avatar Apr 10 '25 15:04 olarcherc24

It would be great to have this functionality, even if it needs to be used with some attention points. The current alternative is to use Tabular version, which has same CAVEATS and introduce further limitations in the environment due to the usage of old 1.5.2 Iceberg version.

nemesis910 avatar Apr 14 '25 06:04 nemesis910

the failure doesn't seem related.

@ajantha-bhat @bryanck @jbonofre could you please review?

ismailsimsek avatar Apr 14 '25 12:04 ismailsimsek

@danielcweeks Do you feel our stance on this evolved or should we hold off on adding this until there is more clarity on the future of equality deletes?

bryanck avatar Apr 14 '25 14:04 bryanck

I feel we should close this PR until we discuss this with the community. If the community feels we can move forward, I can handling porting over my code.

bryanck avatar Apr 14 '25 14:04 bryanck

@bryanck while I can fully relate to your concerns, I strongly advocate for moving forward with this PR, performance considerations notwithstanding. I agree with @ajantha-bhat and their comment here that the performance limitations of equality deletes should be addressed separately. For our team, having an append-only connector is virtually useless and we would rather deal with the related performance issues.

olarcherc24 avatar Apr 14 '25 14:04 olarcherc24

I'm following up on this topic—are there any updates or decisions so far? Has this discussion already been escalated to the community? If not, @bryanck could you please advise on the best way to do so?

Thanks in advance for your help!

nemesis910 avatar Apr 24 '25 16:04 nemesis910

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jun 09 '25 00:06 github-actions[bot]

Still relevant

olarcherc24 avatar Jun 10 '25 07:06 olarcherc24

Can this be merged?

cbuckle1 avatar Oct 14 '25 13:10 cbuckle1

Can this be merged?

There are conflicts with the main branch, so we can't merge this.

For the record: There is an ongoing effort to move away from the FileAppenderFactory and start using the FileWriterFactory (https://github.com/apache/iceberg/pull/14328), which will conflict with the current PR.

pvary avatar Oct 14 '25 13:10 pvary

Can this be merged?

There are conflicts with the main branch, so we can't merge this.

For the record: There is an ongoing effort to move away from the FileAppenderFactory and start using the FileWriterFactory (#14328), which will conflict with the current PR.

Will the FileWriterFactory allow for upserts?

cbuckle1 avatar Oct 14 '25 13:10 cbuckle1

@cbuckle1: It only changes the interface which uses for the files to write. We still need a PR like this. Mostly only a change where FileAppenderFactory<Record> appenderFactory is replaced by FileWriterFactory<Record> writerFactory

pvary avatar Oct 14 '25 14:10 pvary

Hey @pvary I just saw this PR you were mentioning is already merged. Are we good to go now?

albertocarref avatar Oct 22 '25 15:10 albertocarref

We should resolve concerns around relying on equality deletes before going down that road, or open a new PR for a solution that does not rely on equality deletes.

Here is one thread on the mailing list from a few months ago: https://lists.apache.org/thread/96dhf3sj5pc4ql0l8yk8sxgtr78bchrd.

bryanck avatar Oct 22 '25 15:10 bryanck

@bryanck - for us, we are using this for streaming data, so equality deletes are needed. Based on another comment from April: https://github.com/apache/iceberg/pull/12070#issuecomment-2801973306 it looks like others are in the same boat.

cbuckle1 avatar Oct 22 '25 15:10 cbuckle1

@bryanck - in the linked thread, you mentioned that the flink sink is looking at a similar issue, do you have the Github issue for us to track?

cbuckle1 avatar Dec 01 '25 17:12 cbuckle1

Whoever interested to implement it : Umbrella issue is https://github.com/apache/iceberg/issues/11122 the main Flink PR is: https://github.com/apache/iceberg/pull/14197 note that it has followup PRs and references to spark implementation as well (followup changes to check).

ismailsimsek avatar Dec 01 '25 18:12 ismailsimsek

I can give it a shot, ill base it on your PR and the flink sink implementation.

t3hw avatar Dec 07 '25 09:12 t3hw