iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Manifest list encryption

Open ggershinsky opened this issue 2 years ago • 23 comments

ggershinsky avatar Jun 05 '23 06:06 ggershinsky

Yep, this is due to https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length . There are options for table modification attacks if this field is not (safely) stored.

ggershinsky avatar Apr 01 '24 08:04 ggershinsky

@rdblue Thanks for the patch, I've merged it into this PR. We still need to sync on caching the unwrapped keys, I've added a commit that implements one way of doing this, will appreciate your review.

ggershinsky avatar Sep 09 '24 06:09 ggershinsky

@rdblue thanks for the caffeine-based cache for unwrapped keys, I've applied the patch.

ggershinsky avatar Sep 12 '24 10:09 ggershinsky

Regarding the caching limit - 1 minute might be too short. If a reader (or writer) refreshes a table each few minutes, there could be many KMS unwrap calls. In the new commit, I've changed this to a combined limit of 1000 entries or 1 day. Each entry is comprised of a "wrapped key" (could be up to a few hundred bytes, depending on KMS), and "unwrapped key" (a few dozen bytes), so the 1000 entries / max size is something like 0.5MB. @rdblue what do you think?

EDT: actually, we might be able to make the cache ~x10 smaller, if we use the "key id" as the cache key. Its size is a few dozen bytes.

ggershinsky avatar Sep 12 '24 11:09 ggershinsky

@ggershinsky, I'm not too concerned with the size of the cache. I'm okay with 1 day, but that seems like a long time to have unencrypted key material in memory. I'll defer to your judgement here.

rdblue avatar Sep 16 '24 18:09 rdblue

Ok. We don't have clear guidelines on key caching in memory (key copies are spread all over the process memory - cache, plug-in KMS client code, an HTTP library in the KMS client code; the Java GC - so there are no guarantees for when an uncached key is deleted from memory, if ever, before the process stops). But I agree a day could be too long. I'll change it to 1 hour - might be a reasonable trade-off between performance requirements (KMS call overhead), and safety requirements (there is a chance a key will be deleted from the memory within a business day).

ggershinsky avatar Sep 17 '24 06:09 ggershinsky

Hi @rdblue , I've built the integration code with the latest version of this patch, works ok. Can we merge this PR?

ggershinsky avatar Sep 27 '24 13:09 ggershinsky

@rdblue Did you have any more comments on this one? I can do another pass as well but I'd like to finish this up as well soon

RussellSpitzer avatar Oct 31 '24 22:10 RussellSpitzer

@rdblue @RussellSpitzer All current comments should have been addressed in this thread and in the last commit. An additional review round is always welcome, I too would like to complete this feature.

ggershinsky avatar Nov 07 '24 13:11 ggershinsky

Hi! Are there any plans to have this feature merged in 1.8.0? This would be extremely helpful

caushie-akamai avatar Dec 15 '24 20:12 caushie-akamai

+1, we have some compliance requirements that prevent us from adopting Iceberg without client-side encryption.

rshkv avatar Dec 17 '24 16:12 rshkv

^^ cc @rdblue @RussellSpitzer

ggershinsky avatar Dec 18 '24 11:12 ggershinsky

I'd also suggest to make sure we rebase because it's been a while since this was open, and we should see CI pass with the rebased changes before merging.

amogh-jahagirdar avatar Jan 17 '25 16:01 amogh-jahagirdar

I'd also suggest to make sure we rebase because it's been a while since this was open, and we should see CI pass with the rebased changes before merging.

Sure, I'll rebase this after Iceberg 1.8 is out (per the slack discussion)

ggershinsky avatar Jan 22 '25 07:01 ggershinsky

If you need help with this pull request @ggershinsky, perhaps I could help you.

gumartinm avatar Feb 13 '25 23:02 gumartinm

This PR is rebased and synced with the spec patch.

ggershinsky avatar Feb 25 '25 09:02 ggershinsky

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Mar 28 '25 00:03 github-actions[bot]

commenting to prevent closure. cc @rdblue

ggershinsky avatar Mar 28 '25 07:03 ggershinsky

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Apr 28 '25 00:04 github-actions[bot]

will be updated to accomodate #12927 changes

ggershinsky avatar Apr 29 '25 14:04 ggershinsky

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jun 14 '25 00:06 github-actions[bot]

will be updated to accomodate #12927 changes

done.

ggershinsky avatar Jun 16 '25 09:06 ggershinsky

cc @rdblue @RussellSpitzer

ggershinsky avatar Jun 16 '25 09:06 ggershinsky

is it possible to also add this PR as part of the 1.10 milestone? It would be great to have this done as part of 1.10, we have multiple use cases on our company where we need to encrypt tables.

caushie-akamai avatar Jul 01 '25 15:07 caushie-akamai

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Aug 01 '25 00:08 github-actions[bot]

revive

ggershinsky avatar Aug 01 '25 06:08 ggershinsky

@rdblue, the last community sync sounded like the community is aiming for 1.10 to be the release that implements most of the v3 spec. Would appreciate to get your assessment on whether that extends to encryption.

We and our customers have a lot of devs that want to build with Iceberg, but can't without this additional security layer. People ask me whether it makes sense to incorporate Iceberg, but I don't know how to read the tea leaves here :)

rshkv avatar Aug 08 '25 15:08 rshkv

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Sep 08 '25 00:09 github-actions[bot]

@rdblue @RussellSpitzer @amogh-jahagirdar It looks like all the outstanding comments, including the depredation ones, have been addressed. Any further feedback on this PR? Also cc @stevenzwu

huaxingao avatar Sep 09 '25 22:09 huaxingao

LGTM

huaxingao avatar Sep 19 '25 17:09 huaxingao