Manifest list encryption
Yep, this is due to https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length . There are options for table modification attacks if this field is not (safely) stored.
@rdblue Thanks for the patch, I've merged it into this PR. We still need to sync on caching the unwrapped keys, I've added a commit that implements one way of doing this, will appreciate your review.
@rdblue thanks for the caffeine-based cache for unwrapped keys, I've applied the patch.
Regarding the caching limit - 1 minute might be too short. If a reader (or writer) refreshes a table each few minutes, there could be many KMS unwrap calls. In the new commit, I've changed this to a combined limit of 1000 entries or 1 day. Each entry is comprised of a "wrapped key" (could be up to a few hundred bytes, depending on KMS), and "unwrapped key" (a few dozen bytes), so the 1000 entries / max size is something like 0.5MB. @rdblue what do you think?
EDT: actually, we might be able to make the cache ~x10 smaller, if we use the "key id" as the cache key. Its size is a few dozen bytes.
@ggershinsky, I'm not too concerned with the size of the cache. I'm okay with 1 day, but that seems like a long time to have unencrypted key material in memory. I'll defer to your judgement here.
Ok. We don't have clear guidelines on key caching in memory (key copies are spread all over the process memory - cache, plug-in KMS client code, an HTTP library in the KMS client code; the Java GC - so there are no guarantees for when an uncached key is deleted from memory, if ever, before the process stops). But I agree a day could be too long. I'll change it to 1 hour - might be a reasonable trade-off between performance requirements (KMS call overhead), and safety requirements (there is a chance a key will be deleted from the memory within a business day).
Hi @rdblue , I've built the integration code with the latest version of this patch, works ok. Can we merge this PR?
@rdblue Did you have any more comments on this one? I can do another pass as well but I'd like to finish this up as well soon
@rdblue @RussellSpitzer All current comments should have been addressed in this thread and in the last commit. An additional review round is always welcome, I too would like to complete this feature.
Hi! Are there any plans to have this feature merged in 1.8.0? This would be extremely helpful
+1, we have some compliance requirements that prevent us from adopting Iceberg without client-side encryption.
^^ cc @rdblue @RussellSpitzer
I'd also suggest to make sure we rebase because it's been a while since this was open, and we should see CI pass with the rebased changes before merging.
I'd also suggest to make sure we rebase because it's been a while since this was open, and we should see CI pass with the rebased changes before merging.
Sure, I'll rebase this after Iceberg 1.8 is out (per the slack discussion)
If you need help with this pull request @ggershinsky, perhaps I could help you.
This PR is rebased and synced with the spec patch.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
commenting to prevent closure. cc @rdblue
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
will be updated to accomodate #12927 changes
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
will be updated to accomodate #12927 changes
done.
cc @rdblue @RussellSpitzer
is it possible to also add this PR as part of the 1.10 milestone? It would be great to have this done as part of 1.10, we have multiple use cases on our company where we need to encrypt tables.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
revive
@rdblue, the last community sync sounded like the community is aiming for 1.10 to be the release that implements most of the v3 spec. Would appreciate to get your assessment on whether that extends to encryption.
We and our customers have a lot of devs that want to build with Iceberg, but can't without this additional security layer. People ask me whether it makes sense to incorporate Iceberg, but I don't know how to read the tea leaves here :)
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
@rdblue @RussellSpitzer @amogh-jahagirdar It looks like all the outstanding comments, including the depredation ones, have been addressed. Any further feedback on this PR? Also cc @stevenzwu
LGTM