iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Encryption for REST catalog

Open smaheshwar-pltr opened this issue 8 months ago • 21 comments

This PR implements client-side support for REST catalog encryption. With it, clients interacting with a REST catalog can read and write encrypted data.

It is similar to https://github.com/apache/iceberg/pull/13066, that integrates encryption with the Hive catalog.

cc @rdblue @RussellSpitzer @ggershinsky

smaheshwar-pltr avatar Jun 03 '25 21:06 smaheshwar-pltr

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jul 07 '25 00:07 github-actions[bot]

(Bump to remove staleness)

smaheshwar-pltr avatar Jul 07 '25 12:07 smaheshwar-pltr

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Aug 07 '25 00:08 github-actions[bot]

(Bump to remove staleness)

smaheshwar-pltr avatar Aug 07 '25 13:08 smaheshwar-pltr

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Sep 08 '25 00:09 github-actions[bot]

(Bump to remove staleness)

smaheshwar-pltr avatar Sep 11 '25 17:09 smaheshwar-pltr

Given #7770 is merged, curious for thoughts on this PR. REST integration sounds on the cards. Also happy for this PR to be superseded if other folks have worked on it.

cc @huaxingao @RussellSpitzer @ggershinsky @rdblue

smaheshwar-pltr avatar Sep 26 '25 13:09 smaheshwar-pltr

Could you elaborate on the api additions? I think it would help to have some more description on the general direction of this or

RussellSpitzer avatar Sep 27 '25 00:09 RussellSpitzer

@smaheshwar-pltr Could you please resolve the conflicts?

huaxingao avatar Oct 25 '25 01:10 huaxingao

@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.

ggershinsky avatar Oct 27 '25 13:10 ggershinsky

Also, it would be good to refactor (if possible) a code common to this PR and to https://github.com/apache/iceberg/pull/13066 , so that other catalogs will be able to re-use it.

ggershinsky avatar Oct 27 '25 14:10 ggershinsky

@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.

Sorry for lack of responses, was out of office for a bit.

@ggershinsky, I'm happy to continue with this PR unless the approaches significantly differ, in which case happy to also hand it off or accept a review here! 🙏

smaheshwar-pltr avatar Oct 27 '25 21:10 smaheshwar-pltr

Also, it would be good to refactor (if possible) a code common to this PR and to https://github.com/apache/iceberg/pull/13066 , so that other catalogs will be able to re-use it.

@ggershinsky, makes sense - though I struggled slightly; I think your comment here https://github.com/apache/iceberg/pull/13066#discussion_r2385678105 sums it up well

smaheshwar-pltr avatar Oct 27 '25 21:10 smaheshwar-pltr

@smaheshwar-pltr Could you please resolve the conflicts?

@huaxingao thanks for reminding, done! I think this PR is ready for an initial pass 🙏

smaheshwar-pltr avatar Oct 27 '25 21:10 smaheshwar-pltr

@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.

Sorry for lack of responses, was out of office for a bit.

@ggershinsky, I'm happy to continue with this PR unless the approaches significantly differ, in which case happy to also hand it off or accept a review here! 🙏

Sure, no problem. I think the approach is similar.

ggershinsky avatar Oct 28 '25 09:10 ggershinsky

I wonder if https://github.com/apache/iceberg/pull/14465 could affect safety of encrypted tables, and if yes, how this can be handled..

ggershinsky avatar Nov 06 '25 09:11 ggershinsky

I wonder if #14465 could affect safety of encrypted tables, and if yes, how this can be handled..

IIUC, the encryption key must be accessible to the query engine in order for it to decrypt and encrypt the data. Could you please help me understand why allowing a custom builder would raise security concerns?

From my perspective, even without that PR, users could still copy the code and replace the default TableOperations implementation. The PR just simply introduces a generic interface to make such customization cleaner and more maintainable.

XJDKC avatar Nov 06 '25 19:11 XJDKC

Thanks @XJDKC , it's likely just a matter of documenting the new interface to make sure the users are aware of the security aspects of the REST TO (if they plan to use table encryption).

why allowing a custom builder would raise security concerns?

Maybe its ok, but we need to check the risk for metadata integrity (if broken, can be used for data leaks and other attacks), as discussed in this PR comments - making sure the client gets the metadata from the REST server, and not from the metadata.json file. I'll have a look at the 14465 details to see if there are other security implications.

ggershinsky avatar Nov 06 '25 20:11 ggershinsky

Thanks @XJDKC , it's likely just a matter of documenting the new interface to make sure the users are aware of the security aspects of the REST TO (if they plan to use table encryption).

Maybe its ok, but we need to check the risk for metadata integrity (if broken, can be used for data leaks and other attacks), as discussed in this PR comments - making sure the client gets the metadata from the REST server, and not from the metadata.json file.

Thanks both for the discussions here.

I don't see anything concerning with the motivation behind the other PR. If a a REST client wanted to read from the metadata JSON in storage, they can do so regardless after calling loadTable - so enabling custom operations builders doesn't enable but facilitates custom client behaviour.

As such, I suspect the documentation should be at the spec level to advise clients if required, possibly depending on the corresponding REST spec discussions / conclusions (https://github.com/apache/iceberg/pull/14486 / https://lists.apache.org/thread/0nn11o4xf1nmw68d4px33sxw5tzzmgbo).

smaheshwar-pltr avatar Nov 07 '25 23:11 smaheshwar-pltr

I'm also fine with the motivation. Regarding the support for encrypted tables, I have two concerns,

  1. Unrelated to REST. What if the encryption.key-id table property is set (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableProperties.java#L391) but a custom TO implementation ignores it. Do the users expect a table to be encrypted if the encryption.key-id is set? Should the implementors of custom TOs validate them by running the Iceberg unitests (inc TestTableEncryption)?
  2. Related to REST. The standard RESTableIOperations, built in Iceberg, is verified (in this PR) to be safe wrt metadata access. A custom TO replacement can behave differently of course.

I think both concerns can be addressed by adding a few lines to the RESTOperationsBuilder javadoc API comments (not to the REST spec, because the first point is unrelated, and the second is related only to client impl, not the server impl)

ggershinsky avatar Nov 09 '25 09:11 ggershinsky

@smaheshwar-pltr I have merged the encryption clean up PR. Could you please rebase?

huaxingao avatar Nov 19 '25 17:11 huaxingao