Encryption for REST catalog
This PR implements client-side support for REST catalog encryption. With it, clients interacting with a REST catalog can read and write encrypted data.
It is similar to https://github.com/apache/iceberg/pull/13066, that integrates encryption with the Hive catalog.
cc @rdblue @RussellSpitzer @ggershinsky
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
(Bump to remove staleness)
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
(Bump to remove staleness)
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
(Bump to remove staleness)
Given #7770 is merged, curious for thoughts on this PR. REST integration sounds on the cards. Also happy for this PR to be superseded if other folks have worked on it.
cc @huaxingao @RussellSpitzer @ggershinsky @rdblue
Could you elaborate on the api additions? I think it would help to have some more description on the general direction of this or
@smaheshwar-pltr Could you please resolve the conflicts?
@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.
Also, it would be good to refactor (if possible) a code common to this PR and to https://github.com/apache/iceberg/pull/13066 , so that other catalogs will be able to re-use it.
@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.
Sorry for lack of responses, was out of office for a bit.
@ggershinsky, I'm happy to continue with this PR unless the approaches significantly differ, in which case happy to also hand it off or accept a review here! 🙏
Also, it would be good to refactor (if possible) a code common to this PR and to https://github.com/apache/iceberg/pull/13066 , so that other catalogs will be able to re-use it.
@ggershinsky, makes sense - though I struggled slightly; I think your comment here https://github.com/apache/iceberg/pull/13066#discussion_r2385678105 sums it up well
@smaheshwar-pltr Could you please resolve the conflicts?
@huaxingao thanks for reminding, done! I think this PR is ready for an initial pass 🙏
@huaxingao @smaheshwar-pltr Our team has a person who works on encryption with the REST catalog. If @smaheshwar-pltr does not object, we can follow up on this patch.
Sorry for lack of responses, was out of office for a bit.
@ggershinsky, I'm happy to continue with this PR unless the approaches significantly differ, in which case happy to also hand it off or accept a review here! 🙏
Sure, no problem. I think the approach is similar.
I wonder if https://github.com/apache/iceberg/pull/14465 could affect safety of encrypted tables, and if yes, how this can be handled..
I wonder if #14465 could affect safety of encrypted tables, and if yes, how this can be handled..
IIUC, the encryption key must be accessible to the query engine in order for it to decrypt and encrypt the data. Could you please help me understand why allowing a custom builder would raise security concerns?
From my perspective, even without that PR, users could still copy the code and replace the default TableOperations implementation. The PR just simply introduces a generic interface to make such customization cleaner and more maintainable.
Thanks @XJDKC , it's likely just a matter of documenting the new interface to make sure the users are aware of the security aspects of the REST TO (if they plan to use table encryption).
why allowing a custom builder would raise security concerns?
Maybe its ok, but we need to check the risk for metadata integrity (if broken, can be used for data leaks and other attacks), as discussed in this PR comments - making sure the client gets the metadata from the REST server, and not from the metadata.json file. I'll have a look at the 14465 details to see if there are other security implications.
Thanks @XJDKC , it's likely just a matter of documenting the new interface to make sure the users are aware of the security aspects of the REST TO (if they plan to use table encryption).
Maybe its ok, but we need to check the risk for metadata integrity (if broken, can be used for data leaks and other attacks), as discussed in this PR comments - making sure the client gets the metadata from the REST server, and not from the metadata.json file.
Thanks both for the discussions here.
I don't see anything concerning with the motivation behind the other PR. If a a REST client wanted to read from the metadata JSON in storage, they can do so regardless after calling loadTable - so enabling custom operations builders doesn't enable but facilitates custom client behaviour.
As such, I suspect the documentation should be at the spec level to advise clients if required, possibly depending on the corresponding REST spec discussions / conclusions (https://github.com/apache/iceberg/pull/14486 / https://lists.apache.org/thread/0nn11o4xf1nmw68d4px33sxw5tzzmgbo).
I'm also fine with the motivation. Regarding the support for encrypted tables, I have two concerns,
- Unrelated to REST. What if the
encryption.key-idtable property is set (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableProperties.java#L391) but a custom TO implementation ignores it. Do the users expect a table to be encrypted if theencryption.key-idis set? Should the implementors of custom TOs validate them by running the Iceberg unitests (incTestTableEncryption)? - Related to REST. The standard RESTableIOperations, built in Iceberg, is verified (in this PR) to be safe wrt metadata access. A custom TO replacement can behave differently of course.
I think both concerns can be addressed by adding a few lines to the RESTOperationsBuilder javadoc API comments (not to the REST spec, because the first point is unrelated, and the second is related only to client impl, not the server impl)
@smaheshwar-pltr I have merged the encryption clean up PR. Could you please rebase?