[Bug] Kyuubi Spark authorization plugin with Iceberg tables: Permission denied on Iceberg snapshot retrieval
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the bug
When using Ranger Hive policies as the source for the Kyuubi Spark authorization plugin with Iceberg tables, we get "Permission denied" when retrieving data for an Iceberg snapshot ID, as in the example below: "select * from iceberg.test.customers.snapshot_id_7801393477815178085". Although in Ranger the corresponding account has select and read rights on the test database, we get the following error:
An error was encountered:
Traceback (most recent call last):
File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 117, in toJSON
return RDD(rdd.toJavaRDD(), self._sc, UTF8Deserializer(use_unicode))
File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/srv/ssd1/yarn/nm/usercache/svc_df_big-st/appcache/application_1701151368547_109009/container_e381_1701151368547_109009_01_000001/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o165.toJavaRDD.
: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [svc_df_big-st] does not have [select] privilege on [test.customers/snapshot_id_7801393477815178085/id]
at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:172)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:93)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:92)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.org$apache$kyuubi$plugin$spark$authz$ranger$RuleAuthorization$$checkPrivileges(RuleAuthorization.scala:92)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:37)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:172)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:171)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3247)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3245)
at org.apache.spark.sql.Dataset.toJavaRDD(Dataset.scala:3257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
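For reference, a minimal PySpark reproduction sketch (it assumes the session is already configured with the authorization plugin and the Iceberg catalog; the query and the toJSON call mirror the traceback above):

```python
from pyspark.sql import SparkSession

# Assumes the Kyuubi authz extension and an Iceberg catalog named "iceberg"
# are already configured for the session (see "Additional context" below).
spark = SparkSession.builder.getOrCreate()

df = spark.sql(
    "select * from iceberg.test.customers.snapshot_id_7801393477815178085"
)

# toJSON() calls toJavaRDD() under the hood, which forces plan optimization
# and triggers the Ranger check, raising the AccessControlException above.
df.toJSON().take(1)
```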
However, if the test account is granted Hive read access to all databases, there is no permission issue; read access on all (*) databases should not normally be necessary for this query to be allowed. Is there a bug in the Kyuubi Spark authorization plugin causing this? The patch at https://github.com/apache/kyuubi/pull/3931/files doesn't seem to cover this scenario.
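For reference, the existing grant corresponds roughly to a Ranger policy like the sketch below, created via Ranger's public v2 REST API (the service name, Ranger URL, and credentials are placeholders, not values from this report):

```python
import requests

# Hypothetical sketch of the narrower grant described above: select/read on
# the "test" database only, for the service user from the error message.
policy = {
    "service": "hadoop_hive",  # placeholder Ranger Hive service name
    "name": "test_db_select",
    "resources": {
        "database": {"values": ["test"]},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {
            "accesses": [
                {"type": "select", "isAllowed": True},
                {"type": "read", "isAllowed": True},
            ],
            "users": ["svc_df_big-st"],
        }
    ],
}

requests.post(
    "https://ranger.example.com:6182/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "password"),  # placeholder credentials
)
```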
Thanks
Affects Version(s)
1.8.0
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
No response
Additional context
We are using the Kyuubi Spark Authorization Plugin with Spark 3.2 and Iceberg 1.0.0.1.3.1, as described here: https://kyuubi.readthedocs.io/en/master/security/authorization/spark/install.html
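The session configuration follows the pattern below (a sketch: the extension class names are from the Kyuubi and Iceberg docs, while the catalog name "iceberg" matches the queries above and the metastore type is an assumption):

```python
from pyspark.sql import SparkSession

# Sketch of the relevant session settings; catalog details are illustrative.
spark = (
    SparkSession.builder
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension",
    )
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hive")  # assumed Hive metastore
    .getOrCreate()
)
```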
Are you willing to submit PR?
- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
Can you provide the plan details?
Hello, please let me know if more details are needed. This is also after applying patch #5248 (https://github.com/apache/kyuubi/commit/724ae93989e7e64f858b5c621ef28e9b17e45f99):

== Physical Plan ==
*(1) Project [id#32, name#33, age#34, address#35, cloth#36]
+- BatchScan[id#32, name#33, age#34, address#35, cloth#36] iceberg.gns_test.customers [filters=] RuntimeFilters: []

The patch appears to alleviate the access issue for iceberg.test.customers.snapshot_id_X, but it introduces another issue: metadata tables such as snapshots and history become freely accessible without any Ranger security checks.
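For example, metadata-table queries like the following reportedly return results without any Ranger check (a sketch reusing the table from above; a configured session is assumed):

```python
# Iceberg metadata tables queried as <table>.snapshots / <table>.history.
# Per the report, after #5248 these pass through without a Ranger check.
spark.sql("select * from iceberg.test.customers.snapshots").show()
spark.sql("select * from iceberg.test.customers.history").show()
```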
Thanks a lot
Thanks @elisabetao, we need the full plan.
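A minimal PySpark sketch for capturing the full plan, assuming the same session and query as in the report:

```python
# Prints the parsed, analyzed, optimized, and physical plans for the
# failing query ("spark" is the same configured session as above).
spark.sql(
    "select * from iceberg.test.customers.snapshot_id_7801393477815178085"
).explain(mode="extended")
```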