RANGER-2128: Implementation of Ranger Spark SQL plugin
pre-work
Basic concepts and an introduction can be found in the spark-authorizer documentation.
additional notes
https://github.com/apache/spark/pull/17724 exposed a new experimental developer API, SparkSessionExtensions, which lets users add their own extensions to a SparkSession object during instantiation, either through the programmatic API or through the Spark property spark.sql.extensions.
This PR uses spark.sql.extensions, together with the other necessary ranger-hive-plugin settings, to enable Ranger security support for Spark SQL with Hive as the external catalog.
spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension
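As a rough illustration of the property-based route, the setting above could be passed to spark-submit as a `--conf` argument. The sketch below only assembles the command line; the helper name `spark_submit_args` and the application file `my_app.py` are hypothetical, and it assumes the plugin jar and the ranger-hive-plugin settings are already available to Spark.

```python
import shlex

# Extension class name as given in this PR's description.
RANGER_EXTENSION = (
    "org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension"
)


def spark_submit_args(app, extra_conf=None):
    """Build a spark-submit argument list that sets spark.sql.extensions.

    `app` and `extra_conf` are illustrative parameters, not part of the PR.
    """
    conf = {"spark.sql.extensions": RANGER_EXTENSION}
    conf.update(extra_conf or {})
    args = ["spark-submit"]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# "my_app.py" is a placeholder application name.
cmd = spark_submit_args("my_app.py")
print(shlex.join(cmd))
```

The same key/value pair could instead go into spark-defaults.conf; which one is preferable depends on whether the extension should apply to every job or only to selected ones.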
@boscodurai ship it?
keen to see this in action! do you have any jar and setup notes for this?
@yaooqinn can u create apache review?
@tooptoop4 review request created.
Hi!
This work is very interesting! Any progress on the merge?
@yaooqinn Can you fix the conflicts?
@yaooqinn any thoughts on resolving the conflicts?
What does this PR do/accomplish that isn't already possible with the existing Hive support? We're currently running Spark Thriftserver (3.2.x) with the kyuubi plugin against Ranger where in Ranger we've defined the service as a Hive service and everything with regards to authentication and authorization seems to be working as expected.
The only thing I've observed that doesn't work is auto-complete when creating policies via the Ranger UI. I assume this is a slight dialect difference in the response from the Spark Thriftserver vs. a real HiveServer2, since the query being run by Ranger (show databases like "*") returns the databases just fine when I run it myself.
Thanks @simonvanderveldt. I think we can then include the Kyuubi plugin in Ranger for Spark. If you are familiar with Kyuubi, can you please raise a demo PR?
+1
will this work with spark-submit cluster mode without passing keytab?
I am going to close this in favor of the kyuubi spark authz plugin