Add support for init scripts in crawling for Azure Service Principals
#326
Background
Run a dependent job after the current jobs to capture the details from init scripts. If any matching Spark config for Azure is found, append it to the cluster, job, and Azure SPN tables.
Add the following
- List of all Azure SPNs from all the init scripts.
- Add to existing inventory or create new inventory if necessary
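A minimal sketch of what "matching spark config for Azure" could look like when scanning the text of an init script. It assumes the script sets the Hadoop ABFS OAuth keys (`fs.azure.account.oauth2.client.id...` and `fs.azure.account.oauth2.client.endpoint...`) in plain text; `SpnFinding` and `find_azure_spns` are hypothetical names for illustration, not part of UCX.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the init script writes the ABFS OAuth client id key in plain text,
# optionally suffixed with "<storage-account>.dfs.core.windows.net".
_CLIENT_ID = re.compile(
    r"fs\.azure\.account\.oauth2\.client\.id"
    r"(?:\.(?P<account>[\w-]+)\.dfs\.core\.windows\.net)?"
    r"[\s=:\"',]+(?P<value>[^\s\"',]+)"
)
# The tenant id is taken from the OAuth endpoint URL, if one is present.
_TENANT = re.compile(r"login\.microsoftonline\.com/(?P<tenant>[^/\s\"']+)/oauth2")


@dataclass(frozen=True)
class SpnFinding:  # hypothetical record, not the UCX schema
    application_id: str
    storage_account: Optional[str]
    tenant_id: Optional[str]


def find_azure_spns(script_text: str) -> List[SpnFinding]:
    """Return de-duplicated service principal findings from one init script."""
    tenant_match = _TENANT.search(script_text)
    tenant_id = tenant_match.group("tenant") if tenant_match else None
    findings: List[SpnFinding] = []
    for match in _CLIENT_ID.finditer(script_text):
        finding = SpnFinding(match.group("value"), match.group("account"), tenant_id)
        if finding not in findings:
            findings.append(finding)
    return findings
```

A purely textual scan like this misses values that are built at runtime (for example from environment variables or secrets), so it should be read as a best-effort heuristic.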
Related info:
- https://learn.microsoft.com/en-us/azure/databricks/init-scripts/cluster-scoped
- https://learn.microsoft.com/en-us/azure/databricks/init-scripts/referencing-files
- https://community.databricks.com/t5/data-engineering/databricks-cluster-init-scripts-on-abfss-location/td-p/7468
- https://learn.microsoft.com/en-us/azure/databricks/_extras/documents/azure-init-adls.pdf
- az login --service-principal ... - az storage blob download - https://stackoverflow.com/a/75877509/277035
- /databricks/spark/conf - https://stackoverflow.com/questions/75555970/set-spark-conf-for-databricks-cluster-in-python-init-script
- https://community.databricks.com/t5/data-engineering/creating-cluster-from-adf-linked-service-with-workspace-init/td-p/3621
%sh
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant id>/oauth2/v2.0/token \
-d 'client_id=<application id of the service principal>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client secret of the service principal>'
Relates to https://github.com/databrickslabs/ucx/issues/413
Seems like a duplicate of #413
It is impossible to do with the resources we have
As part of https://github.com/databrickslabs/ucx/pull/326, the following are taken care of:
- Scanned the Spark config of all clusters, jobs, cluster policies, and pipelines for Azure Service Principals that have access to storage, and flagged them.
- Scanned cluster-scoped and global init scripts for Azure Service Principals that have access to storage, and flagged them.
In this issue, the following pending item is meant to be taken care of:
- Create an inventory of all Azure SPNs that have access to storage from all the init scripts (cluster-scoped and global) and add it to the "azure_service_principals" table in HMS.
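The inventory step could be sketched roughly as a merge that keeps existing rows and appends only service principals not yet recorded. The row shape below (`application_id`, `storage_account`, `source`) is an assumption for illustration and is not the actual schema of the "azure_service_principals" table; `AzureSpnRow` and `merge_inventory` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AzureSpnRow:  # hypothetical row shape, not the real table schema
    application_id: str
    storage_account: str
    source: str  # where the SPN was found, e.g. "spark-config" or "init-script"


def merge_inventory(
    existing: Iterable[AzureSpnRow], discovered: Iterable[AzureSpnRow]
) -> List[AzureSpnRow]:
    """Append newly discovered SPNs, keyed by (application_id, storage_account)."""
    merged = list(existing)
    seen = {(row.application_id, row.storage_account) for row in merged}
    for row in discovered:
        key = (row.application_id, row.storage_account)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged
```

Keying on the application id and storage account means an SPN already flagged from cluster Spark config is not duplicated when the same pair also shows up in an init script.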
Related to https://github.com/databrickslabs/ucx/issues/249
We crawl principal permissions directly on storage accounts. We won't parse shell scripts, which is prohibitively expensive.