Add support for init scripts in crawling for Azure Service Principals

Open zpappa opened this issue 2 years ago • 4 comments

#326

Background

Run a dependent job after the current jobs to capture details from init scripts. If any matching Spark config for Azure is found, append it to the cluster, job, and Azure SPN tables.

Add the following

  • List of all Azure SPNs from all the init scripts.
  • Add to existing inventory or create new inventory if necessary
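A minimal sketch of what scanning one init script for Azure SPNs could look like, assuming the script writes the standard ABFS OAuth keys (`fs.azure.account.oauth2.client.id...`) in plain text. The names `SpnFinding` and `find_azure_spns` are hypothetical and not part of ucx; values supplied through secret references such as `{{secrets/...}}` would not be matched by this pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hadoop ABFS OAuth client-id key, optionally suffixed with a storage account host.
# A "spark.hadoop." prefix in front of the key is matched implicitly.
_CLIENT_ID = re.compile(
    r"fs\.azure\.account\.oauth2\.client\.id"   # config key
    r"(?:\.(?P<account>[\w.-]+))?"               # optional ".<account>.dfs.core.windows.net"
    r"[\s=:\"',]+"                               # separator between key and value
    r"(?P<client_id>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)
# The tenant ID usually appears inside the OAuth endpoint URL.
_TENANT = re.compile(
    r"login\.microsoftonline\.com/(?P<tenant>[0-9a-fA-F-]{36})/oauth2"
)


@dataclass(frozen=True)
class SpnFinding:  # hypothetical record, not the ucx table schema
    application_id: str
    storage_account: Optional[str]
    tenant_id: Optional[str]


def find_azure_spns(script_text: str) -> List[SpnFinding]:
    """Return de-duplicated SPN findings from the text of one init script."""
    tenant = _TENANT.search(script_text)
    tenant_id = tenant.group("tenant") if tenant else None
    findings: List[SpnFinding] = []
    for match in _CLIENT_ID.finditer(script_text):
        account = match.group("account")
        storage = account.split(".")[0] if account else None
        finding = SpnFinding(match.group("client_id").lower(), storage, tenant_id)
        if finding not in findings:
            findings.append(finding)
    return findings
```

Running this over the contents of every cluster-scoped and global init script, then merging the results, would yield the list of SPNs to add to the inventory.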

related info:

  • https://learn.microsoft.com/en-us/azure/databricks/init-scripts/cluster-scoped
  • https://learn.microsoft.com/en-us/azure/databricks/init-scripts/referencing-files
  • https://community.databricks.com/t5/data-engineering/databricks-cluster-init-scripts-on-abfss-location/td-p/7468
  • https://learn.microsoft.com/en-us/azure/databricks/_extras/documents/azure-init-adls.pdf
  • az login --service-principal ...
  • az storage blob download
  • https://stackoverflow.com/a/75877509/277035
  • /databricks/spark/conf
  • https://stackoverflow.com/questions/75555970/set-spark-conf-for-databricks-cluster-in-python-init-script
  • https://community.databricks.com/t5/data-engineering/creating-cluster-from-adf-linked-service-with-workspace-init/td-p/3621
%sh
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant id>/oauth2/v2.0/token \
-d 'client_id=<application id of the service principal>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client secret of the service principal>'
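For reference, the same client-credentials token request can be sketched in Python with the standard library only. The function name `build_token_request` is hypothetical; the request is only built here, not sent, and the scope GUID is the one used in the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Resource ID taken from the scope in the curl command above.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) the OAuth client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # urlencode percent-encodes "/" as %2F, matching the curl form body.
    body = urlencode(
        {
            "client_id": client_id,
            "grant_type": "client_credentials",
            "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON response is expected to carry the token in its `access_token` field.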

zpappa avatar Oct 09 '23 16:10 zpappa

Relates to https://github.com/databrickslabs/ucx/issues/413

pohlposition avatar Oct 09 '23 16:10 pohlposition

Seems like a duplicate of #413

nfx avatar Oct 10 '23 08:10 nfx

It is impossible to do with the resources we have

nfx avatar Oct 10 '23 13:10 nfx

As part of https://github.com/databrickslabs/ucx/pull/326 the following are taken care of -

  • Scanned the Spark config of all clusters, jobs, cluster policies, and pipelines for Azure Service Principals that have access to storage, and flagged them.
  • Scanned cluster-scoped and global init scripts for Azure Service Principals that have access to storage, and flagged them.

In this issue the following pending item is meant to be taken care of -

Create an inventory of all Azure SPNs that have access to storage from all the init scripts (cluster-scoped and global) and add it to the "azure_service_principals" table in HMS.

Related to https://github.com/databrickslabs/ucx/issues/249

nfx avatar Oct 10 '23 13:10 nfx

we crawl principal permissions directly on storage accounts. we won't parse shell scripts, which is prohibitively expensive

nfx avatar Jul 15 '24 19:07 nfx