PowerShell Module for Databricks
This repository contains the source code for the PowerShell module "DatabricksPS". The module can also be found in the public PowerShell gallery: https://www.powershellgallery.com/packages/DatabricksPS/
It works for Databricks on Azure and also on AWS. The APIs are almost identical, so I decided to bundle them in one single module. The official API documentation can be found here:
Azure Databricks - https://docs.azuredatabricks.net/api/latest/index.html
Databricks on AWS - https://docs.databricks.com/api/latest/index.html
Release History
v1.9.9.11:
- Add new flag -UsingAzContext for Set-DatabricksEnvironment to derive authentication and URL from the Azure Az module
v1.9.9.10:
- Add support for Git Credentials API
v1.9.9.9:
- Make Pin-DatabricksCluster and Unpin-DatabricksCluster return an object containing the cluster_id for further piping into other cmdlets.
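For example, the returned cluster_id can be piped straight into another cluster cmdlet. A minimal sketch (the cluster name is a placeholder, and it assumes Start-DatabricksCluster accepts the cluster_id from the pipeline):
# pin a specific cluster and start it right away
Get-DatabricksCluster | Where-Object { $_.cluster_name -eq "my-cluster" } | Pin-DatabricksCluster | Start-DatabricksCluster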
v1.9.9.8:
- Add parameter aliases to Add-DatabricksCluster and Update-DatabricksCluster to match the names used in the cluster definition (e.g. cluster_name for -ClusterName)
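With these aliases, a cluster definition stored as a hashtable can be splatted directly into the cmdlet. A hedged sketch, assuming the aliases cover the fields used below (all values are placeholders):
# cluster definition using the original API field names
$clusterDef = @{
    cluster_name  = "my-cluster"
    spark_version = "11.3.x-scala2.12"
    node_type_id  = "Standard_DS3_v2"
    num_workers   = 2
}
# splatting works because the parameter aliases match the API field names
Add-DatabricksCluster @clusterDef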
v1.9.9.7:
- Add aliases for all cmdlets - e.g. gdbrc for Get-DatabricksCluster
- Fix minor issue with dictionaries/hashtables being passed as parameters
- Fix issue with encodings in combination with PowerShell Core
v1.9.9.6:
- Fix issue with removal of empty parameters in Add-DatabricksCluster
v1.9.9.5:
- Fix issue with Repos API and pulling Tags
v1.9.9.4:
- Add support for -CustomKeys when using Get-DatabricksWorkspaceConfig
- Add dedicated parameters for all known workspace configs to Set-DatabricksWorkspaceConfig
v1.9.9.3:
- Add support for -CustomConfig when using Set-DatabricksWorkspaceConfig
v1.9.9.2:
- Add better support for integration with CI/CD pipelines
- Azure DevOps: Set-DatabricksEnvironment now supports the new switch -UsingAzureDevOpsServiceConnection to be used with the Azure DevOps CLI task - see Azure DevOps Integration
- Databricks CLI: Set-DatabricksEnvironment now supports the new switch -UsingDatabricksCLIAuthentication to be used with any CI/CD tool where the Databricks CLI is already configured - see Databricks CLI Integration
v1.9.9.1:
- Add -Timeout parameter to SCIM API Get-* cmdlets
v1.9.9.0:
- Add support for SQL endpoints to the *-DatabricksPermissions cmdlets as described here: SQL Endpoint Permissions.
v1.9.8.1:
- Fix issue with Import-DatabricksEnvironment where clusters were not imported correctly
v1.9.8.0:
- Add support for Token Management API
- using the new -Admin switch
- Improve usability of Workspace Config API
- Add automated tests for Token API and Token Management API
- Add new -Me switch to Get-DabricksSCIMUser to get information about the currently authenticated user
v1.9.7.0:
- Add support for Repos API
- Add support for Jobs API v2.1 via the switch JobsAPIVersion on Set-DatabricksEnvironment
- Deprecate Projects API (Pull-DatabricksProject)
v1.9.6.2:
- Fix some documentation
v1.9.6.1:
- Fix an issue with Get-DatabricksSQLHistory and Windows PowerShell
- Filters with Get-DatabricksSQLHistory are only supported with PowerShell Core (details)
v1.9.6.0:
- Fix an issue with Get-DatabricksSQLHistory and also improve it
- Add Common Snippets to this README.md
v1.9.5.3:
- Minor extension for the Update-DatabricksCluster cmdlet
- Fix verbose logging so the API key is only displayed in -Debug mode
v1.9.5.1:
- Minor fixes for the Update-DatabricksCluster cmdlet
v1.9.5.0:
- Added support for IP Access Lists API
v1.9.0.0:
- Add support for Permissions API
- includes pipelining for existing objects (e.g. cluster object, job object, ...)
v1.8.1.0:
- Update-DatabricksCluster now allows you to specify -ClusterID and -ClusterObject at the same time, where the first one has priority. This can be used to update an existing cluster with the configuration of another cluster.
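For example, one cluster's configuration can serve as a template for another. A minimal sketch, assuming Get-DatabricksCluster accepts -ClusterID (both IDs are placeholders):
# read the configuration of a template cluster
$template = Get-DatabricksCluster -ClusterID "1202-211320-template1"
# apply it to a different cluster; -ClusterID takes priority over the ID inside the object
Update-DatabricksCluster -ClusterID "1202-211320-target1" -ClusterObject $template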
v1.8.0.1:
- Add additional option to export SQL objects via Export-DatabricksEnvironment (experimental)
- Add cmdlet to easily view results of a Databricks command
- Fix issue with DBFS file handle datatype
v1.7.0.0:
- Added support for v1.2 APIs (Execution Context and Command Execution)
- fully supports pipelining for easy use
v1.6.2.0:
- Fix issue with Cluster cmdlets to properly support pipelining
- Add support for Instance Pools in Cluster cmdlets
v1.6.0.0:
- Add support for Project APIs (experimental, link)
- Add Workspace Config settings
v1.5.0.0:
- Add support for SQL Analytics APIs (experimental, link)
v1.3.1.0:
- Add support for Workspace configs (get/set)
v1.3.0.0:
- Add support for Global Init Scripts
v1.2.2.0:
- Add -Entitlements parameter to Add-DatabricksSCIMGroup
- Some fixes for proper pipelining when working with Groups and SCIM APIs
- Add test-case for Security (SCIM, Groups, memberships, ...)
v1.2.1.0:
- Fix issue with Import of already existing files and folders
v1.2.0.1:
- Add support for Azure backed Secret Scopes for non-standard Azure environments like AzureChinaCloud or AzureUSGovernment
v1.2.0.0:
- Add support for AAD authentication in non-standard Azure environments like AzureChinaCloud or AzureUSGovernment
v1.1.4.0:
- Fix Secrets API when creating Azure KeyVault Backed Secret Scopes.
v1.1.3.0:
- Minor fix for Secrets API making -InitialManagePrincipal optional.
v1.1.2.0:
- Change -ApiRootUrl parameter to support any URL and not just a fixed list
- Add Get-DatabricksApiRootUrl cmdlet to be able to get a list of predefined API Root URLs
v1.1.1.0:
- Add new cmdlet Add-DatabricksClusterLocalLibrary to add a local library (.jar, .whl, ...) to a cluster with a single command
v1.0.0.0:
- Add Azure Active Directory (AAD) Authentication for Service Principals and Users
Setup and Installation
The easiest way to install the PowerShell module is to use the PowerShell built-in Install-Module cmdlet:
Install-Module -Name DatabricksPS
Alternatively, you can download this repository, copy the folder \Modules\DatabricksPS locally, and import it from the local path using the Import-Module cmdlet:
Import-Module "C:\MyPSModules\Modules\DatabricksPS"
Usage
The module is designed so that you set the connection-relevant properties once and they are then used by all other cmdlets. You can always update this information during your PowerShell session to connect to different Databricks environments within the same session.
$accessToken = "dapi123456789e672c4007052d4694a7c51"
$apiUrl = "https://westeurope.azuredatabricks.net"
Set-DatabricksEnvironment -AccessToken $accessToken -ApiRootUrl $apiUrl
Once the environment is set up, you can use the other cmdlets:
Get-DatabricksWorkspaceItem -Path "/"
Export-DatabricksWorkspaceItem -Path "/TestNotebook1" -LocalPath "C:\TestNotebook1_Export.ipynb" -Format JUPYTER
Start-DatabricksJob -JobID 123 -NotebookParams @{myParameter = "test"}
Using pipelined cmdlets:
# stop all clusters
Get-DatabricksCluster | Stop-DatabricksCluster
# create multiple directories
"/test1","/test2" | Add-DatabricksWorkspaceDirectory
# get all run outputs for a given job
Get-DatabricksJobRun -JobID 123 | Get-DatabricksJobRunOutput
Using aliases:
Aliases are created for all cmdlets that use standard verbs (e.g. Get-*). In general they follow this pattern: the standard verb alias (e.g. g for Get-, a for Add-, ...), then dbr for Databricks, and finally all upper-case characters of the original function name (e.g. c for Cluster) converted to lower case.
So Get-DatabricksCluster becomes gdbrc, etc.
# stop all clusters
gdbrc | spdbrc
# create multiple directories
"/test1","/test2" | adbrwd
# get all run outputs for a given job
gdbrjr -JobID 123 | gdbrjro
Common snippets
Below you can find a list of common snippets that I found useful and use very frequently. All snippets use a Personal Access Token for authentication but of course also work with Azure Active Directory user and service principal authentication (see Authentication).
Stop all clusters at the end of the day
Set-DatabricksEnvironment -AccessToken "dapi123...def" -ApiRootUrl "https://westeurope.azuredatabricks.net"
Get-DatabricksCluster | Stop-DatabricksCluster
Export a whole or single parts of a Databricks workspace
Set-DatabricksEnvironment -AccessToken "dapi123...def" -ApiRootUrl "https://westeurope.azuredatabricks.net"
Export-DatabricksEnvironment -CleanLocalRootPath -LocalPath "C:\\my_export" -Artifacts @("Workspace", "Clusters", "Jobs")
Import a whole or single parts of a Databricks workspace
Set-DatabricksEnvironment -AccessToken "dapi123...def" -ApiRootUrl "https://westeurope.azuredatabricks.net"
Import-DatabricksEnvironment -LocalPath "C:\\my_export" -Artifacts @("Workspace", "Clusters", "Jobs")
Calling a not yet supported/implemented API
The Databricks API is updated frequently and it is pretty hard to keep everything up-to-date. So in case an API call you are looking for is not yet supported by this module, you can always execute the call manually, leveraging the existing authentication:
Set-DatabricksEnvironment -AccessToken "dapi123...def" -ApiRootUrl "https://westeurope.azuredatabricks.net"
$body = @{
cluster_id = "1202-211320-brick1";
num_workers = 4
}
Invoke-DatabricksApiRequest -Method "POST" -EndPoint "/2.0/clusters/resize" -Body $body
Authentication
There are various ways to authenticate against the Databricks REST API of which some are unique to Azure:
- Personal Access token
- Azure Active Directory (AAD) Username/Password (Azure only!)
- Azure Active Directory (AAD) Service Principal (Azure only!)
In addition to those, the DatabricksPS module also integrates with other tools to derive the configuration and authentication. Currently these tools include:
- Azure DevOps Service Connections (Azure only!)
- Databricks CLI
- Azure Az PowerShell module (Azure only!)
Personal Access Token
This is the most straightforward authentication and works for both Azure and AWS. The official documentation can be found here (Azure) or here (AWS) and is also persisted in this repository here.
$accessToken = "dapi123456789e672c4007052d4694a7c51"
$apiUrl = "https://westeurope.azuredatabricks.net"
Set-DatabricksEnvironment -AccessToken $accessToken -ApiRootUrl $apiUrl
Azure Active Directory (AAD) Username/Password
This authentication method is very similar to the interactive login you use when accessing the Databricks web UI. You provide the Databricks workspace you want to connect to, the username, and a password. The official documentation can be found here and is also persisted in this repository here.
$credUser = Get-Credential
$tenantId = '93519689-1234-1234-1234-e4b9f59d1963'
$subscriptionId = '30373b46-5678-5678-5678-d5560532fc32'
$resourceGroupName = 'myResourceGroup'
$workspaceName = 'myDatabricksWorkspace'
$azureResourceId = "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Databricks/workspaces/$workspaceName"
$clientId = 'db00e35e-1111-2222-3333-c8cc85e6f524'
$apiUrl = "https://westeurope.azuredatabricks.net"
Set-DatabricksEnvironment -ClientID $clientId -Credential $credUser -AzureResourceID $azureResourceId -TenantID $tenantId -ApiRootUrl $apiUrl
Azure Active Directory (AAD) Service Principal
Service Principals are special accounts in Azure Active Directory which can be used for automated tasks like CI/CD pipelines. You provide the Databricks workspace you want to connect to, the ClientID and a ClientSecret/ClientKey. ClientID and ClientSecret need to be wrapped into a PSCredential where the ClientID is the username and the ClientSecret/ClientKey is the password. The rest is very similar to the Username/Password authentication, except that you also need to specify the -ServicePrincipal flag. The official documentation can be found here and is also persisted in this repository here.
$clientId = '12345678-6789-6789-6789-6e44bf2f5d11' # = Application ID
$clientSecret = '[email protected]'
$secureClientSecret = ConvertTo-SecureString $clientSecret -AsPlainText -Force
$credSP = New-Object System.Management.Automation.PSCredential($clientId, $secureClientSecret)
$tenantId = '93519689-1234-1234-1234-e4b9f59d1963'
$subscriptionId = '30373b46-5678-5678-5678-d5560532fc32'
$resourceGroupName = 'myResourceGroup'
$workspaceName = 'myDatabricksWorkspace'
$azureResourceId = "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Databricks/workspaces/$workspaceName"
$apiUrl = "https://westeurope.azuredatabricks.net"
Set-DatabricksEnvironment -ClientID $clientId -Credential $credSP -AzureResourceID $azureResourceId -TenantID $tenantId -ApiRootUrl $apiUrl -ServicePrincipal
Azure DevOps Integration
If you want to use the DatabricksPS module in your Azure DevOps pipelines and do not want to manage Personal Access Tokens but leverage Azure DevOps Service Connections instead, you can use the following YAML task definition:
- task: AzureCLI@2
  displayName: "DatabricksPS - Stop All Clusters"
  inputs:
    azureSubscription: "MyServiceConnection"
    addSpnToEnvironment: true
    scriptType: ps
    scriptLocation: inlineScript
    arguments: '$(DATABRICKS_URL) $(AZURE_RESOURCE_ID)'
    inlineScript: |
      Set-DatabricksEnvironment -ApiRootUrl $args[0] -AzureResourceID $args[1] -UsingAzureDevOpsServiceConnection
      Get-DatabricksCluster | Stop-DatabricksCluster
    azurePowerShellVersion: latestVersion
The important part is to use AzureCLI, which allows you to choose an Azure DevOps Service Connection and persist the authentication information as temporary environment variables by using addSpnToEnvironment: true. Unfortunately this is currently not possible using AzurePowerShell.
Databricks CLI Integration
The Databricks CLI Integration relies on the Databricks CLI being installed and configured on your agent/machine already. It basically requires the two environment variables DATABRICKS_HOST and DATABRICKS_TOKEN to be set and only works with Personal Access Tokens. If those two environment variables are set, you can use the following code in your PowerShell task to e.g. stop all available clusters:
Set-DatabricksEnvironment -UsingDatabricksCLIAuthentication
Get-DatabricksCluster | Stop-DatabricksCluster
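If you want to test this locally rather than on a pre-configured agent, the two environment variables can also be set directly in PowerShell before calling the cmdlet (the values below are placeholders):
# simulate an existing Databricks CLI configuration via environment variables
$env:DATABRICKS_HOST  = "https://westeurope.azuredatabricks.net"
$env:DATABRICKS_TOKEN = "dapi123...def"
Set-DatabricksEnvironment -UsingDatabricksCLIAuthentication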
Azure Az module Integration
In the context of Azure, the Azure Az PowerShell module is the core of most solutions. To use the authentication provided by the Az module, you can simply use the -UsingAzContext switch together with -AzureResourceID, and the DatabricksPS module will take care of the rest:
# Connect to Azure using the Az module
Connect-AzAccount
$subscriptionId = '30373b46-5678-5678-5678-d5560532fc32'
$resourceGroupName = 'myResourceGroup'
$workspaceName = 'myDatabricksWorkspace'
$azureResourceId = "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Databricks/workspaces/$workspaceName"
Set-DatabricksEnvironment -UsingAzContext -AzureResourceID $azureResourceId
Supported APIs and endpoints
The goal of the DatabricksPS module is to support all available Databricks REST API endpoints. However, as the APIs are constantly evolving, some newer ones might not be implemented yet. If you are missing a recently added endpoint, please open a ticket in this repo and I will add it as soon as possible!