Switch AzureDataLakeStorageV2Hook to use ManagedIdentityCredential for managed identity/workload auth
While testing the AzureDataLakeStorageV2Hook with managed identity authentication, @melugoyal got the following error:
[2024-03-26, 04:32:41 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity
[2024-03-26, 04:32:41 UTC] {adls.py:167} INFO - account_url: <our account url>
[2024-03-26, 04:32:41 UTC] {adls.py:206} INFO - Error while attempting to get file system 'testcontainer': Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>
[2024-03-26, 04:32:46 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/include/azure_operators/adls.py", line 408, in execute
return hook.create_file(file_system_name=self.file_system_name, file_name=self.file_name).upload_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/include/azure_operators/adls.py", line 249, in create_file
file_client = self.get_file_system(file_system_name).create_file(file_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/include/azure_operators/adls.py", line 200, in get_file_system
file_system_client = self.service_client.get_file_system_client(file_system=file_system)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/include/azure_operators/adls.py", line 132, in service_client
return self.get_conn()
^^^^^^^^^^^^^^^
File "/usr/local/airflow/include/azure_operators/adls.py", line 169, in get_conn
return DataLakeServiceClient(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/azure/storage/filedatalake/_data_lake_service_client.py", line 96, in __init__
self._blob_service_client = BlobServiceClient(blob_account_url, credential, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_blob_service_client.py", line 139, in __init__
super(BlobServiceClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 110, in __init__
self._config, self._pipeline = self._create_pipeline(self.credential, sdk_moniker=self._sdk_moniker, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 234, in _create_pipeline
raise TypeError(f"Unsupported credential: {type(credential)}")
TypeError: Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>
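It seems that AzureIdentityCredentialAdapter is not accepted by DataLakeServiceClient (potentially relevant Azure SDK line): the TypeError above is raised from _create_pipeline in the storage SDK's base_client.py when the hook passes the adapter as the credential.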
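To illustrate why the adapter is rejected, here is a minimal, dependency-free sketch of the kind of check that raises this error. It is a simplified stand-in, not the SDK's actual code: the real `_create_pipeline` also accepts shared-key and SAS credentials. The class names `TokenStyleCredential` and `SessionStyleAdapter` and the function `check_credential` are hypothetical, and the assumption that the adapter fails because it exposes a msrest-style `signed_session()` rather than `get_token()` is mine, based on the traceback.

```python
class TokenStyleCredential:
    """Hypothetical stand-in for an azure-identity style credential."""

    def get_token(self, *scopes):
        # A real credential would return an access token object here.
        return "fake-token"


class SessionStyleAdapter:
    """Hypothetical stand-in for a msrest-style adapter (no get_token)."""

    def signed_session(self, session=None):
        return session


def check_credential(credential):
    """Simplified version of the credential check (assumption, not SDK code)."""
    if credential is None:
        return "anonymous"
    if hasattr(credential, "get_token"):
        return "bearer-token"
    # Anything else is rejected, mirroring the message in the traceback.
    raise TypeError(f"Unsupported credential: {type(credential)}")


if __name__ == "__main__":
    print(check_credential(TokenStyleCredential()))
    try:
        check_credential(SessionStyleAdapter())
    except TypeError as err:
        print(err)
```

Under that assumption, passing a credential object that implements `get_token()` directly, such as ManagedIdentityCredential, avoids the error.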
This PR worked for our workload identity auth. :)
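For reference, a rough sketch of the approach outside of Airflow: build the client with a ManagedIdentityCredential passed directly instead of the adapter. This is not the hook's actual code. The helper names `account_url_for` and `build_service_client`, the storage account name, and the `dfs.core.windows.net` URL shape are assumptions for illustration; it requires the `azure-identity` and `azure-storage-file-datalake` packages, which are imported lazily so the URL helper can be used without them.

```python
from typing import Optional


def account_url_for(storage_account: str) -> str:
    """Build an ADLS Gen2 endpoint URL (assumed public-cloud URL shape)."""
    return f"https://{storage_account}.dfs.core.windows.net"


def build_service_client(storage_account: str, client_id: Optional[str] = None):
    """Create a DataLakeServiceClient authenticated via managed identity.

    Hypothetical helper: the real hook reads these values from the
    Airflow connection instead of taking them as arguments.
    """
    # Imported here so the module loads even without the Azure SDK installed.
    from azure.identity import ManagedIdentityCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # client_id selects a user-assigned identity; omit it for system-assigned.
    kwargs = {"client_id": client_id} if client_id else {}
    credential = ManagedIdentityCredential(**kwargs)
    return DataLakeServiceClient(
        account_url=account_url_for(storage_account),
        credential=credential,
    )
```

Calling `build_service_client("myaccount").get_file_system_client(file_system="testcontainer")` would then follow the same path as `get_file_system` in the traceback, assuming the environment has a managed or workload identity configured.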
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
@Lee-W Thank you! Made the change you suggested and added a test, sorry it took so long! 🙂
I used the changed hook in a deployment with managed workload identity set up and got a successful task with:
[2024-05-16, 19:02:30 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity
On the testing side, I hope this works. I don't know much about Azure workload identity, so I hope I am testing the right configuration. 😅