Switch AzureDataLakeStorageV2Hook to use ManagedIdentityCredential for managed identity/workload auth

Open TJaniF opened this issue 1 year ago • 2 comments

When testing the AzureDataLakeStorageV2Hook with managed identity authentication, @melugoyal got the following error:

[2024-03-26, 04:32:41 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity
[2024-03-26, 04:32:41 UTC] {adls.py:167} INFO - account_url: <our account url>
[2024-03-26, 04:32:41 UTC] {adls.py:206} INFO - Error while attempting to get file system 'testcontainer': Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>
[2024-03-26, 04:32:46 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 408, in execute
    return hook.create_file(file_system_name=self.file_system_name, file_name=self.file_name).upload_data(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 249, in create_file
    file_client = self.get_file_system(file_system_name).create_file(file_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 200, in get_file_system
    file_system_client = self.service_client.get_file_system_client(file_system=file_system)
                         ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 132, in service_client
    return self.get_conn()
           ^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 169, in get_conn
    return DataLakeServiceClient(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/filedatalake/_data_lake_service_client.py", line 96, in __init__
    self._blob_service_client = BlobServiceClient(blob_account_url, credential, **kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_blob_service_client.py", line 139, in __init__
    super(BlobServiceClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 110, in __init__
    self._config, self._pipeline = self._create_pipeline(self.credential, sdk_moniker=self._sdk_moniker, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 234, in _create_pipeline
    raise TypeError(f"Unsupported credential: {type(credential)}")
TypeError: Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>

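It seems that DataLakeServiceClient does not accept AzureIdentityCredentialAdapter as a credential: DataLakeServiceClient builds a BlobServiceClient internally, and the shared storage base client raises the TypeError in _create_pipeline because it does not recognise the adapter's type (potentially relevant Azure SDK line).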
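The rejection can be modelled in a few lines. This is a simplified, hypothetical sketch, assuming (as in recent azure-storage-blob versions) that `_create_pipeline` treats any object with a `get_token` method as a token credential and raises `TypeError` for objects it does not recognise, and that the adapter only exposes the older msrest-style `signed_session` interface. `FakeManagedIdentityCredential`, `FakeCredentialAdapter` and `create_pipeline` are illustrative names, not Azure SDK or Airflow APIs.

```python
from typing import Any, Optional


class FakeManagedIdentityCredential:
    """Hypothetical stand-in for azure.identity.ManagedIdentityCredential.

    All that matters for this sketch is that it exposes get_token().
    """

    def get_token(self, *scopes: str, **kwargs: Any) -> str:
        return "dummy-token"


class FakeCredentialAdapter:
    """Hypothetical stand-in for an msrest-style adapter: it wraps a
    credential and offers signed_session(), but no get_token()."""

    def __init__(self, credential: Any) -> None:
        self.credential = credential

    def signed_session(self, session: Any = None) -> Any:
        return session


def create_pipeline(credential: Optional[Any]) -> str:
    """Simplified model (assumption) of the storage client's credential dispatch."""
    if hasattr(credential, "get_token"):
        # Anything that looks like a TokenCredential gets a bearer-token policy.
        return "bearer-token policy"
    if credential is None:
        # No credential at all: anonymous access, no auth policy.
        return "no credential policy"
    # Unrecognised objects are rejected, as in the log above.
    raise TypeError(f"Unsupported credential: {type(credential)}")


managed_identity = FakeManagedIdentityCredential()
accepted = create_pipeline(managed_identity)
print(accepted)  # prints "bearer-token policy"

try:
    create_pipeline(FakeCredentialAdapter(managed_identity))
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # an "Unsupported credential: <class '...FakeCredentialAdapter'>" message
```

In this model, passing the azure-identity credential to the client directly (as the title proposes with ManagedIdentityCredential) passes the check, while wrapping the same credential in the adapter does not.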

This PR worked for our workload identity auth. :)


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

TJaniF · Mar 26 '24 13:03

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] · May 12 '24 00:05

@Lee-W Thank you! I made the change you suggested and added a test. Sorry it took so long! 🙂

I tested the updated hook in a deployment with managed workload identity set up, and the task succeeded with the following log line:

[2024-05-16, 19:02:30 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity

As for the test, I hope it works: I don't know much about Azure workload identity, so I am not sure I am testing the right configuration. 😅
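One way to check the credential wiring without a real Azure environment is to replace the client class with a mock and inspect the `credential` argument the hook hands to it. The sketch below is hypothetical: `AdlsHookSketch`, its `client_cls` attribute and the stand-in `ManagedIdentityCredential` and `DataLakeServiceClient` classes are simplified illustrations, not the Airflow hook or the Azure SDK, and only the `unittest.mock` calls are real library APIs.

```python
from typing import Any, Optional
from unittest import mock


class ManagedIdentityCredential:
    """Hypothetical stand-in for azure.identity.ManagedIdentityCredential."""

    def __init__(self, client_id: Optional[str] = None) -> None:
        self.client_id = client_id

    def get_token(self, *scopes: str, **kwargs: Any) -> str:
        return "dummy-token"


class DataLakeServiceClient:
    """Hypothetical stand-in for azure.storage.filedatalake.DataLakeServiceClient."""

    def __init__(self, account_url: str, credential: Any = None) -> None:
        self.account_url = account_url
        self.credential = credential


class AdlsHookSketch:
    """Hypothetical, heavily simplified hook (not the Airflow implementation)."""

    client_cls = DataLakeServiceClient

    def __init__(self, account_url: str, managed_identity_client_id: Optional[str] = None) -> None:
        self.account_url = account_url
        self.managed_identity_client_id = managed_identity_client_id

    def get_conn(self) -> Any:
        # Pass the azure-identity credential straight to the client
        # instead of wrapping it in an msrest-style adapter.
        credential = ManagedIdentityCredential(client_id=self.managed_identity_client_id)
        return self.client_cls(account_url=self.account_url, credential=credential)


def test_get_conn_uses_token_credential() -> None:
    # Swap the client class for a MagicMock so nothing talks to Azure,
    # then check which credential object the hook passed in.
    with mock.patch.object(AdlsHookSketch, "client_cls") as client_cls:
        AdlsHookSketch("https://myaccount.dfs.core.windows.net", "my-client-id").get_conn()
    credential = client_cls.call_args.kwargs["credential"]
    assert hasattr(credential, "get_token")
    assert credential.client_id == "my-client-id"


test_get_conn_uses_token_credential()
```

Asserting on the presence of `get_token` targets the property the storage client appears to check, rather than the exact credential class.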

TJaniF · May 16 '24 20:05