[engsys] Global Sanitizers inconsistently sanitize storage account names, recordings unreplayable

Open kdestin opened this issue 1 year ago • 0 comments

Describe the bug

https://github.com/Azure/azure-sdk-for-python/pull/35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.

I'm currently writing a test, where the code path involves:

  1. Fetching details about a storage account

  2. Using those details to build the URI for the next request

One of these sanitizers redacts the storage account name from the Step 1 response when it is written to the recording:

https://github.com/Azure/azure-sdk-for-python/blob/511aef315bf6919f52c90adb1803a3b9079cbb05/tools/azure-sdk-tools/devtools_testutils/proxy_startup.py#L379

However, no "global" sanitizer scrubs storage account names from request URLs.

This leaves my recording un-replayable.

In playback mode, the code receives the sanitized response and tries to send a subsequent request to a URL it builds from the sanitized values: https://sanitized.blob.core.windows.net. But the recording stored the unsanitized URL for that subsequent request, https://account-name.blob.core.windows.net, so the proxy is unable to find a match.
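The mismatch can be reproduced without the test proxy. The sketch below is only a simulation under stated assumptions: `sanitize_body`, `sanitize_uri`, and `build_blob_url` are hypothetical helpers written for this illustration, not functions from `devtools_testutils` or the test proxy, and the regexes are guesses at the general shape of the behavior rather than the actual sanitizer definitions.

```python
import json
import re


def sanitize_body(body: str) -> str:
    """Hypothetical stand-in for the body sanitizer: redact the accountName value."""
    return re.sub(r'("accountName"\s*:\s*")[^"]+(")', r"\1sanitized\2", body)


# Hypothetical URI sanitizer of the kind the issue says is missing:
# it redacts the account name in a blob storage host.
STORAGE_HOST = re.compile(r"(?<=https://)[^./]+(?=\.blob\.core\.windows\.net)")


def sanitize_uri(uri: str) -> str:
    return STORAGE_HOST.sub("sanitized", uri)


def build_blob_url(datastore_body: str) -> str:
    """Mimic the code under test: build the next request URL from the response."""
    props = json.loads(datastore_body)["properties"]
    return (
        f'{props["protocol"]}://{props["accountName"]}'
        f'.blob.{props["endpoint"]}/{props["containerName"]}'
    )


live_body = json.dumps(
    {
        "properties": {
            "accountName": "account-name",
            "containerName": "container",
            "endpoint": "core.windows.net",
            "protocol": "https",
        }
    }
)

# Recording: the client sees the live body, so the follow-up URI is real.
# The body is sanitized on the way to disk, but the request URI is not.
recorded_body = sanitize_body(live_body)
recorded_uri = build_blob_url(live_body)

# Playback: the client only sees the sanitized body.
playback_uri = build_blob_url(recorded_body)

print("matches recording as stored:", playback_uri == recorded_uri)
print("matches with URI sanitized: ", playback_uri == sanitize_uri(recorded_uri))
```

In this simulation the playback URI only matches once the recorded URI is sanitized with the same replacement value as the body, which is the consistency the issue asks for.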

To Reproduce

Steps to reproduce the behavior:

  1. Successfully record a test in live mode that:

    1. Fetches some response with details about a storage account
    // Example response
    {
      "id": "/subscriptions/00000000-0000-0000-0000-000000000/resourceGroups/00000/providers/Microsoft.MachineLearningServices/workspaces/00000/datastores/workspaceblobstore",
      "name": "workspaceblobstore",
      "type": "Microsoft.MachineLearningServices/workspaces/datastores",
      "properties": {
        ...,
        "subscriptionId": "00000000-0000-0000-0000-000000000",
        "resourceGroup": "resource-group",
        "datastoreType": "AzureBlob",
        "accountName": "account-name",
        "containerName": "d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore",
        "endpoint": "core.windows.net",
        "protocol": "https",
        "serviceDataAccessAuthIdentity": "WorkspaceSystemAssignedIdentity"
      },
      "systemData": {
        ...
      }
    }
    
    2. Uses that response to build the URL for a subsequent request

    https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/path/to/files

  2. Attempt to re-run the test in playback mode

Expected behavior

The test should run against the recording and pass.

Actual behavior

The test fails with a test proxy playback error:

ERROR    root:proxy_fixtures.py:312 

-----Test proxy playback error:-----

Unable to find a record for the request PUT https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized
Method doesn't match, request <PUT> record <HEAD>
Uri doesn't match:
    request <https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>
    record  <https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>


kdestin avatar May 01 '24 22:05 kdestin