azure-cli icon indicating copy to clipboard operation
azure-cli copied to clipboard

`az storage account network-rule add` clobbers ipRules with concurrent operations

Open saarivirtajCGI opened this issue 1 year ago • 1 comments

Describe the bug

It seems that running two or more commands of az storage account network-rule add -n mystorage --ip-address x.x.x.x concurrently will overwrite each other's changes - i.e. it's not actually safely adding to whatever is there. It should probably be using ETags, or at least providing the possibility to use them by providing some sort of --if-match argument.

My Storage Account access logs show that there were two changes 2 seconds apart, the first one added an IP address and the second one immediately overwrote the lastly added IP address with another one.

My use case is running Azure DevOps pipelines, where multiple runs are triggered at the same time and they each run on different agents, needing access to the same Storage Account.

Related command

az storage account network-rule add -n mystorage --ip-address x.x.x.x

Errors

image image

Issue script & Debug output

Expected behavior

Adding IP addresses to Storage Account IP Rules should always add, never replace existing ones.

Environment Summary

azure-cli 2.63.0

core 2.63.0 telemetry 1.1.0

Dependencies: msal 1.30.0 azure-mgmt-resource 23.1.1

Python location '/opt/az/bin/python3' Extensions directory '/home//.azure/cliextensions'

Python (Linux) 3.11.8 (main, Jul 31 2024, 03:39:39) [GCC 11.4.0]

Legal docs and information: aka.ms/AzureCliLegal

Additional context

saarivirtajCGI avatar Aug 28 '24 12:08 saarivirtajCGI

Thank you for opening this issue, we will look into it.

yonzhan avatar Aug 28 '24 12:08 yonzhan

We are also running into exactly the same issue, running on Github Actions running on different agents. Our experience is exactly what OP is describing.

code:

az storage account network-rule add -g rg-name --account-name ourstorageaccount --ip-address 10.0.0.1 -o none

According to the Microsoft_Azure_ChangeAnalysis on the storage account:

11/21/2024, 8:32:37 AM GMT+1
properties.networkAcls.ipRules["10.0.0.1"]
properties.networkAcls.ipRules["10.0.0.2"]
11/21/2024, 8:32:35 AM GMT+1
properties.networkAcls.ipRules["10.0.0.1"]

ip 10.0.0.1 is added correctly at 8:32:35, but when the ip 10.0.0.2 is being added at 8:32:37, it is resulting in a change, not appending to the existing ruleset:

             {
               "action": "Allow",
-              "value": "10.0.0.1"
+              "value": "10.0.0.2"
             }

We have other workflows that succeeds in appending to the network ruleset. According to our github action logs, the action ran at the same timestamp for these two actions:

gh action log

2024-11-21T07:32:30.1490274Z Opening firewall for 10.0.0.1

gh action log

2024-11-21T07:32:30.2839596Z Opening firewall for 10.0.0.2

environment

github action execution via docker image mcr.microsoft.com/azure-cli:2.66.0

erikrok avatar Nov 21 '24 08:11 erikrok

Hi @saarivirtajCGI @erikrok, this network-rule add command is split into two steps (first a GET then a SET) on the client side. Unless the service side is able to provide a way to exclusive lock storage account property updates(can try read-only lock) or to provide a new append operation, async operations cannot guarantee the update result. Is it possible to sync the multiple github actions into a single network-rule update?

calvinhzy avatar Jan 02 '25 09:01 calvinhzy

@calvinhzy It's not really doable or practical, since there are multiple pipelines running operations that need just-in-time access to the same storage account. Let me explain.

Each operation is an individually run operation. The use case is having Terraform backend in a Storage Account - all pipelines that perform Terraform operations need to access the state files in the Storage Account (blob storage) and they need to add a just-in-time firewall exception to allow the agent to access the Storage Account.

Traditionally, an ETag + If-Match header would solve this problem. If the ETags don't match, just perform the GET and SET again until you succeed, simple. This same problem exists with for example the Application Gateway CLI commands (az network application-gateway url-path-map update) but there at least there's a workaround for it, because the REST requests and responses do contain an ETag that we can use, i.e. we can run a manual az rest command with --headers If-Match=<etag here>. See example:

Execute: az rest --method get --url "https://management.azure.com/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Network/applicationGateways/zzz?api-version=2023-11-01"

{
  "etag": "W/\"ec63d62f-19d6-4731-ae37-baf7df6c8cf9\"",
  "id": "/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Network/applicationGateways/zzz",
  "identity": {
...

... and then we can make some changes, and execute: az rest --method put --url "..." --body ... --headers If-Match=<above etag goes here>

If this fails because a change was made in between GET and PUT, we try again until it succeeds because of the ETag and we can be sure that it does not clobber the resource configuration.

However, the Storage Account API does not provide an ETag to work with. So there is no workaround. There's no ETag present in the response body or the headers.

az rest --method get --url "https://management.azure.com/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Storage/storageAccounts/zzz?api-version=2024-01-01"

{
  "id": "/subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.Storage/storageAccounts/zzz",
  "kind": "StorageV2",
<no etag present in response>

So because of this, it's virtually impossible to do concurrent operations on the Storage Account's IP rules for example unless you script some kind of a bespoke lock system to handle it. And quite frankly that would be a shoddy solution for this problem.

This issue needs addressing. I can imagine this is a somewhat commonly encountered problem especially because of Terraform.

saarivirtajCGI avatar Jan 07 '25 08:01 saarivirtajCGI