ERROR: AADSTS700024: Client assertion is not within its valid time range

Open krukowskid opened this issue 2 years ago • 50 comments

Hi! I am facing an issue similar to #180, which appears to have been resolved, but I still encounter the problem when running dotnet tests on a GitHub runner.

Azure.Identity.CredentialUnavailableException : DefaultAzureCredential failed to retrieve a token from the included credentials. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/defaultazurecredential/troubleshoot
...
- Azure CLI authentication failed due to an unknown error. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/azclicredential/troubleshoot ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2023-10-31T11:53:04.4424859Z, assertion valid from 2023-10-31T11:39:49.0000000Z, expiry time of assertion 2023-10-31T11:44:49.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: d64c537e-1d94-4274-9012-c0d7590f1c00 Correlation ID: 5c769bb7-e85a-4557-ba28-92f8eca1c4ff Timestamp: 2023-10-31 11:53:04Z
	Interactive authentication is needed. Please run:
	az login

I'm using action version 1.4.6 and Azure.Identity package version 1.10.4 with DefaultAzureCredential(). The issue doesn't occur in the integration tests, where nearly all tests use tokens. However, if I run the API/UI tests, where I use the identity in only one or two tests, it fails with the above error. Do you have any suggestions or workarounds?
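
To make the timing in the error above concrete, here is a small sketch that pulls the three timestamps out of the message and computes how long the assertion was valid and how long after expiry it was used. The helper name `parse_times` is made up for illustration, and the timestamps are truncated to whole seconds.

```python
import re
from datetime import datetime

# Error text copied from the report above (trimmed to the part with timestamps).
ERROR = (
    "AADSTS700024: Client assertion is not within its valid time range. "
    "Current time: 2023-10-31T11:53:04.4424859Z, "
    "assertion valid from 2023-10-31T11:39:49.0000000Z, "
    "expiry time of assertion 2023-10-31T11:44:49.0000000Z."
)


def parse_times(message):
    """Return (current, valid_from, expiry) as datetimes, truncated to seconds."""
    stamps = re.findall(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", message)
    current, valid_from, expiry = (
        datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in stamps[:3]
    )
    return current, valid_from, expiry


current, valid_from, expiry = parse_times(ERROR)
lifetime_seconds = int((expiry - valid_from).total_seconds())
overdue_seconds = int((current - expiry).total_seconds())
print(f"assertion lifetime: {lifetime_seconds} s")
print(f"used {overdue_seconds} s after expiry")
```

For this message the lifetime comes out at 300 seconds (5 minutes), and the assertion was presented 495 seconds (8 minutes 15 seconds) after it expired.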

krukowskid avatar Nov 21 '23 08:11 krukowskid

Hi @krukowskid , could you provide the workflow file, run it again with debug mode, and provide the debug log?

YanaXu avatar Nov 23 '23 05:11 YanaXu

Same issue here, and it's a real pain. The tokens are only valid for 5 minutes, and if you don't use one until late in your workflow, it just throws the error shown by OP.

I tried azure/[email protected] same issue. I'm not using any other way to login into azure.

benjamin-rousseau-shift avatar Nov 27 '23 09:11 benjamin-rousseau-shift

Hi @benjamin-rousseau-shift could you provide your workflow file and debug log? Do you also use OIDC login? OIDC login with SP should have an expiration of 1 hour and OIDC with User-assigned managed identity should have 24 hours.

YanaXu avatar Nov 28 '23 01:11 YanaXu

I will try to get you that. I am using OIDC with a service principal using federated credentials.

benjamin-rousseau-shift avatar Nov 28 '23 01:11 benjamin-rousseau-shift

@YanaXu

Here is my workflow definition (it's a reusable workflow). I have also enabled debug, but it doesn't make sense to paste the whole log here because it's so noisy. The workflow is failing in the "🧪 Run tests for specified filter and rerun failed" step. I will provide debug logs; just let me know which part/step you are interested in.

reusable workflow definition
name: 'reusable/run-tests'
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string

      system-under-test:
        required: false
        type: string
        default: xwow

      test-configuration:
        required: true
        type: string

      tests-filter:
        description: 'Filter for selecting tests to run'
        required: true
        type: string

      tests-web-url:
        required: false
        type: string

      tests-apigateway-url:
        required: false
        type: string

      report-name:
        description: 'Name for execution report and attachments'
        required: false
        default: Default
        type: string

      allure-reports:
        required: false
        default: false
        type: boolean

      allure-project-id:
        required: false
        type: string

    secrets:
      KrukowskidBotAppId:
        required: false
      KrukowskidBotPrivateKey:
        required: false
      ad-username:
        required: false
      ad-password:
        required: false
      azure-client-id:
        required: false
      azure-tenant-id:
        required: false
      azure-subscription-id:
        required: false
      identity-url:
        required: false
      identity-client-id:
        required: false
      backoffice-identity-url:
        required: false
      backoffice-client-id:
        required: false
      backoffice-client-secret:
        required: false      
      backoffice-identity-scope:
        required: false
      allure-server-password:
        required: false

permissions:
  id-token: write
  contents: write
  actions: read
  checks: write

jobs:
  run-tests:
    name: run-tests
    environment: ${{ inputs.environment }}
    runs-on:
      labels: ubuntu-latest-8core32ram
    timeout-minutes: 20
    env:
      E2E-ENVIRONMENT: ${{ inputs.test-configuration }}
      E2E-SUT: ${{ inputs.system-under-test }}
      ALLURE_SERVER_URL: ${{ vars.ALLURE_SERVER_URL }}
      ALLURE_SERVER_USER: ${{ vars.ALLURE_SERVER_USER }}
      ALLURE_SERVER_PASSWORD: ${{ secrets.allure-server-password }}
    defaults:
      run:
        shell: pwsh
    steps:
    - name: Generate token
      if: ${{ github.repository != 'Krukowskid/Krukowskid.Tests' }}
      id: generate_token
      uses: tibdex/github-app-token@v1
      with:
        app_id: ${{ secrets.KrukowskidBotAppId }}
        private_key: ${{ secrets.KrukowskidBotPrivateKey }}

    - name: Checkout Tests
      if: ${{ github.repository != 'Krukowskid/Krukowskid.Tests' }}
      uses: actions/checkout@v3
      with:
        repository: Krukowskid/Krukowskid.Tests
        token: "${{ steps.generate_token.outputs.token }}"
        ref: main
        
    - name: Checkout Tests
      if: ${{ github.repository == 'Krukowskid/Krukowskid.Tests' }}
      uses: actions/checkout@v3
        
    - name: Azure login
      uses: Azure/[email protected]
      with:
        client-id: ${{ secrets.azure-client-id }}
        tenant-id: ${{ secrets.azure-tenant-id }}
        subscription-id: ${{ secrets.azure-subscription-id }}

    - name: Setup .NET
      uses: actions/setup-dotnet@v3
      with:
        dotnet-version: 7.0.x

    - name: Check Other Chrome Version
      run: /usr/bin/google-chrome --version
    
    - name: Restore dependencies
      run: dotnet restore src

    - name: List Config Files
      run: ls src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration

    - name: Add TestResults dir
      run: | 
        mkdir src/TestAutomation
        mkdir src/TestAutomation/TestResults
        mkdir src/TestAutomation/TestResults/AllureReports
      
    - name: 🦿 Override WebUrl
      if: ${{ inputs.tests-web-url != '' }}
      shell: bash --noprofile --norc {0}
      run: |
        echo "Setting E2E_TESTS__WEB__URL env var to ${{ inputs.tests-web-url }}"
        echo "E2E_TESTS__WEB__URL=${{ inputs.tests-web-url }}" >> $GITHUB_ENV
    
    - name: 🦿 Override ApiGatewayUrl
      if: ${{ inputs.tests-apigateway-url != '' }}
      shell: bash --noprofile --norc {0}
      run: |
        echo "Setting E2E_TESTS__APIGATEWAY__URL env var to ${{ inputs.tests-apigateway-url }}"
        echo "E2E_TESTS__APIGATEWAY__URL=${{ inputs.tests-apigateway-url }}" >> $GITHUB_ENV

    - name: 🏗 Build
      run: dotnet build src --no-restore

    - name: List Files
      run: |
        ls src -lR > src/TestAutomation/TestResults/post-build-files.txt
        ls ${{ github.workspace }}
        
    - name: 🦾 Install browser for Playwright tests
      shell: pwsh
      run: src/Krukowskid.Tests.UI/Krukowskid.Tests.UI.x/bin/Debug/net7.0/playwright.ps1 install --with-deps chromium
    
    - name: 🧪 Run tests for specified filter and rerun failed
      shell: bash --noprofile --norc {0}
      env:
        LC_ALL: en_US.utf8
      run: |
        counter=1
        exitcode=0
        reset="\e[0m"
        warn="\e[0;33m"
        green="\e[0;92m"
        blue="\e[0;94m"
        while [ $counter -lt 4 ]
        do
            if [ $filter ]
            then
                echo -e "${warn}Run number: $counter. Re-running failed tests filter: $filter ${reset}"
                # run test and forward output also to a file in addition to stdout (tee command)
                cp src/TestAutomation/TestResults/runtestsoutput.log src/TestAutomation/TestResults/runtestsoutput_first.log
                dotnet test src --no-build --filter=$filter --verbosity minimal --logger trx --results-directory src/TestAutomation/TestResults --settings:src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration/cicd.runsettings | tee src/TestAutomation/TestResults/runtestsoutput.log
            else
                echo -e "${blue}First run. Running tests with filter "${{ inputs.tests-filter }}" ${reset}"
                dotnet test src --no-build --filter "${{ inputs.tests-filter }}" --verbosity minimal --logger trx --results-directory src/TestAutomation/TestResults --settings:src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration/cicd.runsettings | tee src/TestAutomation/TestResults/runtestsoutput.log
            fi
            # capture dotnet test exit status, different from tee
            exitcode=${PIPESTATUS[0]}
            if [ $exitcode == 0 ]
            then
                echo -e "${green}Running tests succeeded after $counter attempts.${reset}"
                exit 0
            fi
            filter=$(cat src/TestAutomation/TestResults/runtestsoutput.log | grep -o -P '(?<=\sFailed\s)\w*'| grep -v -x 'Krukowskid' | awk 'BEGIN { ORS="|" } { print("Name=" $0) }' | grep -o -P '.*(?=\|$)')
            ((counter++))
        done
        exit $exitcode

    - name: List Files
      if: always()
      run: ls src -lR > src/TestAutomation/TestResults/post-tests-files.txt
    
    - name: 📈 Generate Github Report
      uses: dorny/test-reporter@v1
      if: always()
      with:
        name: ${{ inputs.report-name }} Test Execution Report
        path: 'src/TestAutomation/TestResults/*.trx'
        reporter: 'dotnet-trx'
        list-suites: 'all'
        fail-on-error: 'false'

    - name: Find Allure Reports
      if:  ${{ always() && inputs.allure-reports == true }} 
      shell: bash
      run: |        
        find src -type d -name "allure-results"        

    - name: Copy Allure Reports
      if:  ${{ always() && inputs.allure-reports == true }} 
      shell: bash
      run: |        
        find src -type d -name "allure-results" -exec cp -r -v {}/. src/TestAutomation/TestResults/AllureReports \;
              
    - name: 📈 Upload Allure Reports
      uses: unickq/send-to-allure-docker-service-action@v1
      if:  ${{ always() && github.ref_name == 'main' && inputs.allure-reports == true }} 
      continue-on-error: true
      with:
        allure_results: src/TestAutomation/TestResults/AllureReports
        project_id: ${{ inputs.allure-project-id }}
        auth: true
        generate: true       

    - name: Upload additional reports
      uses: actions/upload-artifact@v3
      if: always()
      with:
        name: ${{ inputs.report-name }}TestReports
        path: |
          src/TestAutomation
          src/**/TestResults
          src/**/bin/**/allureConfig.json
          src/**/bin/**/appSettings.*.json

krukowskid avatar Nov 28 '23 10:11 krukowskid

Hi @krukowskid, from the description of this issue, I see the error is thrown from Azure CLI. But in the steps of the "reusable workflow definition", I can't tell which step throws the exception. Could you answer these questions for further analysis?

  • This error is thrown from one of the Azure CLI commands, right?
  • Could you provide the screenshot of the workflow run? (an example of the screenshot)
  • Is ubuntu-latest-8core32ram a self-hosted runner?
  • Do you know which version of Azure CLI you're using for ubuntu-latest-8core32ram?
  • Have you tried the latest Azure CLI version?

YanaXu avatar Nov 29 '23 02:11 YanaXu

  • This error is thrown from one of the Azure CLI commands, right?

It's thrown in the dotnet tests (the "🧪 Run tests for specified filter and rerun failed" step), which use DefaultAzureCredential().

  • Could you provide the screenshot of the workflow run?

image

  • Is ubuntu-latest-8core32ram a self-hosted runner?

It's a GitHub-hosted (large) runner; same problem on ubuntu-latest.

  • Do you know which version of Azure CLI you're using for ubuntu-latest-8core32ram?

Same as on ubuntu-latest.

  • Have you tried the latest Azure CLI version?

On the day I created the issue, 1.4.6 was the latest. I will try with 1.5.0.

krukowskid avatar Nov 29 '23 08:11 krukowskid

@krukowskid, Azure Login Action works for Azure CLI and Azure PowerShell. But in your workflow file, "Run tests for specified filter and rerun failed" only calls dotnet commands. Do you mean the error is thrown from your C# source code? Have you checked whether the code runs the auth independently, without Azure CLI?

YanaXu avatar Dec 01 '23 09:12 YanaXu

I am using DefaultAzureCredential. Locally (with visualstudioidentity) it works, it also works with azure login action with secret

krukowskid avatar Dec 05 '23 13:12 krukowskid

@krukowskid, what I can see from "Run tests for specified filter and rerun failed" is that the workflow file tries to run "dotnet test". I don't know what's inside. Azure Login Action supports Azure CLI and Azure PowerShell. If it's pure C# test code, I don't think it'll work. If the tests call Azure CLI or Azure PowerShell, it's another story. Can you share more details with us?

YanaXu avatar Dec 06 '23 01:12 YanaXu

In the dotnet code I am using DefaultAzureCredential from the Azure.Identity package. During authentication it loops through all possible authentication methods. When running tests on the runner, it uses AzureCliCredential with the CLI context set on the runner by the azure/login action.

krukowskid avatar Dec 06 '23 10:12 krukowskid

Adding my "me too" on this problem: exactly the same error message and reporting of a 5-minute token. Out of curiosity, is there a point where the v1 tag should be dropped back to a previously working commit in order to avoid lots of issues? I know that best practice is that workflows should use commit hashes instead of tags when referencing actions, but I'm sure there are lots of workflows that don't.

shaneholder avatar Dec 08 '23 17:12 shaneholder

Hi @shaneholder, could you please provide more details about your issue? As far as we know, v1.5.1 does not introduce issues like this. We're trying to reproduce this issue and figure out how it happens. FYI, we would drop v1 back to an older version if the latest version truly introduced issues, e.g. #371. However, on whether to move v1 to the latest version or not, everyone has different opinions, e.g. #380. Let's focus on this issue itself. Please help us by providing more details to reproduce it. If it's indeed an issue, we'll take the right action on it.

YanaXu avatar Dec 12 '23 01:12 YanaXu

I don't know why, but I can't replicate it anymore. However, if you are still curious about what my workflow looks like:

name: Test Workflow for Debugging Azure Cli Credentials Timeout

on:
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  azure:
    name: "Testing Azure Cli Timeout"
    runs-on: [self-hosted, linux, x64] # ubuntu-latest
    environment: Production
    steps:
      - name: Install Azure cli
        run: |
          sudo apt-get install ca-certificates curl apt-transport-https lsb-release gnupg -y
          curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
          AZ_REPO=$(lsb_release -cs)
          echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
          sudo apt-get update
          sudo apt-get install azure-cli

      - name: Az CLI login
        uses: azure/login@v1
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          allow-no-subscriptions: true

      - name: Sleep for 10 minutes
        run: sleep 600

      - name: Az CLI Account Show
        run: az account show

What I suspect is that, for the Ubuntu runner we are using, Azure CLI might have been updated? (I'm not sure which version of Ubuntu we are running, but it might be that the latest Azure CLI was not yet the right version for our distro?)

benjamin-rousseau-shift avatar Dec 13 '23 12:12 benjamin-rousseau-shift

Scratch that, I actually still face it, but my real pipeline is a bit different, as it also installs azure-cli-core using pip3 for some requirements of the Azure Ansible collection.

I wonder if it's azure-cli-core (2.34.0) that messes up the token expiration, even though I log in with the action before even installing azure-cli-core. I am lost.

EDIT: it's not, I tested by forcing the installation of 2.55.0 with pip3 and still the same thing. I'm trying some more workflows to see if I can replicate in an isolated environment

benjamin-rousseau-shift avatar Dec 14 '23 03:12 benjamin-rousseau-shift

@benjamin-rousseau-shift I think the issue is with the underlying OIDC token issued by GitHub (5-minute expiry). It seems like it's not the fault of Azure CLI. I started having issues similar to yours after migrating to federated identity. I solved them:

https://stackoverflow.com/questions/77686072/issues-with-azure-identity-when-using-federated-credentials

I'm using python, but you can implement this fix in any other language:

import os
from azure.identity import AzureCliCredential

def get_azure_credentials():
    # subprocess_helper, CLIENT_ID and TENANT_ID are defined elsewhere in the poster's code
    token_request = os.environ.get("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
    token_uri = os.environ.get("ACTIONS_ID_TOKEN_REQUEST_URL")
    subprocess_helper(f'token=$(curl -H "Authorization: bearer {token_request}" "{token_uri}&audience=api://AzureADTokenExchange" | jq .value -r) && az login --service-principal -u {CLIENT_ID} -t {TENANT_ID} --federated-token $token')
    return AzureCliCredential()
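
For readers who want the same re-login idea without curl, jq, or a shell string, here is a minimal sketch using only the Python standard library. It assumes the `az` CLI is on the PATH and that the job has `id-token: write` so the two `ACTIONS_ID_TOKEN_REQUEST_*` variables are set. The function names (`build_token_url`, `fetch_oidc_token`, `build_login_command`, `relogin`) are made up for illustration; the `az login` flags are the ones used in the snippet above.

```python
import json
import os
import subprocess
import urllib.request

AUDIENCE = "api://AzureADTokenExchange"


def build_token_url(request_url, audience=AUDIENCE):
    """Append the audience to the runner-provided token request URL."""
    return f"{request_url}&audience={audience}"


def fetch_oidc_token(request_url, request_token):
    """Ask the GitHub OIDC endpoint for a fresh ID token."""
    req = urllib.request.Request(
        build_token_url(request_url),
        headers={"Authorization": f"bearer {request_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]


def build_login_command(client_id, tenant_id, token):
    """Argument list for `az login`; a list avoids shell string interpolation."""
    return [
        "az", "login", "--service-principal",
        "-u", client_id,
        "-t", tenant_id,
        "--federated-token", token,
        "--output", "none",
    ]


def relogin(client_id, tenant_id):
    """Fetch a fresh OIDC token and log the Azure CLI in again."""
    token = fetch_oidc_token(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"],
    )
    subprocess.run(build_login_command(client_id, tenant_id, token), check=True)
```

Calling `relogin(client_id, tenant_id)` shortly before the code that needs Azure access would mirror the workaround described in this thread; it does not change the 5-minute lifetime of the OIDC token itself.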

4c74356b41 avatar Dec 20 '23 12:12 4c74356b41

@4c74356b41 By doing this I think you're basically doing exactly the same thing as the GitHub action. My workaround for now is to log in to Azure again (just like you do in your Python script) right before I need to fetch something from Azure. Not the fanciest solution, but yeah, the OIDC tokens are only valid for 5 minutes; that's a fact no matter what the documentation says :/

benjamin-rousseau-shift avatar Dec 26 '23 15:12 benjamin-rousseau-shift

@4c74356b41 @benjamin-rousseau-shift, you are right. The GitHub OIDC provider issues a JWT ID token with a 5-minute expiration time. Its lifespan is not officially documented. By decoding the OIDC token, we can see that it actually expires in 5 minutes. You can also verify this in the sample token.
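
As a rough illustration of "decoding the OIDC token", the sketch below reads the payload of a JWT without verifying its signature and reports the gap between issuance and expiry. The helper names are made up, and the demo token is synthetic with made-up claim values; it is not a real GitHub token.

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the (unverified) payload claims of a JWT as a dict."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def lifetime_seconds(claims):
    """Seconds between issuance and expiry; falls back to nbf if iat is absent."""
    start = claims.get("iat", claims.get("nbf"))
    return claims["exp"] - start


def _b64(obj):
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Synthetic, unsigned token: header, payload, empty signature.
demo_token = ".".join(
    [_b64({"alg": "none"}), _b64({"iat": 1700000000, "exp": 1700000300}), ""]
)
print(lifetime_seconds(decode_jwt_claims(demo_token)))
```

Running the same two functions on the token fetched inside a workflow would show the lifetime the runner actually received; the demo values here were chosen to give 300 seconds.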

During login, Azure CLI uses the GitHub OIDC token to fetch an access token from MSAL. This access token is stored in msal_token_cache. Its lifetime is assigned a random value ranging between 60 and 90 minutes (75 minutes on average). See https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens#access-token-lifetime.

AzureCliCredential() authenticates by requesting a token from the Azure CLI. The instantiation of AzureCliCredential() alone will not raise the error. The error should occur when calling its method get_token(). It executes az account get-access-token --output json --resource {} to request a token from Azure CLI. See https://github.com/Azure/azure-sdk-for-python/blob/6aa171f81c0111996a2785b14864e961a7942e87/sdk/identity/azure-identity/azure/identity/_credentials/azure_cli.py#L24.

For az account get-access-token, Azure CLI first calls acquire_token_silent to attempt to get an access token from token cache. If no access token is returned, it calls acquire_token_for_client to get a new access token with client assertion in OIDC scenario, see https://github.com/Azure/azure-cli/issues/13276#issuecomment-1301828386.

Regarding @krukowskid's issue, the error ERROR: AADSTS700024: Client assertion is not within its valid time range. is most likely because DefaultAzureCredential fails to find or accept the access token in token cache and attempts to fetch a new access token again. At this point, the GitHub OIDC token is expired and cannot be used to fetch an access token.

In my local testing, it works seamlessly under normal conditions, returning the access token from the cache without needing to fetch a new access token from MSAL. I am wondering if you use GetToken() to issue a different scope from the access token stored in token cache. You may double check the TokenRequestContext argument for DefaultAzureCredential().GetToken().
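
The two-step flow described above (silent cache lookup first, then a fresh request that needs the client assertion) can be pictured with a toy model. This is not Azure CLI or MSAL code: the class, the per-scope cache keying, and the exception are assumptions made purely to illustrate why a cache hit keeps working after the OIDC token expires while a cache miss fails.

```python
from dataclasses import dataclass, field


class AssertionExpired(Exception):
    """Stands in for AADSTS700024 in this toy model."""


@dataclass
class ToyCli:
    assertion_expiry: float                    # when the OIDC token expires
    access_token_lifetime: float = 75 * 60     # average lifetime quoted above
    cache: dict = field(default_factory=dict)  # scope -> access-token expiry

    def get_access_token(self, scope, now):
        # Step 1: silent lookup, analogous to acquire_token_silent.
        expiry = self.cache.get(scope)
        if expiry is not None and now < expiry:
            return f"cached token for {scope}"
        # Step 2: cache miss, so a new token needs a still-valid assertion,
        # analogous to acquire_token_for_client.
        if now >= self.assertion_expiry:
            raise AssertionExpired(f"assertion expired before request for {scope}")
        self.cache[scope] = now + self.access_token_lifetime
        return f"new token for {scope}"


cli = ToyCli(assertion_expiry=300)                   # 5-minute OIDC token
cli.get_access_token("management", now=0)            # login populates the cache
print(cli.get_access_token("management", now=600))   # cache hit after 10 minutes
try:
    cli.get_access_token("vault", now=600)           # different scope: cache miss
except AssertionExpired as exc:
    print("failed:", exc)
```

In this model the request for the already cached scope succeeds ten minutes in, while the request for a scope that was never cached fails, which matches the suspicion that a different scope or resource is being requested.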

MoChilia avatar Dec 29 '23 09:12 MoChilia

Not sure if I'm interpreting what you say right. Basically, you are saying that the default token in the token cache should still be valid for 75 minutes on average, and if we somehow retrieve that, it should work (even though the OIDC token has expired)?

4c74356b41 avatar Dec 29 '23 11:12 4c74356b41

@4c74356b41, you're correct. Azure CLI stores the access token fetched from MSAL, which is valid for 75 minutes on average. If you are retrieving this token from the cache, it should work without needing the OIDC token. But if you are retrieving a new access token from remote MSAL, it needs the OIDC token.

MoChilia avatar Jan 02 '24 02:01 MoChilia

Mkay, can you please help me understand how to reliably request the token from the cache rather than a new token? Thanks!

4c74356b41 avatar Jan 03 '24 16:01 4c74356b41

@4c74356b41, I tried the following Python code; it returns the token from the cache if it is still valid.

from azure.identity import AzureCliCredential
azure_cli_credential = AzureCliCredential()
print("AzureCliCredential: ", azure_cli_credential.get_token("https://management.core.windows.net/"))

MoChilia avatar Jan 04 '24 05:01 MoChilia

That's what I was using, and it definitely isn't working with OIDC.

4c74356b41 avatar Jan 04 '24 07:01 4c74356b41

I just faced a similar issue.

In my case, the workflow is a fairly long-running scheduled job that cleans up unwanted images from Azure Container Registry.

Here is the workflow file. There is nothing fancy inside; technically it has only two moving parts:

  1. azure login
  2. run powershell script

cleanup.yml
name: cleanup
on:
  workflow_dispatch:
env:
  ARM_CLIENT_ID: 000000000-0000-0000-0000-000000000000
  ARM_USE_OIDC: true

permissions:
  contents: read
  id-token: write

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          client-id: 000000000-0000-0000-0000-000000000000
          tenant-id: 000000000-0000-0000-0000-000000000000
          subscription-id: 000000000-0000-0000-0000-000000000000
      - run: pwsh cleanup.ps1

The script itself is something like this (with all irrelevant details stripped out); it iterates over images and deletes them from the container registry:

$ErrorActionPreference = "Stop"

$registry = 'demo'
az acr login -n $registry

# Step 1: retrieve images
# pretend we received images here
$items = @('demo.azurecr.io/foo:latest', 'demo.azurecr.io/bar:1.2.0')

# Step 2: delete images
$counter = 0
foreach ($image in $items) {
  try {
    az acr repository delete -n $registry --image $image --yes --only-show-errors
    Write-Host "$image - deleted" -ForegroundColor Green
    $counter += 1
  }
  catch {
    Write-Host "$image - failed" -ForegroundColor Red
  }
  # ♻️ workaround - manually refresh token
  if ($env:ARM_CLIENT_ID -and $counter % 100 -eq 0) {
    az login --service-principal -u $env:ARM_CLIENT_ID -t (az account show --query tenantId -o tsv) --federated-token (Invoke-RestMethod -Uri "$($env:ACTIONS_ID_TOKEN_REQUEST_URL)&audience=api://AzureADTokenExchange" -Headers @{Authorization = "Bearer $($env:ACTIONS_ID_TOKEN_REQUEST_TOKEN)" } | Select-Object -ExpandProperty value)
  }
}

As you can guess, because it deletes images one by one it takes some time, definitely more than 5 minutes; in my case the job took 2 hours.

So after a while, all attempts to delete images fail with the following error:

ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-03-14T16:07:58.2005292Z, assertion valid from 2024-03-14T15:12:23.0000000Z, expiry time of assertion 2024-03-14T15:17:23.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: 849defde-0aa5-4a2f-a30d-ec73d2266000 Correlation ID: 9706d64c-2538-4e10-8808-cb3f37cb0a93 Timestamp: 2024-03-14 16:07:58Z
  Interactive authentication is needed. Please run:
  az login

So I was wondering if there is some kind of workaround, such as an az refresh or something like that 🤔

And many thanks to @4c74356b41 for pointing it out: there is. I've added an example of how it may be done in PowerShell.

marchenko1985 avatar Mar 14 '24 19:03 marchenko1985

Use the workaround detailed previously:

token=$(curl -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" | jq .value -r)
az login --service-principal -u {CLIENT_ID} -t {TENANT_ID} --federated-token $token

You can create a timer to call this every 5 minutes, or you can simply do this every iteration (or every other iteration, etc.).
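
One way to picture the timer idea is a small background thread that calls a refresh function immediately and then at a fixed interval. This is only a sketch: `PeriodicRefresher` is a made-up name, the refresh callable is whatever performs the re-login steps above, and the 240-second default is an assumption chosen to stay under the 5-minute token lifetime.

```python
import threading


class PeriodicRefresher:
    """Call `refresh` immediately, then every `interval` seconds, on a daemon thread."""

    def __init__(self, refresh, interval=240.0):
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self._refresh()
            # wait() returns early as soon as stop() sets the event.
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()
        self._thread.join()
```

A long-running script could create `PeriodicRefresher(my_relogin).start()` before its main loop and call `stop()` at the end, where `my_relogin` is a hypothetical function wrapping the curl and az login commands. Note that an exception raised by the refresh function ends the background thread, so real use would want error handling around it.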

You can also use runspaces to finish everything 10x faster or so.

4c74356b41 avatar Mar 15 '24 03:03 4c74356b41

We've recently been experiencing this issue, it was working fine before, and no changes have been made to the workflow.

Setup:

  • Runner using the ubuntu-latest image.
  • azure/login using OIDC login.
  • run steps calling the Azure CLI directly.

We noticed that the issue arose when the GitHub hosted runner image went from 20240324.2.0 to 20240407.1.0. The PR shows that the Azure CLI was updated from 2.58.0 to 2.59.0, see https://github.com/actions/runner-images/pull/9656/files#diff-66aec6097318276b09842a3ba2caf3037afbd8dadca2dfcdf76631100613ea69R111.

I'm not aware of nice workarounds for now, so I'll add more azure/login steps...

mderriey avatar Apr 10 '24 08:04 mderriey

Same here, now experiencing it way more often... gotta put in more login steps. Azure is slow with deploying some resources and it's just a pain in the ... to have to relog for every action.

Workaround in pwsh:

Write-Verbose -Verbose "Force refresh token" # https://github.com/Azure/login/issues/372
$uri = "$($ENV:ACTIONS_ID_TOKEN_REQUEST_URL)&audience=api://AzureADTokenExchange"
$reqToken = "bearer $($ENV:ACTIONS_ID_TOKEN_REQUEST_TOKEN)"

Write-Verbose -Verbose "Get token"
# $uri already carries the audience parameter, so it is not appended a second time
$token = Invoke-RestMethod -Method GET -Uri $uri -Headers @{ "Authorization" = $reqToken } | Select-Object -ExpandProperty value
Write-Verbose -Verbose "Login"
az login --service-principal -u REPLACE_W_CLIENTID -t REPLACE_W_TENANTID --federated-token $token

Kaloszer avatar Apr 10 '24 09:04 Kaloszer

I am the developer of Azure CLI for federated identity credential support. Please see https://github.com/Azure/azure-cli/issues/28708#issuecomment-2047256166 for a temporary mitigation to extend the task duration to 60 minutes.

jiasli avatar Apr 10 '24 12:04 jiasli

@jiasli, thanks for suggesting this workaround. I tried your suggestion in my pipeline, but still run into the same issue as before. Example run: https://github.com/microsoft/hi-ml/actions/runs/8642139946/job/23692828663, using the workflow updated like this: https://github.com/microsoft/hi-ml/pull/925/

Roughly speaking, in our test suite, we repeatedly run tests that

  • get a credential (AzureCliCredential or service principal)
  • run an Azure or AzureML operation using that credentials object

Despite having added various different scoped access tokens, I always eventually hit a token expiry problem

ant0nsc avatar Apr 11 '24 08:04 ant0nsc

A nice solution with automatic periodic refresh has been suggested in https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049014471, which you can wrap in a custom GitHub action like the one shown below. It can potentially be used as a temporary replacement for this action in long-running workflows.

name: Azure Federated Login

inputs:
  client-id:
    description: Azure client id
    type: string
  tenant-id:
    description: Azure tenant id
    type: string
  subscription-id:
    description: Azure subscription id
    type: string
    default: none
  refresh-interval-seconds:
    description: Refresh interval in seconds
    type: number
    default: 240


runs:
  using: "composite"
  steps:
    - name: Fetch OID token every ${{ inputs.refresh-interval-seconds }} seconds
      shell: bash
      run: |
        first_time=true
        while true; do
          token=$(curl -s -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" | jq .value -r)
          az login --service-principal -u ${{ inputs.client-id }} -t ${{ inputs.tenant-id }} --federated-token $token --output none
          if [ "$first_time" = true ] && [ "${{ inputs.subscription-id }}" != "none" ]; then
            az account set -s ${{ inputs.subscription-id }}
            first_time=false
          fi
          sleep ${{ inputs.refresh-interval-seconds }}
        done &

nlighten avatar Apr 11 '24 21:04 nlighten