Self hosted runners failing with WRITE ERROR: A session for this runner already exists
We are using self-hosted ephimeral runners for our github jobs. Recently we are seeing jobs failing randomly with the error This job failed on the UI. On further digging we found out that it's trying to run the job on a runner which is already in Active state instead of Idle. We also saw the following logs.
ERR Terminal] WRITE ERROR: A session for this runner already exists
It gets stuck in this process for long and then fails. Ex:
Waiting for a runner to pick up this job...
Job is about to start running on the runner: ip-10-0-87-43 (organization)
We even tried upgrading the action agent to v2.319.1 , but the issue still persists.
All of our runners have the same labels hence the job could easily pickup any runner in Idle state, but instead its trying to pickup a runner which is in Active state.
Hi @soumik-avoma, do you have a link to a workflow run where this issue is occurring?
Hi @luketomlinson, Since the workflow are private, hence won't be able to share that. But here are the logs:
2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] Sleeping for 30 seconds before retrying. |
-- | -- | --
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] The session conflict exception haven't reached retry limit. |
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR Terminal] WRITE ERROR: A session for this runner already exists. |
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] The session for this runner already exists. |
| | 2024-08-26 16:05:53.816 | at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token) |
| | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken) |
| | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken) |
| | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken) |
| | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken) |
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR MessageListener] GitHub.DistributedTask.WebApi.TaskAgentSessionConflictException: The actions runner ip-10-0-74-151-1724668474 already has an active session. |
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR MessageListener] Catch exception during create session. |
| | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR GitHubActionsService] POST request to https://pipelinesghubeus2.actions.githubusercontent.com/85uueRDPTlsbvfv9f8kaXsAIyvEp4gmsRFQMwFV501YbBtzszu/_apis/distributedtask/pools/1/sessions failed. HTTP Status: Conflict |
| | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO Terminal] WRITE LINE: |
| | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO Terminal] WRITE LINE: |
| | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO MessageListener] VssConnection created |
| | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 60 seconds timeout. |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 60 seconds timeout. |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 100 seconds timeout. |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO MessageListener] Connecting to the Runner Server... |
| | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO MessageListener] Attempt to create session. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO Listener] Runner execution has finished with return code 0 |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Finished process 2052 with exit code 0, and elapsed time 00:00:00.0011824. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Process started with process id 2052, waiting for process exit. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Updated oom_score_adj to 500 for PID: 2052. |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] High priority process: 'False' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Persist current code page: 'False' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: '' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Require exit code zero: 'True' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Working directory: '/home/ubuntu/actions-runner' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Arguments: '755 "/home/ubuntu/actions-runner/svc.sh"' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] File name: '/usr/bin/chmod' |
| | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Starting process: |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Running /usr/bin/chmod 755 "/home/ubuntu/actions-runner/svc.sh" |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Location: '/usr/bin/chmod' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Which2: 'chmod' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Root': '/home/ubuntu/actions-runner' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Root': '/home/ubuntu/actions-runner' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO SystemDControlManager] Service name 'actions.runner.ORG_NAME.ip-10-0-74-151-1724668474.service' display name 'GitHub Actions Runner (ORG_NAME.ip-10-0-74-151-1724668474)' will be used for service configuration. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO Terminal] WRITE LINE: |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO Terminal] WRITE LINE: |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO ConfigurationStore] Settings Saved. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO ConfigurationStore] Saving runner settings. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO PromptManager] ReadValue |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO CommandSettings] Flag 'unattended': 'True' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO CommandSettings] Arg 'work': '' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 60 seconds timeout. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 60 seconds timeout. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 100 seconds timeout. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] Creating credential type: OAuth |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] Creating type OAuth |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] GetCredentialProvider |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] stored True |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] HasCredentials() |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] Credentials Saved. |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] Saving OAuth credential @ /home/ubuntu/actions-runner/.credentials |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'ephemeral': 'True' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'ephemeral': 'True' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'disableupdate': 'False' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'no-default-labels': 'False' |
| | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'disableupdate': 'False'
👋Hi @soumik-avoma
From our logs, I see that runner attempting to create a session twice. Once at 2024-08-26T10:35:13UTC and again 11 seconds later at 2024-08-26T10:35:24UTC. The 2nd one results in the conflict you were seeing. Is it possible in your setup/automation there is something causing ./run.sh to be called multiple times?
This is what we use for registering the runner. We do not call run.sh anywhere explicitly.
payload=$(curl -sX POST -H "Authorization: token $GITHUB_PAT" https://api.github.com/orgs/ORG_NAME/actions/runners/registration-token)
export RUNNER_TOKEN=$(echo $payload | jq .token --raw-output)
mkdir -p actions-runner && cd actions-runner
curl -o actions-runner-linux-$ARCH-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-$ARCH-2.319.1.tar.gz
echo "03d993c65e0c4daa5e3bf5a5a35ba356f363bdb5ceb6b5808fd52fdb813dd8e8 actions-runner-linux-$ARCH-2.319.1.tar.gz" | shasum -a 256 -c
tar xzf ./actions-runner-linux-$ARCH-2.319.1.tar.gz
cd /home/ubuntu/actions-runner
./config.sh \
--name $(hostname) \
--token $RUNNER_TOKEN \
--url https://github.com/ORG_NAME \
--unattended \
--labels $PURPOSE \
--replace \
--ephemeral
sudo ./svc.sh install
Hi @soumik-avoma please try at least doubling the memory of your runner. It happens, that after an OOM event, the runner restarts. And then it tries to create a new session whereas the previous session was not gracefully ended.
Hi @soumik-avoma please try at least doubling the memory of your runner. It happens, that after an OOM event, the runner restarts. And then it tries to create a new session whereas the previous session was not gracefully ended.
How can we verify this with logs or other hints?
We have the same situation in our self hosted runners windows
Running into the same issue with macOS runners. Action runners keep randomly disappearing from github and if I check the local logs, I see they failed like this a few days before.
We had a similar issue. We run the ARC in EKS and deleting the pod wasn't helping because we kept seeing that A session for this runner already exists.
Manually removing the runner (not the runner scale set) helped. After the pod went down, a new one was spawned, a new runner was registered and it started accepting jobs.
Same issue, subscribed
I'm having the exact same issue like wozniakpl
Adding more memory to our runners resolved the issue for me, even though I noticed that OOM (Out Of Memory) errors aren't handled as gracefully as I would have expected. Just wanted to share in case it helps others facing similar problems.
I also had some success increasing memory (after I had tried to remove all ARC related components from my kubernetes cluster) and deleting all idle/offline runners on the github side.
If resources are the issue, would it be possible to make this very obvious in a log/event/error?
Increasing memory is only kicking the can down the road. If there's a power outage, network partition, etc. then things are going to be non-gracefully disconnected from GH. The fact that ./config.sh has --replace but then ./run.sh still fails is a failure of the Github API design and only Github can fix that.
Which it hasn't, for several years now :|
same issue on macOS