AutoscalingRunnerSet gets stuck thinking runners exist when they do not
### Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
### Controller Version
0.9.3
### Deployment Method
Kustomize
### Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
### To Reproduce
Not sure how to reproduce. The bug happens intermittently.
### Describe the bug
One of our AutoscalingRunnerSets often thinks that runners exist when they do not (the rest of our groups have not been affected). When this happens, GitHub Actions runs sit pending a runner, but no pod spins up. Once it is stuck in this state, the only way I've been able to correct it is to delete the AutoscalingRunnerSet and re-create it; deleting the controller and listener did not help.
Here's an example of the Kubernetes commands run to diagnose and fix the issue (Argo CD re-creates the deleted resource for me).
```shell
# See here that no backend-tests-group-* pods exist
> k get pods -n arc-runners | grep backend-test

# See here that it thinks that 2 runners exist
> k get autoscalingrunnersets -n arc-runners | grep backend-test
NAME                  MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
backend-tests-group   1                 180               2                         0                 0

> k delete -n arc-runners AutoscalingRunnerSet backend-tests-group
autoscalingrunnerset.actions.github.com "backend-tests-group" deleted

# Now it reflects the correct number of runners because it was able to create one
> k get autoscalingrunnersets -n arc-runners | grep backend-test
NAME                  MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
backend-tests-group   1                 180               1                 1

> k get pods -n arc-runners | grep backend-test
backend-tests-group-5hq6w-runner-6jbqg   2/2   Running   0   32s
```
### Describe the expected behavior
The AutoscalingRunnerSet should create pods when desired, and not think pods exist that do not. I'm curious whether the pod-tracking mechanism is a counter that increments/decrements rather than looking at the source of truth (how many pods actually exist).
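To illustrate why I suspect the tracking mechanism, here is a toy model (purely illustrative — this is not ARC's actual code, and `CounterTracker`/`ReconcilingTracker` are made-up names) of the failure mode an event-driven counter has that a reconciliation loop does not: if a single delete event is ever missed, a counter drifts from reality permanently, while re-listing the pods on each pass self-heals.

```python
class CounterTracker:
    """Tracks runner count by applying create/delete events as they arrive."""

    def __init__(self):
        self.count = 0

    def on_created(self):
        self.count += 1

    def on_deleted(self):
        self.count -= 1


class ReconcilingTracker:
    """Derives the runner count by listing what actually exists."""

    def count_for(self, observed_pods):
        return len(observed_pods)


# Two runners are created, then both pods die, but one delete event is lost.
counter = CounterTracker()
counter.on_created()
counter.on_created()
counter.on_deleted()  # the second delete event never arrives

observed_pods = []  # reality: no pods exist

print(counter.count)                                  # stuck at 1, drifted from reality
print(ReconcilingTracker().count_for(observed_pods))  # 0, matches reality
```

If ARC's listener carries forward a cached count like the first class rather than re-deriving it from the cluster, that would be consistent with the stuck `CURRENT RUNNERS` value above.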
### Additional Context
```yaml
# backend-tests-group.yaml
---
# Source: gha-runner-scale-set/templates/autoscalingrunnerset.yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: backend-tests-group
  namespace: arc-runners
  labels:
    app: github-actions-runner-backend-tests
    app.kubernetes.io/component: autoscaling-runner-set
    helm.sh/chart: gha-rs-0.9.3
    app.kubernetes.io/name: backend-tests-group
    app.kubernetes.io/instance: backend-tests-group
    app.kubernetes.io/version: 0.9.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gha-rs
    actions.github.com/scale-set-name: backend-tests-group
    actions.github.com/scale-set-namespace: arc-runners
  annotations:
    actions.github.com/values-hash: e2b17f77a09f6e007f4ddf7e68a0667e6616ed579405ac4ac32b05ab154b08f
    actions.github.com/cleanup-manager-role-binding: arc-runners
    actions.github.com/cleanup-manager-role-name: arc-runners
    actions.github.com/cleanup-no-permission-service-account-name: arc-runners
spec:
  githubConfigUrl: https://github.com/<redacted>
  githubConfigSecret: github-actions-runner-secrets
  runnerGroup: backend-tests-group
  maxRunners: 180
  minRunners: 1
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: arc-runners
      initContainers:
        - name: init-dind-externals
          image: <redacted-ecr-url>/github-actions-runner:latest
          imagePullPolicy: Always
          command: [cp]
          args: [-r, -v, /home/runner/externals/., /home/runner/tmpDir/]
          volumeMounts:
            - name: dind-externals
              mountPath: /home/runner/tmpDir
      containers:
        - name: runner
          resources:
            limits:
              memory: 3000Mi
            requests:
              cpu: 1500m
              memory: 3000Mi
          command:
            - /bin/sh
            - -c
            - /home/runner/bootstrap.sh && /home/runner/run.sh
          image: <redacted-ecr-url>/github-actions-runner:latest
          imagePullPolicy: Always
          env:
            - name: DOCKER_HOST
              value: unix:///var/run/docker.sock
            - name: RUNNER_WAIT_FOR_DOCKER_IN_SECONDS
              value: '120'
            - name: DISABLE_RUNNER_UPDATE
              value: 'true'
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /var/run
        - name: dind
          image: <redacted-ecr-url>mirror:dind
          args:
            - dockerd
            - --host=unix:///var/run/docker.sock
            - --group=$(DOCKER_GROUP_GID)
          env:
            - name: DOCKER_GROUP_GID
              value: '123'
          securityContext:
            privileged: true
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /var/run
            - name: dind-externals
              mountPath: /home/runner/externals
            - name: aws-config
              mountPath: /root/.aws/config
              subPath: config
              readOnly: true
      volumes:
        - name: aws-config
          secret:
            secretName: github-actions-runner-secrets
            items:
              - key: AWS_CONFIG
                path: config
        - name: dind-sock
          emptyDir: {}
        - name: dind-externals
          emptyDir: {}
        - name: work
          emptyDir: {}
      tolerations:
        - key: purpose
          operator: Equal
          value: github-runner-backend-tests
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - github-runner-backend-tests
                  - key: family
                    operator: In
                    values:
                      - c5a
                  - key: type
                    operator: In
                    values:
                      - spot
```
### Controller Logs
Logs from when the AutoscalingRunnerSet for the backend-tests group drifted from reality:
2024-11-19 17:37:14.168 {"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:37:14.168 {"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:14.168 {"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:37:14.182 {"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:37:14.182 {"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}
2024-11-19 17:37:28.935 {"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 17:37:28.935 {"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:28.935 {"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 17:37:28.945 {"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 17:37:28.945 {"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138900}
2024-11-19 17:37:40.930 {"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:37:40.930 {"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:40.930 {"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:37:40.940 {"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 17:37:40.940 {"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 17:37:46.282 {"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 17:37:46.282 {"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:46.282 {"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 17:37:46.292 {"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":1}
2024-11-19 17:37:46.292 {"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":19216}
2024-11-19 17:37:47.083 {"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:37:47.083 {"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:47.083 {"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:37:47.094 {"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 17:37:47.094 {"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 17:38:04.281 {"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:38:04.281 {"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:04.281 {"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:38:04.292 {"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:38:04.292 {"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}
2024-11-19 17:38:19.041 {"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 17:38:19.041 {"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:19.041 {"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 17:38:19.051 {"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 17:38:19.051 {"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138900}
2024-11-19 17:38:31.035 {"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:38:31.035 {"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:31.035 {"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:38:31.045 {"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 17:38:31.045 {"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 17:38:36.387 {"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 17:38:36.387 {"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:36.387 {"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 17:38:36.397 {"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":1}
2024-11-19 17:38:36.397 {"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":19216}
2024-11-19 17:38:37.186 {"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:38:37.186 {"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:37.186 {"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:38:37.196 {"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 17:38:37.196 {"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 17:38:54.391 {"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:38:54.392 {"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:54.392 {"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:38:54.402 {"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:38:54.402 {"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}
Logs from when the issue was fixed by deleting and re-creating the AutoscalingRunnerSet:
2024-11-19 18:24:26.831 {"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:24:26.831 {"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:26.831 {"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:24:26.840 {"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 18:24:26.840 {"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 18:24:31.443 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":2,"min":1,"max":180,"currentRunnerCount":2,"jobsCompleted":0}
2024-11-19 18:24:31.443 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":2,\"patchID\":3,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:31.443 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":3,\"replicas\":2}}"}
2024-11-19 18:24:31.455 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":2}
2024-11-19 18:24:31.455 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:24:31.980 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 18:24:31.980 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:31.980 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 18:24:31.991 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 18:24:31.991 {"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138914}
2024-11-19 18:24:33.042 {"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:24:33.042 {"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:33.042 {"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:24:33.051 {"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 18:24:33.051 {"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:24:44.075 {"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 18:24:44.076 {"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:44.076 {"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 18:24:44.084 {"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 18:24:44.084 {"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72029}
2024-11-19 18:24:53.000 name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176800489 eventRV=176802092 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Killing type=Normal count=1 msg="Stopping container listener"
2024-11-19 18:24:53.448 {"severity":"error","ts":"2024-11-20T01:24:53Z","logger":"listener-app","message":"Retryable client error","error":"Get \"https://pipelinesghubeus23.actions.githubusercontent.com/zib2we2z5kXEQnMTiAfTm9qQisOTuzKmjWWop27TyRThguaypt/_apis/runtime/runnerscalesets/201/messages?sessionId=6779845a-e127-4b87-b337-f3869797fb5a&api-version=6.0-preview\": context canceled","method":"GET","url":"https://pipelinesghubeus23.actions.githubusercontent.com/zib2we2z5kXEQnMTiAfTm9qQisOTuzKmjWWop27TyRThguaypt/_apis/runtime/runnerscalesets/201/messages?sessionId=6779845a-e127-4b87-b337-f3869797fb5a&api-version=6.0-preview","error":"request failed","stacktrace":"github.com/actions/actions-runner-controller/github/actions.(*clientLogger).Error\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:76\ngithub.com/hashicorp/go-retryablehttp.(*Client).Do\n\tgithub.com/hashicorp/[email protected]/client.go:718\ngithub.com/hashicorp/go-retryablehttp.(*RoundTripper).RoundTrip\n\tgithub.com/hashicorp/[email protected]/roundtripper.go:47\nnet/http.send\n\tnet/http/client.go:259\nnet/http.(*Client).send\n\tnet/http/client.go:180\nnet/http.(*Client).do\n\tnet/http/client.go:724\nnet/http.(*Client).Do\n\tnet/http/client.go:590\ngithub.com/actions/actions-runner-controller/github/actions.(*Client).Do\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:273\ngithub.com/actions/actions-runner-controller/github/actions.(*Client).GetMessage\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:577\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).getMessage\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:272\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).Listen\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:163\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/app.(*App).Run.func1\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/app/app.go:124\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78"}
2024-11-19 18:24:53.448 {"severity":"info","ts":"2024-11-20T01:24:53Z","logger":"listener-app.listener","message":"Deleting message session"}
2024-11-19 18:24:53.823 2024/11/20 01:24:53 Application returned an error: http: Server closed
2024-11-19 18:25:00.000 name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802216 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Started type=Normal count=1 msg="Started container listener"
2024-11-19 18:25:00.000 name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802215 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Created type=Normal count=1 msg="Created container listener"
2024-11-19 18:25:00.000 name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802214 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Pulled type=Normal count=1 msg="Container image \"ghcr.io/actions/gha-runner-scale-set-controller:0.9.3\" already present on machine"
2024-11-19 18:25:00.000 name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802205 eventRV=176802207 reportingcontroller=default-scheduler sourcecomponent=default-scheduler reason=Scheduled type=Normal count=1 msg="Successfully assigned arc-systems/backend-tests-group-754b578d-listener to ip-10-11-15-220.us-west-2.compute.internal"
2024-11-19 18:25:00.897 {"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"app initialized"}
2024-11-19 18:25:00.897 {"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"Starting listener"}
2024-11-19 18:25:00.898 {"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"Starting metrics server"}
2024-11-19 18:25:00.898 {"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"refreshing token","githubConfigUrl":"https://github.com/<redacted>"}
2024-11-19 18:25:00.900 {"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"getting access token for GitHub App auth","accessTokenURL":"https://api.github.com/app/installations/30877269/access_tokens"}
2024-11-19 18:25:01.059 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app","message":"getting runner registration token","registrationTokenURL":"https://api.github.com/orgs/<redacted>/actions/runners/registration-token"}
2024-11-19 18:25:01.227 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app","message":"getting Actions tenant URL and JWT","registrationURL":"https://api.github.com/actions/runner-registration"}
2024-11-19 18:25:01.759 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.listener","message":"Current runner scale set statistics.","statistics":"{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":0,\"totalAssignedJobs\":0,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-11-19 18:25:01.759 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 18:25:01.763 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:01.763 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 18:25:01.794 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-5hq6w","replicas":1}
2024-11-19 18:25:01.795 {"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:25:16.950 {"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:25:16.950 {"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:16.950 {"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:25:16.960 {"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 18:25:16.960 {"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 18:25:22.087 {"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 18:25:22.087 {"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:22.087 {"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 18:25:22.096 {"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 18:25:22.096 {"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138914}
2024-11-19 18:25:23.145 {"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:25:23.145 {"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:23.145 {"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:25:23.158 {"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 18:25:23.158 {"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:25:34.193 {"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 18:25:34.193 {"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:34.193 {"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 18:25:34.203 {"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 18:25:34.203 {"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72029}
2024-11-19 18:25:51.886 {"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 18:25:51.886 {"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:51.886 {"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 18:25:51.896 {"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-5hq6w","replicas":1}
2024-11-19 18:25:51.896 {"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
### Runner Pod Logs
```shell
There weren't runner pods - which was the problem
```
Having the same issue; wondering if you got this fixed by reverting to the previous version.
Yesterday I solved the same issue on 0.9.2. I had to delete the custom resource with status Failed, and then everything returned to normal functioning.
I also tried fully deleting everything and creating it again. Some of my node pools are working, but most of them are not.
This seems to be related to something on the GitHub side, but I can't confirm what's going on.
I had the same issue.
I tried recreating the resources, including the CRD, but that didn't work. However, I found that giving a new runnerScaleSetName to the resource did the trick.
Same issue on my side (deployed using Helm). I resolved it by re-deploying (delete & apply) the EphemeralRunnerSet with the same name it had before.
Logs from orchestrator pod (in loop, before issue was solved, pods of runners were not running at that time):
2025-02-27T17:06:08Z INFO EphemeralRunner EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up {"version": "0.10.1", "ephemeralrunner": {"name":"xxxx-xxxx-yyy-runner-yyy","namespace":"namespace"}, "phase": "Failed"}
Similar to this issue: https://github.com/actions/actions-runner-controller/discussions/3402
I'm currently having this issue, running 0.10.1. When I do a fresh deploy, everything shows up as expected, including my min 2 runners. Then after they have been used a few times, there are 0 pods and the autoscalingrunnerset shows:
Status:
Current Runners: 2
Failed Ephemeral Runners: 0
Pending Ephemeral Runners: 0
Running Ephemeral Runners: 2
Events: <none>
No pods show up at all when running
k get all,autoscalingrunnerset,ephemeralrunnerset -n namespace
NAME MINIMUM RUNNERS MAXIMUM RUNNERS CURRENT RUNNERS STATE PENDING RUNNERS RUNNING RUNNERS FINISHED RUNNERS DELETING RUNNERS
autoscalingrunnerset.actions.github.com/runner00 2 6 2 0 2
autoscalingrunnerset.actions.github.com/runner01 2 6 2 0 2
NAME DESIREDREPLICAS CURRENTREPLICAS PENDING RUNNERS RUNNING RUNNERS FINISHED RUNNERS DELETING RUNNERS
ephemeralrunnerset.actions.github.com/runner00-6vncx 2 2 0 0
ephemeralrunnerset.actions.github.com/runner01-9xrzx 2 2 0 0
The listener runner is still running in the arc-systems namespace. Just no pods running in the runnerset namespace.
Update
I've just noticed the following in the EphemeralRunnerSet:
status:
currentReplicas: 2
failedEphemeralRunners: 2
pendingEphemeralRunners: 0
runningEphemeralRunners: 0
Found the logs:
2025-03-06T14:46:30Z ERROR EphemeralRunner Failed to clean up runner from the service (not an ActionsError) {"version": "0.10.1", "ephemeralrunner": {"name":"runner01-fnbht-runner-jnq4d","namespace":"runner01-set"}, "error": "failed to remove runner from the service: failed to get runner registration token on refresh: github api error: StatusCode 403, RequestID \"----------------------------\": {\"message\":\"Resource not accessible by integration\",\"documentation_url\":\"https://docs.github.com/rest/actions/self-hosted-runners#c │
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*EphemeralRunnerReconciler).cleanupRunnerFromService
github.com/actions/actions-runner-controller/controllers/actions.github.com/ephemeralrunner_controller.go:313
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*EphemeralRunnerReconciler).Reconcile
github.com/actions/actions-runner-controller/controllers/actions.github.com/ephemeralrunner_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
I was nodding along with the original filer, but some more digging, including the conversation in https://github.com/actions/actions-runner-controller/issues/3619,
clued me in to the fact that the number of "current runners" is not going to match the number of pods. It's counting the number of EphemeralRunner resources, which persist even when pods die, and if you describe them they may have a message like "Pod has failed to start more than 5 times". In my case the container was dying immediately on startup.
Not saying that's what's happening in your case, but at least for me the seemingly stale runner count was due to this intentional EphemeralRunner resource persistence. Understanding that layer of indirection between the runner set and pods was helpful in diagnosing.
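That layer of indirection can be checked directly by comparing the EphemeralRunner resources against the pods. Here's a rough sketch of the comparison; `kubectl` is stubbed with sample output so the logic runs anywhere, and the resource names and `arc-runners` namespace are illustrative (in a real cluster, drop the stub):

```shell
#!/usr/bin/env bash
# Stub kubectl so this runs without a cluster; the stub mimics
# name-only output. In a real cluster, delete this function.
kubectl() {
  case "$2" in
    ephemeralrunners) printf 'runner-abc\nrunner-def\n' ;;  # CRs persist after the pod dies
    pods)             printf 'runner-abc\n' ;;              # runner-def's pod is gone
  esac
}

# EphemeralRunners with no backing pod are what inflate CURRENT RUNNERS
# in the AutoscalingRunnerSet status while `kubectl get pods` shows nothing.
runners=$(kubectl get ephemeralrunners -n arc-runners)
pods=$(kubectl get pods -n arc-runners)
phantoms=$(for r in $runners; do
  printf '%s\n' "$pods" | grep -qx "$r" || echo "$r"
done)
echo "EphemeralRunners without pods: $phantoms"
```

Describing any of the "phantom" names it prints should show why the pod is gone (e.g. the failed-to-start message above).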
Much appreciated, this helped a whole lot.
I was also having a similar issue. In my case, pods will very occasionally fail on startup, leaving the controller in a state where it thinks a runner exists when it doesn't. I ended up creating a little script running in a deployment to watch the runner namespace and delete these failed pods. This then allows the controller to recreate the ephemeral runner:
#!/usr/bin/env bash
NAMESPACE="arc-runners"
# This is pretty inefficient because it hammers the pods endpoint on every event, but this isn't a super active namespace anyway
delete_failed_pods() {
  FAILED_PODS=$(kubectl get pods --namespace="$NAMESPACE" --field-selector=status.phase=Failed -o jsonpath='{.items[*].metadata.name}')
  for POD in $FAILED_PODS; do
    echo "Deleting failed pod: $POD"
    kubectl get pod "$POD" --namespace="$NAMESPACE"
    kubectl delete pod "$POD" --namespace="$NAMESPACE"
  done
}
echo "Checking permissions - Can we get events in $NAMESPACE?"
kubectl auth can-i get events --namespace="$NAMESPACE"
echo "Watching for failed pods in $NAMESPACE"
kubectl get events --namespace="$NAMESPACE" --watch-only | while read -r event; do
  delete_failed_pods
done
Bit hacky, but does the job.
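For what it's worth, the list-then-delete loop can probably be collapsed into a single delete with server-side filtering (I believe `kubectl delete` accepts `--field-selector` for built-in resources like pods, but verify against your kubectl version). A sketch, with `kubectl` stubbed so it runs anywhere; in-cluster you'd remove the stub:

```shell
#!/usr/bin/env bash
NAMESPACE="arc-runners"

# Stub kubectl so the sketch runs without a cluster; it echoes what the
# real command would report. Delete this function in-cluster.
kubectl() { echo 'pod "runner-xyz" deleted'; }

# Let the API server do the filtering instead of looping client-side.
cleanup_failed_pods() {
  kubectl delete pods --namespace="$NAMESPACE" --field-selector=status.phase=Failed
}

cleanup_failed_pods
```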
Running into a similar issue. Does the job that was scheduled on the failed runner get "lost"? If failed ephemeralrunners are cleaned up quickly, will GH reassign that job to a new runner?
I too have had to write something, as I am seeing pods stuck terminating even though their status.phase reports Running. So now I am looking for long-running pods, since most of ours currently don't run as long as 15+ minutes.
In my experience, the runner fails before being able to pick up the job, so it's simply picked up by the next successfully-booted runner with no events lost.
This is happening on a cluster I have as well. The controller logs "running": 4 now, while only 2 runner pods are in the cluster.
Note: this did not happen with the previous community-developed version (summerwind).
UPD: restarting the controller helped - it created the missing pods, and briefly during startup it seemed to notice that there were only 2 running pods:
"running": 2, "finished": 0, "failed": 0, "deleting": 0
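If anyone else lands here: the restart can be done in place rather than by deleting resources. A sketch; the deployment name below is a guess at the chart default for a Helm release named `arc` and will differ per install, and `kubectl` is stubbed so the snippet runs anywhere:

```shell
#!/usr/bin/env bash
# Stub kubectl so this runs without a cluster; in-cluster it's one command.
kubectl() { echo "deployment.apps/$4 restarted"; }

# A rolling restart recreates the controller pod; on startup the controller
# re-lists the EphemeralRunner resources and recreates any missing runner pods.
restart_controller() {
  kubectl rollout restart deployment arc-gha-rs-controller -n arc-systems
}

restart_controller
```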
Happening here too on version 0.11.10
CURRENT RUNNERS: 7
PENDING RUNNERS: 7
The ephemeralrunners.actions.github.com resources show all 7 runners as Failed. I had to delete them manually to kick-start the other jobs.
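A sketch of that manual cleanup, automated. Unlike pods, custom resources such as EphemeralRunner generally can't be filtered by `--field-selector=status.phase=...`, so the phase filter has to happen client-side via jsonpath; `kubectl` is stubbed here so the logic runs anywhere, and the namespace is illustrative:

```shell
#!/usr/bin/env bash
NAMESPACE="arc-runners"

# Stub kubectl so this runs without a cluster. `get` mimics the jsonpath
# output requested below; `delete` echoes what the real command would report.
kubectl() {
  case "$1" in
    get)    printf 'runner-a Failed\nrunner-b Running\nrunner-c Failed\n' ;;
    delete) echo "ephemeralrunner \"$3\" deleted" ;;
  esac
}

# Emit "<name> <phase>" per EphemeralRunner, keep the Failed ones, delete them.
delete_failed_ephemeralrunners() {
  kubectl get ephemeralrunners -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name} {.status.phase}{"\n"}{end}' |
  awk '$2 == "Failed" { print $1 }' |
  while read -r er; do
    kubectl delete ephemeralrunner "$er" -n "$NAMESPACE"
  done
}

delete_failed_ephemeralrunners
```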