
AutoscalingRunnerSet gets stuck thinking runners exist when they do not

Open koreyGambill opened this issue 1 year ago • 15 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.3

Deployment Method

Kustomize

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Not sure how to reproduce. The bug happens intermittently.

Describe the bug

One of our AutoscalingRunnerSets will often think that runners exist when they do not (the rest of our groups have not been affected). At that point, GitHub Actions runs sit pending a runner, but no pod spins up. When it gets stuck in this state, the only way I've been able to correct it is by deleting the AutoscalingRunnerSet and re-creating it. Deleting the controller & listener did not help.

Here's an example of the Kubernetes commands I ran to diagnose and fix it (Argo CD re-creates the deleted resource for me).

# see here that no backend-tests-group-* pods exist
> k get pods -n arc-runners | grep backend-test

# See here that it thinks that 2 runners exist
> k get autoscalingrunnersets -n arc-runners | grep backend-test
NAME                               MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
backend-tests-group   1                 180               2                         0                 0


> k delete -n arc-runners AutoscalingRunnerSet backend-tests-group
autoscalingrunnerset.actions.github.com "backend-tests-group" deleted

# Now it reflects the correct number of runners because it was able to create one
> k get autoscalingrunnersets -n arc-runners | grep backend-test
NAME                               MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
backend-tests-group   1                 180               1                         1


> k get pods -n arc-runners | grep backend-test
backend-tests-group-5hq6w-runner-6jbqg               2/2     Running   0          32s

Describe the expected behavior

The AutoscalingRunnerSet should create pods when desired and not think pods exist that do not. I'm curious whether the pod tracking mechanism is a counter that increments/decrements rather than looking at the source of truth (how many pods actually exist).
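
A rough way to compare what ARC is tracking against what actually exists (a sketch; the namespace and scale set name are just the ones from my setup):

# Sketch: compare what the controller tracks (EphemeralRunner resources) against actual pods.
kubectl get ephemeralrunners -n arc-runners | grep backend-tests-group
kubectl get pods -n arc-runners | grep backend-tests-group

# Inspect any EphemeralRunner that has no matching pod (name is a placeholder)
kubectl describe ephemeralrunner <runner-name> -n arc-runners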

Additional Context

# backend-tests-group.yaml
---
# Source: gha-runner-scale-set/templates/autoscalingrunnerset.yaml
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: backend-tests-group
  namespace: arc-runners
  labels:
    app: github-actions-runner-backend-tests
    app.kubernetes.io/component: autoscaling-runner-set
    helm.sh/chart: gha-rs-0.9.3
    app.kubernetes.io/name: backend-tests-group
    app.kubernetes.io/instance: backend-tests-group
    app.kubernetes.io/version: 0.9.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: gha-rs
    actions.github.com/scale-set-name: backend-tests-group
    actions.github.com/scale-set-namespace: arc-runners
  annotations:
    actions.github.com/values-hash: e2b17f77a09f6e007f4ddf7e68a0667e6616ed579405ac4ac32b05ab154b08f
    actions.github.com/cleanup-manager-role-binding: arc-runners
    actions.github.com/cleanup-manager-role-name: arc-runners
    actions.github.com/cleanup-no-permission-service-account-name: arc-runners
spec:
  githubConfigUrl: https://github.com/<redacted>
  githubConfigSecret: github-actions-runner-secrets
  runnerGroup: backend-tests-group
  maxRunners: 180
  minRunners: 1

  template:
    spec:
      restartPolicy: Never
      serviceAccountName: arc-runners
      initContainers:
        - name: init-dind-externals

          image: <redacted-ecr-url>/github-actions-runner:latest
          imagePullPolicy: Always
          command: [cp]
          args: [-r, -v, /home/runner/externals/., /home/runner/tmpDir/]
          volumeMounts:
            - name: dind-externals
              mountPath: /home/runner/tmpDir
      containers:
        - name: runner
          resources:
            limits:
              memory: 3000Mi
            requests:
              cpu: 1500m
              memory: 3000Mi

          command:
            - /bin/sh
            - -c
            - /home/runner/bootstrap.sh && /home/runner/run.sh

          image: <redacted-ecr-url>/github-actions-runner:latest
          imagePullPolicy: Always
          env:
            - name: DOCKER_HOST
              value: unix:///var/run/docker.sock
            - name: RUNNER_WAIT_FOR_DOCKER_IN_SECONDS
              value: '120'
            - name: DISABLE_RUNNER_UPDATE
              value: 'true'
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /var/run
        - name: dind
          image: <redacted-ecr-url>mirror:dind
          args:
            - dockerd
            - --host=unix:///var/run/docker.sock
            - --group=$(DOCKER_GROUP_GID)
          env:
            - name: DOCKER_GROUP_GID
              value: '123'
          securityContext:
            privileged: true
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
            - name: dind-sock
              mountPath: /var/run
            - name: dind-externals
              mountPath: /home/runner/externals
            - name: aws-config
              mountPath: /root/.aws/config
              subPath: config
              readOnly: true


      volumes:
        - name: aws-config
          secret:
            secretName: github-actions-runner-secrets
            items:
              - key: AWS_CONFIG
                path: config

        - name: dind-sock
          emptyDir: {}
        - name: dind-externals
          emptyDir: {}

        - name: work
          emptyDir: {}

      tolerations:
        - key: purpose
          operator: Equal
          value: github-runner-backend-tests
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - github-runner-backend-tests
                  - key: family
                    operator: In
                    values:
                      - c5a
                  - key: type
                    operator: In
                    values:
                      - spot

Controller Logs

Logs when the AutoscalingRunnerSet for the backend-tests group drifted from reality

2024-11-19 17:37:14.168	{"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:37:14.168	{"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:14.168	{"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:37:14.182	{"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:37:14.182	{"severity":"info","ts":"2024-11-20T00:37:14Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}
2024-11-19 17:37:28.935	{"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 17:37:28.935	{"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:28.935	{"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 17:37:28.945	{"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 17:37:28.945	{"severity":"info","ts":"2024-11-20T00:37:28Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138900}
2024-11-19 17:37:40.930	{"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:37:40.930	{"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:40.930	{"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:37:40.940	{"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 17:37:40.940	{"severity":"info","ts":"2024-11-20T00:37:40Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 17:37:46.282	{"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 17:37:46.282	{"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:46.282	{"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 17:37:46.292	{"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":1}
2024-11-19 17:37:46.292	{"severity":"info","ts":"2024-11-20T00:37:46Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":19216}
2024-11-19 17:37:47.083	{"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:37:47.083	{"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:37:47.083	{"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:37:47.094	{"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 17:37:47.094	{"severity":"info","ts":"2024-11-20T00:37:47Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 17:38:04.281	{"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:38:04.281	{"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:04.281	{"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:38:04.292	{"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:38:04.292	{"severity":"info","ts":"2024-11-20T00:38:04Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}
2024-11-19 17:38:19.041	{"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 17:38:19.041	{"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:19.041	{"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 17:38:19.051	{"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 17:38:19.051	{"severity":"info","ts":"2024-11-20T00:38:19Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138900}
2024-11-19 17:38:31.035	{"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:38:31.035	{"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:31.035	{"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:38:31.045	{"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 17:38:31.045	{"severity":"info","ts":"2024-11-20T00:38:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 17:38:36.387	{"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 17:38:36.387	{"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:36.387	{"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 17:38:36.397	{"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":1}
2024-11-19 17:38:36.397	{"severity":"info","ts":"2024-11-20T00:38:36Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":19216}
2024-11-19 17:38:37.186	{"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 17:38:37.186	{"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:37.186	{"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 17:38:37.196	{"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 17:38:37.196	{"severity":"info","ts":"2024-11-20T00:38:37Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 17:38:54.391	{"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 17:38:54.392	{"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 17:38:54.392	{"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 17:38:54.402	{"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 17:38:54.402	{"severity":"info","ts":"2024-11-20T00:38:54Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72017}

Logs from when it was fixed by deleting the AutoscalingRunnerSet and re-creating



2024-11-19 18:24:26.831	{"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:24:26.831	{"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:26.831	{"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:24:26.840	{"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 18:24:26.840	{"severity":"info","ts":"2024-11-20T01:24:26Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 18:24:31.443	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":2,"min":1,"max":180,"currentRunnerCount":2,"jobsCompleted":0}
2024-11-19 18:24:31.443	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":2,\"patchID\":3,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:31.443	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":3,\"replicas\":2}}"}
2024-11-19 18:24:31.455	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-tb758","replicas":2}
2024-11-19 18:24:31.455	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:24:31.980	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 18:24:31.980	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:31.980	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 18:24:31.991	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 18:24:31.991	{"severity":"info","ts":"2024-11-20T01:24:31Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138914}
2024-11-19 18:24:33.042	{"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:24:33.042	{"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:33.042	{"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:24:33.051	{"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 18:24:33.051	{"severity":"info","ts":"2024-11-20T01:24:33Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:24:44.075	{"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 18:24:44.076	{"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:24:44.076	{"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 18:24:44.084	{"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 18:24:44.084	{"severity":"info","ts":"2024-11-20T01:24:44Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72029}
2024-11-19 18:24:53.000	name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176800489 eventRV=176802092 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Killing type=Normal count=1 msg="Stopping container listener" 
2024-11-19 18:24:53.448	{"severity":"error","ts":"2024-11-20T01:24:53Z","logger":"listener-app","message":"Retryable client error","error":"Get \"https://pipelinesghubeus23.actions.githubusercontent.com/zib2we2z5kXEQnMTiAfTm9qQisOTuzKmjWWop27TyRThguaypt/_apis/runtime/runnerscalesets/201/messages?sessionId=6779845a-e127-4b87-b337-f3869797fb5a&api-version=6.0-preview\": context canceled","method":"GET","url":"https://pipelinesghubeus23.actions.githubusercontent.com/zib2we2z5kXEQnMTiAfTm9qQisOTuzKmjWWop27TyRThguaypt/_apis/runtime/runnerscalesets/201/messages?sessionId=6779845a-e127-4b87-b337-f3869797fb5a&api-version=6.0-preview","error":"request failed","stacktrace":"github.com/actions/actions-runner-controller/github/actions.(*clientLogger).Error\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:76\ngithub.com/hashicorp/go-retryablehttp.(*Client).Do\n\tgithub.com/hashicorp/[email protected]/client.go:718\ngithub.com/hashicorp/go-retryablehttp.(*RoundTripper).RoundTrip\n\tgithub.com/hashicorp/[email protected]/roundtripper.go:47\nnet/http.send\n\tnet/http/client.go:259\nnet/http.(*Client).send\n\tnet/http/client.go:180\nnet/http.(*Client).do\n\tnet/http/client.go:724\nnet/http.(*Client).Do\n\tnet/http/client.go:590\ngithub.com/actions/actions-runner-controller/github/actions.(*Client).Do\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:273\ngithub.com/actions/actions-runner-controller/github/actions.(*Client).GetMessage\n\tgithub.com/actions/actions-runner-controller/github/actions/client.go:577\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).getMessage\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:272\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).Listen\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:163\ngithub.com/actions/actions-runner-controller/cmd/ghalistener/app.(*App).Run.func1\n\tgithub.com/actions/actions-runner-controller/cmd/ghalistener/app/app.go:124\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:78"}
2024-11-19 18:24:53.448	{"severity":"info","ts":"2024-11-20T01:24:53Z","logger":"listener-app.listener","message":"Deleting message session"}
2024-11-19 18:24:53.823	2024/11/20 01:24:53 Application returned an error: http: Server closed
2024-11-19 18:25:00.000	name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802216 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Started type=Normal count=1 msg="Started container listener" 
2024-11-19 18:25:00.000	name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802215 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Created type=Normal count=1 msg="Created container listener" 
2024-11-19 18:25:00.000	name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802206 eventRV=176802214 reportinginstance=ip-10-11-15-220.us-west-2.compute.internal reportingcontroller=kubelet sourcecomponent=kubelet sourcehost=ip-10-11-15-220.us-west-2.compute.internal reason=Pulled type=Normal count=1 msg="Container image \"ghcr.io/actions/gha-runner-scale-set-controller:0.9.3\" already present on machine" 
2024-11-19 18:25:00.000	name=backend-tests-group-754b578d-listener kind=Pod objectAPIversion=v1 objectRV=176802205 eventRV=176802207 reportingcontroller=default-scheduler sourcecomponent=default-scheduler reason=Scheduled type=Normal count=1 msg="Successfully assigned arc-systems/backend-tests-group-754b578d-listener to ip-10-11-15-220.us-west-2.compute.internal" 
2024-11-19 18:25:00.897	{"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"app initialized"}
2024-11-19 18:25:00.897	{"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"Starting listener"}
2024-11-19 18:25:00.898	{"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"Starting metrics server"}
2024-11-19 18:25:00.898	{"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"refreshing token","githubConfigUrl":"https://github.com/<redacted>"}
2024-11-19 18:25:00.900	{"severity":"info","ts":"2024-11-20T01:25:00Z","logger":"listener-app","message":"getting access token for GitHub App auth","accessTokenURL":"https://api.github.com/app/installations/30877269/access_tokens"}
2024-11-19 18:25:01.059	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app","message":"getting runner registration token","registrationTokenURL":"https://api.github.com/orgs/<redacted>/actions/runners/registration-token"}
2024-11-19 18:25:01.227	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app","message":"getting Actions tenant URL and JWT","registrationURL":"https://api.github.com/actions/runner-registration"}
2024-11-19 18:25:01.759	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.listener","message":"Current runner scale set statistics.","statistics":"{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":0,\"totalAssignedJobs\":0,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-11-19 18:25:01.759	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 18:25:01.763	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:01.763	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 18:25:01.794	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-5hq6w","replicas":1}
2024-11-19 18:25:01.795	{"severity":"info","ts":"2024-11-20T01:25:01Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:25:16.950	{"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":20,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:25:16.950	{"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:16.950	{"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:25:16.960	{"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"automation-group-l2754","replicas":0}
2024-11-19 18:25:16.960	{"severity":"info","ts":"2024-11-20T01:25:16Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":12610}
2024-11-19 18:25:22.087	{"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":4,"min":4,"max":40,"currentRunnerCount":4,"jobsCompleted":0}
2024-11-19 18:25:22.087	{"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":4,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:22.087	{"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":4}}"}
2024-11-19 18:25:22.096	{"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"group-2wd6h","replicas":4}
2024-11-19 18:25:22.096	{"severity":"info","ts":"2024-11-20T01:25:22Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":138914}
2024-11-19 18:25:23.145	{"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":0,"min":0,"max":40,"currentRunnerCount":0,"jobsCompleted":0}
2024-11-19 18:25:23.145	{"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:23.145	{"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":null}}"}
2024-11-19 18:25:23.158	{"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"test-group-r6jw6","replicas":0}
2024-11-19 18:25:23.158	{"severity":"info","ts":"2024-11-20T01:25:23Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}
2024-11-19 18:25:34.193	{"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":40,"min":40,"max":120,"currentRunnerCount":40,"jobsCompleted":0}
2024-11-19 18:25:34.193	{"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":40,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:34.193	{"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":40}}"}
2024-11-19 18:25:34.203	{"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"lightweight-group-b2g6j","replicas":40}
2024-11-19 18:25:34.203	{"severity":"info","ts":"2024-11-20T01:25:34Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":72029}
2024-11-19 18:25:51.886	{"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Calculated target runner count","assigned job":0,"decision":1,"min":1,"max":180,"currentRunnerCount":1,"jobsCompleted":0}
2024-11-19 18:25:51.886	{"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Compare","original":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}","patch":"{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-11-19 18:25:51.886	{"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Preparing EphemeralRunnerSet update","json":"{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-11-19 18:25:51.896	{"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.worker.kubernetesworker","message":"Ephemeral runner set scaled.","namespace":"arc-runners","name":"backend-tests-group-5hq6w","replicas":1}
2024-11-19 18:25:51.896	{"severity":"info","ts":"2024-11-20T01:25:51Z","logger":"listener-app.listener","message":"Getting next message","lastMessageID":0}


Runner Pod Logs

There weren't runner pods - which was the problem

koreyGambill avatar Nov 21 '24 18:11 koreyGambill

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

github-actions[bot] avatar Nov 21 '24 18:11 github-actions[bot]

Having the same issue. Wondering if you got this fixed by reverting to the previous version?

pragmaticivan avatar Nov 27 '24 20:11 pragmaticivan

Yesterday I solved the same issue on 0.9.2. I had to delete the custom resource with status Failed, and then everything returned to normal.

ggercman avatar Nov 28 '24 06:11 ggercman

I also tried fully deleting everything and creating it again. Some of my node pools are working, and most of them are not.

This seems to be related to something on the GitHub side, but I can't confirm what's going on.

pragmaticivan avatar Dec 03 '24 01:12 pragmaticivan

I had the same issue.

I tried recreating the resources, including the CRD, but that didn't work. However, I found that giving a new runnerScaleSetName to the resource did the trick.
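
If you deploy with the official gha-runner-scale-set Helm chart, that rename can be done with something like the following (a sketch; the release name, org URL, secret name, and new scale set name are placeholders):

# Sketch: reinstall the scale set under a new runner scale set name (values are placeholders).
helm upgrade --install backend-tests-group \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --namespace arc-runners \
  --set githubConfigUrl="https://github.com/<org>" \
  --set githubConfigSecret=github-actions-runner-secrets \
  --set runnerScaleSetName=backend-tests-group-v2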

takashabe avatar Jan 14 '25 09:01 takashabe

I had the same issue. I tried recreating the resources, including the CRD, but that didn't work. However, I found that giving a new runnerScaleSetName to the resource did the trick.

Same issue on my side, deployed using Helm. I resolved it by re-deploying (delete & apply) the EphemeralRunnerSet using the same name it had before.

Logs from the orchestrator pod (looping before the issue was solved; the runner pods were not running at that time):

2025-02-27T17:06:08Z INFO EphemeralRunner EphemeralRunner has already finished. Stopping reconciliation and waiting for EphemeralRunnerSet to clean it up {"version": "0.10.1", "ephemeralrunner": {"name":"xxxx-xxxx-yyy-runner-yyy","namespace":"namespace"}, "phase": "Failed"}

Torvalds9 avatar Feb 27 '25 17:02 Torvalds9

Similar to this issue: https://github.com/actions/actions-runner-controller/discussions/3402

I'm currently having this issue, running 0.10.1. When I do a fresh deploy, everything shows up as expected, including my minimum of 2 runners. Then, after they have been used a few times, there are 0 pods and the autoscalingrunnerset shows:

Status:                        
  Current Runners:            2
  Failed Ephemeral Runners:   0
  Pending Ephemeral Runners:  0
  Running Ephemeral Runners:  2
Events:                       <none>

No pods show up at all when running k get all,autoscalingrunnerset,ephemeralrunnerset -n namespace


NAME                                                        MINIMUM RUNNERS   MAXIMUM RUNNERS   CURRENT RUNNERS   STATE   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
autoscalingrunnerset.actions.github.com/runner00          2                 6                 2                         0                 2                                    
autoscalingrunnerset.actions.github.com/runner01   2                 6                 2                         0                 2                                    

NAME                                                            DESIREDREPLICAS   CURRENTREPLICAS   PENDING RUNNERS   RUNNING RUNNERS   FINISHED RUNNERS   DELETING RUNNERS
ephemeralrunnerset.actions.github.com/runner00-6vncx          2                 2                 0                 0                                    
ephemeralrunnerset.actions.github.com/runner01-9xrzx   2                 2                 0                 0                                    

The listener pod is still running in the arc-systems namespace; there are just no pods running in the runner set namespace.

Update

I've just noticed the below in the ephemeralrunnerset

status:
    currentReplicas: 2
    failedEphemeralRunners: 2
    pendingEphemeralRunners: 0
    runningEphemeralRunners: 0

Found the logs:

2025-03-06T14:46:30Z    ERROR    EphemeralRunner    Failed to clean up runner from the service (not an ActionsError)    {"version": "0.10.1", "ephemeralrunner": {"name":"runner01-fnbht-runner-jnq4d","namespace":"runner01-set"}, "error": "failed to remove runner from the service: failed to get runner registration token on refresh: github api error: StatusCode 403, RequestID \"----------------------------\": {\"message\":\"Resource not accessible by integration\",\"documentation_url\":\"https://docs.github.com/rest/actions/self-hosted-runners#c │
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*EphemeralRunnerReconciler).cleanupRunnerFromService
    github.com/actions/actions-runner-controller/controllers/actions.github.com/ephemeralrunner_controller.go:313                    
github.com/actions/actions-runner-controller/controllers/actions%2egithub%2ecom.(*EphemeralRunnerReconciler).Reconcile               
    github.com/actions/actions-runner-controller/controllers/actions.github.com/ephemeralrunner_controller.go:97                     
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile                                                       
    sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119                                                 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler                                                
    sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316                                                 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem                                             
    sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266                                                 
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2                                                   
    sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 

Garethgr avatar Mar 06 '25 14:03 Garethgr

I was nodding along with the original filer, but some more digging including looking at the conversation in https://github.com/actions/actions-runner-controller/issues/3619 clued me in to the fact that number of "current runners" is not going to match number of pods. It's counting the number of ephemeralrunner resources, which persist even when pods die, and if you describe them may have a message like "Pod has failed to start more than 5 times". In my case the container was dying immediately on startup.

Not saying that's what's happening in your case, but at least for me the seemingly stale runner count was due to this intentional ephemeralrunner resource persistence. Understanding that layer of indirection between the runnerset and pods was helpful in diagnosing.
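
In other words, the thing to inspect is the EphemeralRunner resources rather than the pods, roughly (namespace assumed):

# Sketch: CURRENT RUNNERS follows EphemeralRunner resources, which outlive their pods.
kubectl get ephemeralrunners -n arc-runners
# Describing them usually shows why they're stuck, e.g. "Pod has failed to start more than 5 times"
kubectl describe ephemeralrunners -n arc-runners | grep -A3 -i "failed to start"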

patrickvinograd avatar Mar 25 '25 23:03 patrickvinograd

I was nodding along with the original filer, but some more digging including looking at the conversation in #3619 clued me in to the fact that number of "current runners" is not going to match number of pods. It's counting the number of ephemeralrunner resources, which persist even when pods die, and if you describe them may have a message like "Pod has failed to start more than 5 times". In my case the container was dying immediately on startup.

Not saying that's what's happening in your case, but at least for me the seemingly stale runner count was due to this intentional ephemeralrunner resource persistence. Understanding that layer of indirection between the runnerset and pods was helpful in diagnosing.

Much appreciated, this helped a whole lot.

Garethgr avatar Mar 26 '25 10:03 Garethgr

I was also having a similar issue. In my case, pods will very occasionally fail on startup, leaving the controller in the state where it thinks a runner exists and it doesn't. I ended up creating a little script running in a deployment to watch the runner namespace and delete these failed pods. This then allows the controller to recreate the ephemeral runner:

NAMESPACE="arc-runners"

# This is pretty inefficient because it hammers the pods endpoint on every event, but this isn't a super active namespace anyway
delete_failed_pods() {
  FAILED_PODS=$(kubectl get pods --namespace=$NAMESPACE --field-selector=status.phase=Failed -o jsonpath='{.items[*].metadata.name}')
  for POD in $FAILED_PODS; do
    echo "Deleting failed pod: $POD"
    kubectl get pod $POD --namespace=$NAMESPACE
    kubectl delete pod $POD --namespace=$NAMESPACE
  done
}

echo "Checking permissions - Can we get events in $NAMESPACE?"
kubectl auth can-i get events --namespace=$NAMESPACE

echo "Watching for failed pods in $NAMESPACE"

kubectl get events --namespace=$NAMESPACE --watch-only | while read -r event; do
  delete_failed_pods
done

Bit hacky, but does the job.

Makeshift avatar Apr 07 '25 10:04 Makeshift

Running into a similar issue. Does the job that was scheduled on the failed runner get "lost"? If failed ephemeralrunners are cleaned up quickly, will GH reassign that job to a new runner?

avadhanij avatar Apr 10 '25 06:04 avadhanij

I was also having a similar issue. In my case, pods will very occasionally fail on startup, leaving the controller in the state where it thinks a runner exists and it doesn't. I ended up creating a little script running in a deployment to watch the runner namespace and delete these failed pods. This then allows the controller to recreate the ephemeral runner:

NAMESPACE="arc-runners"

This is pretty inefficient because it hammers the pods endpoint on every event, but this isn't a super active namespace anyway

delete_failed_pods() {
  FAILED_PODS=$(kubectl get pods --namespace=$NAMESPACE --field-selector=status.phase=Failed -o jsonpath='{.items[*].metadata.name}')
  for POD in $FAILED_PODS; do
    echo "Deleting failed pod: $POD"
    kubectl get pod $POD --namespace=$NAMESPACE
    kubectl delete pod $POD --namespace=$NAMESPACE
  done
}

echo "Checking permissions - Can we get events in $NAMESPACE?"
kubectl auth can-i get events --namespace=$NAMESPACE

echo "Watching for failed pods in $NAMESPACE"

kubectl get events --namespace=$NAMESPACE --watch-only | while read -r event; do
  delete_failed_pods
done

Bit hacky, but does the job.

I too have had to write something similar, as I am seeing pods stuck terminating even though their status.phase reports Running. So now I am looking for long-running pods, since most of ours currently don't run longer than about 15 minutes.
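
A sketch of that kind of check (the 15-minute cutoff and namespace are assumptions):

# Sketch: list runner pods older than ~15 minutes, whatever phase they report.
CUTOFF=$(( $(date +%s) - 900 ))
kubectl get pods -n arc-runners -o json \
  | jq -r --argjson cutoff "$CUTOFF" \
      '.items[]
       | select((.metadata.creationTimestamp | fromdateiso8601) < $cutoff)
       | "\(.metadata.name)\t\(.status.phase)"'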

Garethgr avatar Apr 10 '25 06:04 Garethgr

Running into a similar issue. Does the job that was scheduled on the failed runner get "lost"? If failed ephemeralrunners are cleaned up quickly, will GH reassign that job to a new runner?

In my experience, the runner fails before being able to pick up the job, so it's simply picked up by the next successfully-booted runner with no events lost.

Makeshift avatar Apr 10 '25 09:04 Makeshift

This is happening on a cluster I have as well. The controller logs "running": 4 right now, while only 2 runner pods are in the cluster.

Note: this did not happen with the previous community-developed version (summerwind).

Update: restarting the controller helped. It created the missing pods, and briefly during startup it seemed to notice that there were only 2 running pods:

running": 2, "finished": 0, "failed": 0, "deleting": 0

strowk avatar Jul 31 '25 16:07 strowk

Happening here too on version 0.11.10: CURRENT RUNNERS shows 7 and PENDING RUNNERS shows 7.

The ephemeralrunners.actions.github.com resources show all 7 as Failed. I had to delete them manually to kick-start the other jobs.
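
Something like this can do that cleanup in one pass (a sketch; it assumes the runners live in arc-runners and that failed runners report status.phase Failed, as in the logs above):

# Sketch: delete every EphemeralRunner whose status phase is Failed.
kubectl get ephemeralrunners -n arc-runners -o json \
  | jq -r '.items[] | select(.status.phase == "Failed") | .metadata.name' \
  | xargs -r kubectl delete ephemeralrunner -n arc-runners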

michel-numan avatar Oct 13 '25 08:10 michel-numan