minRunners not being respected
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.2
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
Deploy the latest ARC runners and the latest ARC controller on Kubernetes 1.29.
Describe the bug
Deploying the official Helm chart oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set with the custom values below:
```yaml
## maxRunners is the max number of runners the autoscaling runner set will scale up to.
maxRunners: 100

## minRunners is the min number of idle runners. The target number of runners created will be
## calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 6

containerMode:
  type: "dind"

spec:
  securityContext:
    fsGroup: 1000
  imagePullSecrets:
    - name: regcred
  containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/bin/bash", "-c", "sudo apt-get update && sudo apt-get install curl unzip jq wget python3-pip git-all -y && /home/runner/run.sh"]
      resources:
        requests:
          memory: 2Gi
          cpu: 1.0
        limits:
          cpu: 4.0
          memory: 8Gi
      volumeMounts:
        - name: docker-secret
          mountPath: /home/runner/config.json
          subPath: config.json
        - name: docker-config-volume
          mountPath: /home/runner/.docker
  initContainers:
    - name: dockerconfigwriter
      image: alpine
      command:
        - sh
        - -c
        - cat /home/runner/config.json > /home/runner/.docker/config.json
      volumeMounts:
        - name: docker-secret
          mountPath: /home/runner/config.json
          subPath: config.json
        - name: docker-config-volume
          mountPath: /home/runner/.docker
  volumes:
    - name: docker-secret
      secret:
        secretName: regcred
        items:
          - key: .dockerconfigjson
            path: config.json
    - name: docker-config-volume
      emptyDir: {}
```
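For reference, a values file like the one above would typically be applied with a command along these lines (the release name, namespace, and values file path here are assumptions, not taken from the report):

```shell
# Hypothetical release name, namespace, and values path; adjust to your setup.
helm upgrade --install arc-runner-set \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --namespace arc-runners --create-namespace \
  --values values.yaml
```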
After the initial installation there are 6 runner pods (the configured minimum), but after some time the count drops to 2 for no apparent reason.
Describe the expected behavior
The runner count should respect the minRunners setting.
Additional Context
The ARC runners do not respect the minRunners flag:

```
arc-runner-set-dk6ts-runner-dkbtd   2/2   Running   0   60m
arc-runner-set-dk6ts-runner-dpdzv   2/2   Running   0   61m
```

Meanwhile the listener reports:

```
listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
```
Controller Logs
-0-
Runner Pod Logs
```
2024-06-23T10:41:35Z INFO listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:41:35Z INFO listener-app.listener Getting next message {"lastMessageID": 0}
2024-06-23T10:42:26Z INFO listener-app.worker.kubernetesworker Calculated target runner count {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:42:26Z INFO listener-app.worker.kubernetesworker Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTim
2024-06-23T10:42:26Z INFO listener-app.worker.kubernetesworker Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":6}}"}
W0623 10:42:26.096882       1 warnings.go:70] unknown field "spec.patchID"
2024-06-23T10:42:26Z INFO listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:42:26Z INFO listener-app.listener Getting next message {"lastMessageID": 0}
2024-06-23T10:43:16Z INFO listener-app.worker.kubernetesworker Calculated target runner count {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:43:16Z INFO listener-app.worker.kubernetesworker Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\
```
Hi @gfrid,
Can you please inspect the ephemeralrunners? Are they in a failed state maybe?
Going to check that. I removed the arc-runners and then found some leftover sets in ephemeralrunners; I removed those manually and reinstalled the sets. Will take a look at them.
Our minRunners started doing something similar last night/today. We have two scale sets, each with minRunners set to 1. Jobs are still being scheduled and run in Actions, but if I look at the pods in the arc-runners namespace with minRunners = 1, I see no pods. I bumped minRunners to 2; now I see one running pod. The listener logs say 2 replicas, but from the Kubernetes perspective there is only 1.
Can you please post the controller and the listener log? And can you please show the describe of ephemeralrunners resource?
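For reference, the ephemeralrunners can be inspected with commands along these lines (the arc-runners namespace is taken from the logs above; adjust names to your install):

```shell
# Assumed namespace; adjust to your install.
kubectl get ephemeralrunners -n arc-runners
kubectl describe ephemeralrunners -n arc-runners
```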
Maybe the pod had more than 5 failures, which means it is going to be cleaned up. The ephemeral runner resource is kept so you can inspect the latest failure. That might be the disconnect: ARC counts the number of ephemeral runner resources, not the number of pods.
Update: I spoke with AWS support and we found some jobs stuck in CRDs; this might be related to running on Spot Ocean. We moved to on-demand nodes, cleaned up all the stuck CRD jobs, and updated the finalizers (--merge) after removing all the Helm charts. It has been stable for a week now.
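For anyone hitting the same stuck state: the finalizer cleanup mentioned above can be done with a merge patch roughly like this (the resource name and namespace are placeholders; clearing finalizers bypasses the controller's normal cleanup, so use with care):

```shell
# Placeholder resource name and namespace; clearing finalizers skips normal cleanup.
kubectl patch ephemeralrunner <stuck-runner-name> -n arc-runners \
  --type merge -p '{"metadata":{"finalizers":null}}'
```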
I think this is safe to close now. I'm glad you resolved the issue!