Runner scale set min and max runners issue
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.1
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
We have been seeing the following issue since yesterday.
We configured a runner scale set with minRunners: 5 and maxRunners: 20, but the desired count always shows 0; a runner pod is only created when a job is triggered. Were there any specific changes?
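For context, this is roughly how the scale set is installed. This is only a sketch using the official gha-runner-scale-set chart's minRunners/maxRunners values; the release name, namespace, and URL are taken from the output below, and the authentication settings are omitted:

# Install the scale set with a fixed minimum of 5 idle runners and a cap of 20.
$ helm install prosper-linux-prod \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
    --namespace prosper-gha-runners \
    --set githubConfigUrl=https://github.com/prosperllc \
    --set minRunners=5 \
    --set maxRunners=20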
Describe the bug
We configured a runner scale set with minRunners: 5 and maxRunners: 20, but the desired count always shows 0; a runner pod is only created when a job is triggered. Were there any specific changes?
Describe the expected behavior
Since we set minRunners to 5, kubectl get runners should always show a minimum of 5 runners in Idle or Running status. Instead, it currently shows 0 runners; we can only see the listener pod and the controller pod.
Additional Context
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
prosper-linux-prod-65655978-listener 1/1 Running 0 5h32m
prosper-runner-controller-gha-rs-controller-6bbfbc4996-nn9gf 1/1 Running 0 5h30m
Controller Logs
2024-09-17T05:00:02Z INFO listener-app.worker.kubernetesworker Created merge patch json for EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":5}}"}
2024-09-17T05:00:02Z INFO listener-app.worker.kubernetesworker Scaling ephemeral runner set {"assigned job": 0, "decision": 5, "min": 5, "max": 20, "currentRunnerCount": 5
2024-09-17T05:00:02Z INFO listener-app.worker.kubernetesworker Ephemeral runner set scaled. {"namespace": "prosper-gha-runners", "name": "prosper-linux-prod-hfdr6", "repli
2024-09-17T05:00:02Z INFO listener-app.listener Getting next message {"lastMessageID": 4161}
We are also not able to delete the failed runners:
$ kubectl get EphemeralRunner
NAME                                  GITHUB CONFIG URL               RUNNERID   STATUS    JOBREPOSITORY         JOBWORKFLOWREF                                                              WORKFLOWRUNID   JOBDISPLAYNAME              MESSAGE                                       AGE
prosper-linux-np-zvc6w-runner-4hfld   https://github.com/prosperllc   305885     Running   prosperllc/svc-user   prosperllc/actions-workflows/.github/workflows/cicd.yaml@refs/heads/main   10934111905     CICD / Docker_Image_Build                                                 5m35s
prosper-linux-np-zvc6w-runner-59czb   https://github.com/prosperllc   301863     Failed                                                                                                                                                Pod has failed to start more than 5 times:    2d6h
prosper-linux-np-zvc6w-runner-ghn99   https://github.com/prosperllc   301857     Failed                                                                                                                                                Pod has failed to start more than 5 times:    2d6h
prosper-linux-np-zvc6w-runner-jw5l2   https://github.com/prosperllc   301859     Failed                                                                                                                                                Pod has failed to start more than 5 times:    2d6h
prosper-linux-np-zvc6w-runner-l78z5   https://github.com/prosperllc   301866     Failed                                                                                                                                                Pod has failed to start more than 5 times:    2d6h
prosper-linux-np-zvc6w-runner-rkwhg   https://github.com/prosperllc   301867     Failed                                                                                                                                                Pod has failed to start more than 5 times:    2d6h
How can we clear these failed runners? Any suggestions?
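In case it helps, this is the kind of cleanup we would try. It is only a sketch: it assumes the STATUS column shown above is backed by .status.phase on the EphemeralRunner object, and <namespace> is a placeholder for the scale set's namespace:

# List EphemeralRunners whose phase is Failed, then delete them.
$ kubectl get ephemeralrunner -n <namespace> \
    -o jsonpath='{range .items[?(@.status.phase=="Failed")]}{.metadata.name}{"\n"}{end}' \
    | xargs -r kubectl delete ephemeralrunner -n <namespace>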
Hi.
I can confirm that we also have this issue in Production (version 0.9.3).
We have minRunners set to 2 and maxRunners set to 5, but a runner only comes up when a pipeline triggers a job.
This was not the behavior on 0.9.0.
Seeing this issue in 0.9.3 as well.
In the OP's logs you can see that the controller seems to think the minimum number of runners already exists, even when it doesn't. When the value is changed to something else, the controller will actually spin runners up temporarily and then destroy them. There are no actual error logs, but the controller does state failed: minNum.
When I changed minRunners from 5 to 2, you can see here that it seems to think there were already 5, but there weren't:
2024-12-10T23:20:04Z INFO EphemeralRunnerSet Scaling comparison {"version": "0.9.3", "ephemeralrunnerset": {"name":"SCALESET","namespace":"SCALESETNS"}, "current": 5, "desired": 2}
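One way to make the mismatch visible is to compare the set's spec.replicas (the same field the listener patches in the logs above) against the EphemeralRunner objects that actually exist. A sketch, using the redacted SCALESET/SCALESETNS names from the log line:

# Desired replicas according to the EphemeralRunnerSet spec.
$ kubectl get ephemeralrunnerset SCALESET -n SCALESETNS -o jsonpath='{.spec.replicas}{"\n"}'
# EphemeralRunner objects that actually exist, including any Failed ones.
$ kubectl get ephemeralrunner -n SCALESETNS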
Hey everyone, sorry for the late response. The controller counts the number of ephemeral runners, which is not necessarily the number of runner pods. When you encounter this situation, can you check whether there are ephemeral runners in a failed state?
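For example, a quick tally per phase will show whether Failed ephemeral runners are inflating the count. A sketch, with <namespace> as a placeholder:

# Count EphemeralRunners grouped by status phase.
$ kubectl get ephemeralrunner -n <namespace> \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | sort | uniq -c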
Closing this one since there has been no interaction, and there have been many improvements in the meantime. Please let us know if you are still seeing this issue, and we can re-open it.