actions-runner-controller

Runner scale set min and max runner issue

Open sravula84 opened this issue 1 year ago • 2 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.1

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

We have been seeing this issue since yesterday.

We configured the runner scale set with min 5 and max 20, but the desired count always shows 0. A runner pod is only created when a job is triggered. Were there any specific changes?

Describe the bug

We configured the runner scale set with min 5 and max 20, but the desired count always shows 0. A runner pod is only created when a job is triggered. Were there any specific changes?

Describe the expected behavior

Since we set min runners to 5, k get runners should always show at least 5 runners in idle or running status. Instead it shows 0 runners; I can only see the listener pod and the controller pod.

Additional Context

+ kubectl get pods
NAME                                                           READY   STATUS    RESTARTS   AGE
prosper-linux-prod-65655978-listener                           1/1     Running   0          5h32m
prosper-runner-controller-gha-rs-controller-6bbfbc4996-nn9gf   1/1     Running   0          5h30m

Controller Logs

2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Created merge patch json for EphemeralRunnerSet update    {"json": "{\"spec\":{\"patchID\":0,\"replicas\":5}}"}
2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Scaling ephemeral runner set    {"assigned job": 0, "decision": 5, "min": 5, "max": 20, "currentRunnerCount": 5
2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "prosper-gha-runners", "name": "prosper-linux-prod-hfdr6", "repli
2024-09-17T05:00:02Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 4161}

Runner Pod Logs

2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Created merge patch json for EphemeralRunnerSet update    {"json": "{\"spec\":{\"patchID\":0,\"replicas\":5}}"}
2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Scaling ephemeral runner set    {"assigned job": 0, "decision": 5, "min": 5, "max": 20, "currentRunnerCount": 5
2024-09-17T05:00:02Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "prosper-gha-runners", "name": "prosper-linux-prod-hfdr6", "repli
2024-09-17T05:00:02Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 4161}

sravula84 avatar Sep 17 '24 05:09 sravula84

We are also not able to delete the failed runners.

+ kubectl get EphemeralRunner
NAME                                  GITHUB CONFIG URL               RUNNERID   STATUS    JOBREPOSITORY         JOBWORKFLOWREF                                                             WORKFLOWRUNID   JOBDISPLAYNAME              MESSAGE                                      AGE
prosper-linux-np-zvc6w-runner-4hfld   https://github.com/prosperllc   305885     Running   prosperllc/svc-user   prosperllc/actions-workflows/.github/workflows/cicd.yaml@refs/heads/main   10934111905     CICD / Docker_Image_Build                                                5m35s
prosper-linux-np-zvc6w-runner-59czb   https://github.com/prosperllc   301863     Failed                                                                                                                                                 Pod has failed to start more than 5 times:   2d6h
prosper-linux-np-zvc6w-runner-ghn99   https://github.com/prosperllc   301857     Failed                                                                                                                                                 Pod has failed to start more than 5 times:   2d6h
prosper-linux-np-zvc6w-runner-jw5l2   https://github.com/prosperllc   301859     Failed                                                                                                                                                 Pod has failed to start more than 5 times:   2d6h
prosper-linux-np-zvc6w-runner-l78z5   https://github.com/prosperllc   301866     Failed                                                                                                                                                 Pod has failed to start more than 5 times:   2d6h
prosper-linux-np-zvc6w-runner-rkwhg   https://github.com/prosperllc   301867     Failed                                                                                                                                                 Pod has failed to start more than 5 times:   2d6h

How can we clear these failed runners?

sravula84 avatar Sep 19 '24 03:09 sravula84

Any suggestions on this?

sravula84 avatar Sep 26 '24 18:09 sravula84

Hi.

I can confirm that we also have this issue in Production (version 0.9.3).

We have minRunners set to 2 and maxRunners set to 5, but a runner only pops up when a pipeline triggers a job.

This was not the behavior on 0.9.0.

ruivitit avatar Nov 12 '24 17:11 ruivitit

Seeing this issue in 0.9.3 as well.

In the OP's logs you can see that the controller seems to think the minimum number of runners already exists even when it doesn't. When the value is changed to something else, the controller actually spins the runners up temporarily and then destroys them. There are no actual error logs, but the controller does state failed: minNum

When I changed from 5 to 2, you can see here that it seems to think there were already 5, but there weren't:

2024-12-10T23:20:04Z    INFO    EphemeralRunnerSet      Scaling comparison      {"version": "0.9.3", "ephemeralrunnerset": {"name":"SCALESET","namespace":"SCALESETNS"}, "current": 5, "desired": 2}
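
The numbers in the log lines posted in this thread are consistent with a simple bounded calculation. Below is a minimal Python sketch, assuming the desired count is min(minRunners + assigned jobs, maxRunners). That formula is an assumption inferred from the logs here, not taken from the ARC source, and target_runner_count is a hypothetical name used only for illustration.

```python
def target_runner_count(min_runners: int, max_runners: int, assigned_jobs: int) -> int:
    """Desired number of ephemeral runners under the ASSUMED formula:
    keep min_runners idle on top of the assigned jobs, capped at max_runners."""
    return min(min_runners + assigned_jobs, max_runners)


# Values from the OP's listener log: "assigned job": 0, "min": 5, "max": 20.
op_decision = target_runner_count(min_runners=5, max_runners=20, assigned_jobs=0)

# Values from this comment: minRunners changed to 2 with no jobs assigned.
# max_runners=20 is reused from the OP's log purely for illustration.
changed_decision = target_runner_count(min_runners=2, max_runners=20, assigned_jobs=0)
```

With no jobs assigned, the sketch yields 5 for the OP's settings and 2 after the change, matching "decision": 5 and "desired": 2 in the logs. If that reading is right, the scaling decision itself looks correct, and the mismatch would be in what gets counted as a current runner.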

minsis avatar Dec 10 '24 23:12 minsis

Hey everyone, sorry for the late response. The controller counts the number of ephemeral runners, which doesn't have to be the number of runner pods. When you encounter this situation, can you check if there are ephemeral runners in a failed state?
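
One way to check this is to tally EphemeralRunner resources by status instead of counting pods. Below is a minimal Python sketch, assuming the output of kubectl get EphemeralRunner -o json has been parsed into a dict, and assuming the STATUS column shown earlier in this thread maps to a status.phase field (verify this against your CRD version). The function names are hypothetical, and the sample data mirrors the table posted earlier.

```python
from collections import Counter


def summarize_runners(runner_list: dict) -> Counter:
    """Count EphemeralRunner objects by phase (ASSUMED field: status.phase)."""
    phases = Counter()
    for item in runner_list.get("items", []):
        phase = item.get("status", {}).get("phase") or "Unknown"
        phases[phase] += 1
    return phases


def failed_runner_names(runner_list: dict) -> list:
    """Names of EphemeralRunner objects whose phase is 'Failed'."""
    return [
        item["metadata"]["name"]
        for item in runner_list.get("items", [])
        if item.get("status", {}).get("phase") == "Failed"
    ]


def _runner(name: str, phase: str) -> dict:
    # Helper that builds a trimmed-down object with only the fields used above.
    return {"metadata": {"name": name}, "status": {"phase": phase}}


# Sample mirroring the EphemeralRunner table posted earlier in this thread.
sample = {
    "items": [
        _runner("prosper-linux-np-zvc6w-runner-4hfld", "Running"),
        _runner("prosper-linux-np-zvc6w-runner-59czb", "Failed"),
        _runner("prosper-linux-np-zvc6w-runner-ghn99", "Failed"),
        _runner("prosper-linux-np-zvc6w-runner-jw5l2", "Failed"),
        _runner("prosper-linux-np-zvc6w-runner-l78z5", "Failed"),
        _runner("prosper-linux-np-zvc6w-runner-rkwhg", "Failed"),
    ]
}

summary = summarize_runners(sample)
failed = failed_runner_names(sample)
```

For that sample, the summary has six EphemeralRunner objects but only one in the Running phase. That is the kind of gap between the runner count and the visible pods described above.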

nikola-jokic avatar Feb 25 '25 08:02 nikola-jokic

Closing this one since there has been no interaction, and we had many improvements during this time. Please let us know if you are still seeing this issue, and we can re-open it.

nikola-jokic avatar Apr 15 '25 09:04 nikola-jokic