actions-runner-controller

minRunners not being respected

Open gfrid opened this issue 1 year ago • 5 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Deploy the latest ARC runner scale set and the latest ARC controller on Kubernetes 1.29.

Describe the bug

Deploying the official Helm chart oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set with the custom values below:

## maxRunners is the max number of runners the autoscaling runner set will scale up to.
maxRunners: 100

## minRunners is the min number of idle runners. The target number of runners created will be
## calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 6

containerMode:
  type: "dind"

template:
  spec:
    securityContext:
      fsGroup: 1000
    imagePullSecrets:
      - name: regcred
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/bash","-c","sudo apt-get update && sudo apt-get install curl unzip jq wget python3-pip git-all -y && /home/runner/run.sh"]
        resources:
          requests:
            memory: 2Gi
            cpu: 1.0
          limits:
            cpu: 4.0
            memory: 8Gi
        volumeMounts:
          - name: docker-secret
            mountPath: /home/runner/config.json
            subPath: config.json
          - name: docker-config-volume
            mountPath: /home/runner/.docker
    initContainers:
      - name: dockerconfigwriter
        image: alpine
        command:
          - sh
          - -c
          - cat /home/runner/config.json > /home/runner/.docker/config.json
        volumeMounts:
          - name: docker-secret
            mountPath: /home/runner/config.json
            subPath: config.json
          - name: docker-config-volume
            mountPath: /home/runner/.docker
    volumes:
      - name: docker-secret
        secret:
          secretName: regcred
          items:
            - key: .dockerconfigjson
              path: config.json
      - name: docker-config-volume
        emptyDir: {}
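For reference, a scale set with these values would typically be installed along these lines (the release name and values file path here are assumptions; githubConfigUrl and githubConfigSecret are presumed to be set in the same values file):

helm install arc-runner-set \
  --namespace arc-runners --create-namespace \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set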

After the initial installation there are 6 runner pods, matching minRunners, but after some time the count drops to 2 for no apparent reason.

Describe the expected behavior

The runner scale set should keep at least minRunners (6) idle runners at all times.

Additional Context

The ARC runners do not respect the minRunners flag; only 2 runner pods are left even though the listener reports 6 replicas:

arc-runner-set-dk6ts-runner-dkbtd   2/2     Running   0          60m
arc-runner-set-dk6ts-runner-dpdzv   2/2     Running   0          61m

 listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}

Controller Logs

-0-

Runner Pod Logs

2024-06-23T10:41:35Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:41:35Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 0}
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count    {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Compare    {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTim
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update    {"json": "{\"spec\":{\"patchID\":0,\"replicas\":6}}"}
W0623 10:42:26.096882       1 warnings.go:70] unknown field "spec.patchID"
2024-06-23T10:42:26Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-dk6ts", "replicas": 6}
2024-06-23T10:42:26Z    INFO    listener-app.listener    Getting next message    {"lastMessageID": 0}
2024-06-23T10:43:16Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count    {"assigned job": 0, "decision": 6, "min": 6, "max": 100, "currentRunnerCount": 6, "jobsCompleted": 0}
2024-06-23T10:43:16Z    INFO    listener-app.worker.kubernetesworker    Compare    {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\

gfrid · Jun 23 '24 10:06

Hi @gfrid,

Can you please inspect the ephemeralrunners? Are they in a failed state maybe?

nikola-jokic · Jun 24 '24 11:06
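A minimal way to do that inspection, assuming the arc-runners namespace and the runner names from the logs above, would be:

kubectl get ephemeralrunners -n arc-runners
kubectl describe ephemeralrunner arc-runner-set-dk6ts-runner-dkbtd -n arc-runners

The Status section of the describe output is where a failure reason would show up.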

> Hi @gfrid,
>
> Can you please inspect the ephemeralrunners? Are they in a failed state maybe?

Going to check that. I removed the arc-runners and then found some leftover entries in ephemeralrunners; I removed those manually and reinstalled the scale sets. I will take a look at them.

gfrid · Jun 24 '24 18:06

Our minRunners started doing something similar last night/today. We have 2 scale sets, each with minRunners set to 1. Jobs are still being scheduled and running in Actions, but when I look at the pods in the arc-runners namespace, with minRunners = 1 I see no pods. I bumped minRunners to 2; now I see 1 running pod. The listener logs say 2 replicas, but from the Kubernetes perspective there is only 1.

casey-robertson-paypal · Jun 26 '24 20:06
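One way to see the discrepancy described above, assuming the same arc-runners namespace, is to compare what the listener scales (the EphemeralRunnerSet), the EphemeralRunner resources that ARC counts, and the pods backing them:

kubectl get ephemeralrunnersets -n arc-runners
kubectl get ephemeralrunners -n arc-runners
kubectl get pods -n arc-runners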

Can you please post the controller and the listener logs? And can you please show the output of describing the ephemeralrunners resources? Maybe the pod had more than 5 failures, which means it is going to be cleaned up. The EphemeralRunner resource is kept so you can inspect the latest failure. That might be the disconnect: ARC counts the number of ephemeral runners, not the number of pods.

nikola-jokic · Jun 27 '24 12:06
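A quick way to check for that situation (a sketch; the phase and reason columns assume the fields the EphemeralRunner status currently exposes):

kubectl get ephemeralrunners -n arc-runners \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,REASON:.status.reason

EphemeralRunners left in a failed state still count toward the replicas the listener reports, even though no pod is running for them, which would explain seeing fewer pods than minRunners.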

Update: I spoke with AWS support and we found some jobs stuck in the CRDs; this might be related to running on Spot Ocean. We moved to on-demand nodes and, after removing all the Helm charts, cleaned up the stuck CRD objects by patching their finalizers (--merge). It has now been stable for a week.

gfrid · Jul 02 '24 08:07
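The finalizer cleanup mentioned above presumably followed the usual kubectl pattern; a hedged reconstruction (the resource name is a placeholder) would be:

# Force deletion of a stuck custom resource by clearing its finalizers; use with care.
kubectl patch ephemeralrunner <stuck-runner-name> -n arc-runners \
  --type merge -p '{"metadata":{"finalizers":null}}'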

I think this is safe to close now. I'm glad you resolved the issue!

nikola-jokic · Feb 25 '25 08:02