actions-runner-controller

Stuck at "Job is waiting for a runner from 'runner-name' to come online" in DinD-mode

Open paranerd opened this issue 1 year ago • 1 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.1

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Installed ARC as per [these instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
1. Deployed a runner as per [those instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
    - Basically just downloaded the official [values](https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set/values.yaml) to `my-values.yaml`
    - Uncommented lines 78+79 (`containerMode`)
    - Uncommented lines 114-158 (`template.spec`)
    - Set `--values "my-values.yaml"`
- Installed via Helm
- Runner shows up in GitHub
- Running a job gets stuck in the above-mentioned state

Describe the bug

When I deploy a runner in DinD mode, the runner shows up in GitHub, but any job sent to it just gets stuck waiting.

Deploying a "regular" controller works as expected, though.

Describe the expected behavior

The DinD container should pick up available jobs and run them.

Additional Context

githubConfigUrl: ""

githubConfigSecret:
  github_token: ""

containerMode:
  type: "dind"

template:
  spec:
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
    - name: dind
      image: docker:dind
      args:
        - dockerd
        - --host=unix:///var/run/docker.sock
        - --group=$(DOCKER_GROUP_GID)
      env:
        - name: DOCKER_GROUP_GID
          value: "123"
      securityContext:
        privileged: true
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
        - name: dind-externals
          mountPath: /home/runner/externals
    volumes:
    - name: work
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}

Controller Logs

https://gist.github.com/paranerd/d41dd1de26c3c18c67ae179f41afb67b

Runner Pod Logs

I don't have those as the runner never even starts in the first place.

paranerd avatar Apr 30 '24 15:04 paranerd

Hey @paranerd,

If you inspect the log, it says that:

2024-04-30T13:24:54Z ERROR EphemeralRunner Failed to create pod resource for ephemeral runner. {"ephemeralrunner": {"name":"arc-runner-set-docker-1-998lp-runner-9v6sv","namespace":"arc-runners-docker-1"}, "error": "Pod "arc-runner-set-docker-1-998lp-runner-9v6sv" is invalid: [spec.volumes[3].name: Duplicate value: "dind-sock", spec.volumes[4].name: Duplicate value: "dind-externals", spec.initContainers[1].name: Duplicate value: "init-dind-externals"]"}

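Since you already expanded the spec by hand, you should leave `containerMode` commented out. With `containerMode.type: "dind"` set, the chart adds its own DinD init container and volumes on top of yours, which is where the duplicate names in the error come from.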
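The two ways out of the duplicate-name error can be sketched in values-file form. This is only an illustration built from the values quoted in this thread, not an authoritative layout; the runner container under Option A mirrors the one already shown in the issue, and Option B is left as comments because only one of the two should be active at a time.

```yaml
# Option A: let the chart expand the DinD pod spec.
# Keep containerMode and do NOT add the init container, the dind
# container, or the dind volumes to template.spec yourself.
containerMode:
  type: "dind"

template:
  spec:
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]

# Option B: write the DinD pod spec by hand.
# Comment containerMode out and supply init-dind-externals, the dind
# container, and the work / dind-sock / dind-externals volumes under
# template.spec, as in the expanded spec from the issue.
#
# # containerMode:
# #   type: "dind"
# template:
#   spec:
#     initContainers:
#     - name: init-dind-externals
#       # (rest of the hand-written spec goes here)
```

Mixing the two (Option A's `containerMode` together with Option B's hand-written spec) is what yields `Duplicate value: "dind-sock"` and the other duplicates reported in the controller log.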

nikola-jokic avatar May 23 '24 13:05 nikola-jokic

Thanks for looking into this!

As it turns out, I'm having the same issue as described here.

I fixed it by removing the containerMode lines (as you suggested) and using the following specs:

template:
  spec:
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ['cp', '-r', '-v', '/home/runner/externals/.', '/home/runner/tmpDir/']
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ['/home/runner/run.sh']
      env:
        - name: DOCKER_HOST
          value: unix:///run/docker/docker.sock
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /run/docker
          readOnly: true
    - name: dind
      image: docker:dind
      args:
        - dockerd
        - --host=unix:///run/docker/docker.sock
        - --group=$(DOCKER_GROUP_GID)
      env:
        - name: DOCKER_GROUP_GID
          value: '123'
        - name: DOCKER_IPTABLES_LEGACY
          value: '1'
      resources:
        requests:
          memory: "500Mi"
          cpu: "300m"
        limits:
          memory: "500Mi"
          cpu: "300m"
      securityContext:
        privileged: true
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /run/docker
        - name: dind-externals
          mountPath: /home/runner/externals
    volumes:
    - name: work
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}

with an emphasis on

- name: DOCKER_IPTABLES_LEGACY
  value: '1'

which seems to be the main fix.
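For anyone wanting to confirm that the Docker daemon is reachable from the runner after applying this, a small smoke-test workflow might look like the sketch below. The `runs-on` value assumes the scale set was installed under the name `arc-runner-set-docker-1` (inferred from the pod name in the controller log, so adjust it to your installation); the workflow name and job name are made up for illustration.

```yaml
# Hypothetical smoke test; place under .github/workflows/ in a repo
# that the runner scale set is registered for.
name: dind-smoke-test
on: workflow_dispatch

jobs:
  docker-check:
    # Assumption: the Helm installation name of the runner scale set.
    runs-on: arc-runner-set-docker-1
    steps:
      # Fails if the runner cannot reach the dind container's socket.
      - name: Show Docker client and daemon versions
        run: docker version
      # Exercises image pull and container networking inside dind.
      - name: Run a throwaway container
        run: docker run --rm hello-world
```

If the job is picked up and both steps pass, the runner pod was created successfully and the runner can reach the daemon through the shared socket.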

paranerd avatar May 23 '24 13:05 paranerd

Thank you for letting us know! Legacy iptables seems to be a problem on some platforms, but I'm not sure whether it should be part of the default spec that we expand to :confused:

nikola-jokic avatar May 23 '24 13:05 nikola-jokic