Stuck at "Job is waiting for a runner from 'runner-name' to come online" in DinD-mode
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.1
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Installed ARC as per [these instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
1. Deployed a runner as per [those instructions](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller#installing-actions-runner-controller)
- Basically just downloaded the official [values](https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set/values.yaml) to `my-values.yaml`
- Uncommented lines 78+79 (`containerMode`)
- Uncommented lines 114-158 (`template.spec`)
- Set `--values "my-values.yaml"`
- Installed via Helm
- Runner shows up in GitHub
- Running a job gets stuck in the above mentioned state
Describe the bug
When I deploy a runner in DinD mode, the runner shows up in GitHub, but any job assigned to it stays stuck waiting and never starts.
Deploying a "regular" (non-DinD) runner works as expected, though.
Describe the expected behavior
The DinD container should pick up available jobs and run them.
Additional Context
```yaml
githubConfigUrl: ""
githubConfigSecret:
  github_token: ""
containerMode:
  type: "dind"
template:
  spec:
    initContainers:
      - name: init-dind-externals
        image: ghcr.io/actions/actions-runner:latest
        command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          - name: DOCKER_GROUP_GID
            value: "123"
        securityContext:
          privileged: true
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - name: dind-externals
            mountPath: /home/runner/externals
    volumes:
      - name: work
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
```
Controller Logs
https://gist.github.com/paranerd/d41dd1de26c3c18c67ae179f41afb67b
Runner Pod Logs
I don't have those as the runner never even starts in the first place.
Hey @paranerd,
If you inspect the controller log, you'll find this error:
2024-04-30T13:24:54Z ERROR EphemeralRunner Failed to create pod resource for ephemeral runner. {"ephemeralrunner": {"name":"arc-runner-set-docker-1-998lp-runner-9v6sv","namespace":"arc-runners-docker-1"}, "error": "Pod "arc-runner-set-docker-1-998lp-runner-9v6sv" is invalid: [spec.volumes[3].name: Duplicate value: "dind-sock", spec.volumes[4].name: Duplicate value: "dind-externals", spec.initContainers[1].name: Duplicate value: "init-dind-externals"]"}
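Since you already expanded the full DinD spec under `template.spec`, you should leave `containerMode` commented out. With both set, the DinD init container and volumes end up in the pod twice, which is what the duplicate-name errors above are complaining about.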
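As a rough sketch of the alternative (letting the chart do the expansion for you), the values file would keep `containerMode` and leave `template.spec` at what I believe is the chart's default, a single `runner` container. The empty `githubConfigUrl` and `github_token` are placeholders copied from the report, not working values, and the default `template.spec` shown here is an assumption that should be checked against the chart version in use.

```yaml
# Sketch only: rely on containerMode and do NOT hand-expand the DinD spec.
githubConfigUrl: ""            # placeholder, as in the report
githubConfigSecret:
  github_token: ""             # placeholder, as in the report
containerMode:
  type: "dind"                 # the chart adds the DinD init container, sidecar and volumes
template:
  spec:
    containers:
      - name: runner           # assumed chart default; no dind container or dind-* volumes here
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
```

The reverse also works: comment out `containerMode` and keep the fully expanded `template.spec`, which is the route to take when the DinD container itself needs customizing.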
Thanks for looking into this!
As it turns out, I'm having the same issue as described here.
I fixed it by removing the containerMode lines (as you suggested) and using the following specs:
```yaml
template:
  spec:
    initContainers:
      - name: init-dind-externals
        image: ghcr.io/actions/actions-runner:latest
        command: ['cp', '-r', '-v', '/home/runner/externals/.', '/home/runner/tmpDir/']
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ['/home/runner/run.sh']
        env:
          - name: DOCKER_HOST
            value: unix:///run/docker/docker.sock
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /run/docker
            readOnly: true
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///run/docker/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          - name: DOCKER_GROUP_GID
            value: '123'
          - name: DOCKER_IPTABLES_LEGACY
            value: '1'
        resources:
          requests:
            memory: "500Mi"
            cpu: "300m"
          limits:
            memory: "500Mi"
            cpu: "300m"
        securityContext:
          privileged: true
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /run/docker
          - name: dind-externals
            mountPath: /home/runner/externals
    volumes:
      - name: work
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
```
with an emphasis on

```yaml
- name: DOCKER_IPTABLES_LEGACY
  value: '1'
```

which seems to be the main fix.
Thank you for letting us know! Legacy iptables seems to be a problem on some platforms, but I'm not sure whether it should be part of the default spec that we expand to :confused: