Rootless Docker in Docker documentation does not work
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.0
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Follow the official docs: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#example-running-dind-rootless
2. Deploy Runner Scale Set using `dind-rootless` values.yaml
3. `dind` container `cmd` fails
Describe the bug
Documentation does not work for rootless dind, and previous functionality that existed in RunnerDeployment was removed, breaking an already existing solution.
Describe the expected behavior
dind container should exit cleanly allowing for docker usage on the runner container.
Additional Context
---
runnerScaleSetName: <redacted>
githubConfigUrl: <redacted>
githubConfigSecret: <redacted>
maxRunners: 16
minRunners: 0
metadata:
  name: <redacted>
  namespace: gha-runner-scale-set-controller
template:
  metadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: gpu-single
      kubernetes.io/arch: amd64
      kubernetes.io/os: linux
    volumes:
      - name: tmpdir
        emptyDir: {}
      - name: work
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-etc
        emptyDir: {}
      - name: dind-home
        emptyDir: {}
    initContainers:
      - name: init-dind-externals
        image: ghcr.io/actions/actions-runner:latest
        command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
      - name: init-dind-rootless
        image: docker:dind-rootless
        command:
          - sh
          - -c
          - |
            set -x
            cp -a /etc/. /dind-etc/
            echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
            echo 'runner:x:1001:' >> /dind-etc/group
            echo 'runner:100000:65536' >> /dind-etc/subgid
            echo 'runner:100000:65536' >> /dind-etc/subuid
            chmod 755 /dind-etc;
            chmod u=rwx,g=rx+s,o=rx /dind-home
            chown 1001:1001 /dind-home
        securityContext:
          runAsUser: 0
        volumeMounts:
          - mountPath: /dind-etc
            name: dind-etc
          - mountPath: /dind-home
            name: dind-home
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
        volumeMounts:
          - mountPath: /tmp
            name: tmpdir
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
        resources:
          requests:
            cpu: "2000m"
            memory: "20Gi"
            ephemeral-storage: "24Gi"
          limits:
            cpu: "3000m"
            memory: "24Gi"
            nvidia.com/gpu: 1
      - name: dind
        image: docker:dind-rootless
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
        securityContext:
          privileged: true
          runAsUser: 1001
          runAsGroup: 1001
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - name: dind-externals
            mountPath: /home/runner/externals
          - name: dind-etc
            mountPath: /etc
          - name: dind-home
            mountPath: /home/runner
Controller Logs
Not applicable as the Pod is running and shutting down as expected.
Runner Pod Logs
https://gist.github.com/dillon-cullinan/82cabc257b19c8f0c172dc0b6808cf59
~~To get this working, there are a couple issues that had to be fixed. There is a typo in the provided chart in the docs:~~
~~ash should of course be bash.~~
Secondly, the latest dind-rootless image has a few issues. Rolling the Docker image back to docker:24.0.6-dind-rootless solves some of them.
The next problem is the socket path the docs assume for Docker: --host=unix:///var/run/docker.sock. After I removed this argument from the command and let the daemon choose its own socket, it picked one based on the UID: unix:///run/user/1001/docker.sock
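That path matches the usual rootless convention: with no `--host` argument, rootless dockerd listens on `$XDG_RUNTIME_DIR/docker.sock`, and the dind-rootless entrypoint appears to fall back to `/run/user/<uid>` when `XDG_RUNTIME_DIR` is unset. A minimal sketch of deriving the runner's `DOCKER_HOST` from that, assuming this fallback; `rootless_docker_host` is a hypothetical helper.

```sh
#!/bin/sh
# rootless_docker_host [UID]
# Hypothetical helper: prints the DOCKER_HOST URL a rootless dockerd
# would listen on by default, assuming it uses $XDG_RUNTIME_DIR/docker.sock
# and that XDG_RUNTIME_DIR falls back to /run/user/<uid> when unset or
# empty. A non-empty XDG_RUNTIME_DIR takes precedence over UID.
rootless_docker_host() {
  uid="${1:-$(id -u)}"
  runtime_dir="${XDG_RUNTIME_DIR:-/run/user/$uid}"
  printf 'unix://%s/docker.sock\n' "$runtime_dir"
}
```

With `runAsUser: 1001` and no `XDG_RUNTIME_DIR` set, `rootless_docker_host 1001` prints `unix:///run/user/1001/docker.sock`, the same value used for `DOCKER_HOST` in the working template.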
With these two changes, it works. Here is the working PodSpec template:
template:
  spec:
    volumes:
      - name: tmpdir
        emptyDir: {}
      - name: work
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-etc
        emptyDir: {}
      - name: dind-home
        emptyDir: {}
    initContainers:
      - name: init-dind-externals
        image: ghcr.io/actions/actions-runner:latest
        command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
      - name: init-dind-rootless
        image: docker:24.0.6-dind-rootless
        command:
          - sh
          - -c
          - |
            set -x
            cp -a /etc/. /dind-etc/
            echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
            echo 'runner:x:1001:' >> /dind-etc/group
            echo 'runner:100000:65536' >> /dind-etc/subgid
            echo 'runner:100000:65536' >> /dind-etc/subuid
            chmod 755 /dind-etc;
            chmod u=rwx,g=rx+s,o=rx /dind-home
            chown 1001:1001 /dind-home
        securityContext:
          runAsUser: 0
        volumeMounts:
          - mountPath: /dind-etc
            name: dind-etc
          - mountPath: /dind-home
            name: dind-home
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///run/user/1001/docker.sock
        volumeMounts:
          - mountPath: /tmp
            name: tmpdir
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
      - name: dind
        image: docker:24.0.6-dind-rootless
        args:
          - dockerd
        securityContext:
          privileged: true
          runAsUser: 1001
          runAsGroup: 1001
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - name: dind-externals
            mountPath: /home/runner/externals
          - name: dind-etc
            mountPath: /etc
          - name: dind-home
            mountPath: /home/runner
Are you on GKE COS nodes? I was able to get things started by building an Ubuntu node pool and pinning my containers there.
edit:
To add more detail: I get the following error when running on COS-based images in GKE, regardless of whether I use docker:24.0.6-dind-rootless or docker:dind-rootless.
Error Message:
time="2024-05-03T22:57:33.920775537Z" level=info msg="unable to detect if iptables supports xlock: 'iptables --wait -L -n': `iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument`" error="exit status 4"
time="2024-05-03T22:57:33.947346735Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2024-05-03T22:57:33.947888475Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2024-05-03T22:57:33.947924935Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument
(exit status 4)
[rootlesskit:child ] error: command [docker-init -- dockerd --host=unix:///socket/docker.sock] exited: exit status 1
[rootlesskit:parent] error: child exited: exit status 1
The GKE Ubuntu based OS image seems to start fine for either.
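The failure comes from the nf_tables flavour of iptables v1.8.10 inside the dind container. When comparing COS and Ubuntu node pools, it may help to record which backend the container's `iptables` reports; iptables 1.8.x prints it in parentheses after the version, as in the log above. A small sketch; `iptables_backend` is a hypothetical helper, and it only identifies the backend, it does not fix the error.

```sh
#!/bin/sh
# iptables_backend [VERSION_STRING]
# Hypothetical helper: prints the backend shown in `iptables --version`
# output, e.g. "iptables v1.8.10 (nf_tables)" -> nf_tables.
# With no argument it runs `iptables --version` itself.
iptables_backend() {
  version="${1:-$(iptables --version 2>/dev/null)}"
  case "$version" in
    *"(nf_tables)"*) echo nf_tables ;;
    *"(legacy)"*) echo legacy ;;
    *) echo unknown; return 1 ;;
  esac
}
```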
@dillon-cullinan I also don't believe that `/bin/ash` in `echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd` is a typo for bash. I believe this image doesn't have bash installed, so it should be /bin/ash. You can see in the unmodified /etc/passwd of the dind-rootless container that the users with a login shell (root and rootless) all use /bin/ash, which is a symlink to BusyBox:
docker run -it --rm --entrypoint /bin/sh docker:dind-rootless
/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin
dockremap:x:100:101:Linux User,,,:/home/dockremap:/sbin/nologin
rootless:x:1000:1000:Rootless:/home/rootless:/bin/ash
/ $ ls -la /bin/ash
lrwxrwxrwx 1 root root 12 Jan 26 17:53 /bin/ash -> /bin/busybox
/ $
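The same check can be scripted: the sketch below prints every passwd entry whose login shell does not exist in the image, so in an image without bash an entry using `/bin/bash` would be reported. It assumes the standard seven-field passwd format; `missing_login_shells` is a hypothetical helper.

```sh
#!/bin/sh
# missing_login_shells [PASSWD_FILE]
# Hypothetical helper: prints "user shell" for every passwd entry whose
# login shell (7th field) is not an executable file in this image.
# Symlinks such as /bin/ash -> /bin/busybox are followed by test -x.
missing_login_shells() {
  passwd_file="${1:-/etc/passwd}"
  while IFS=: read -r name _pw _uid _gid _gecos _home shell; do
    # Skip blank lines and entries without a shell field.
    [ -n "$name" ] && [ -n "$shell" ] || continue
    if [ ! -x "$shell" ]; then
      echo "$name $shell"
    fi
  done < "$passwd_file"
}
```

Running `missing_login_shells /etc/passwd` in the dind container after the init container has run should print nothing for the `runner` entry.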
The socket problem is definitely one I fought with last week. I ended up putting my socket in a volume and sharing it to /var/run/docker.sock. That was mostly out of caution: I had seen https://github.com/actions/actions-runner-controller/issues/2519, where bad things happened if the socket wasn't at /var/run/docker.sock on the runner container side, and I wasn't sure whether that had been fully fixed.
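Whichever path is chosen, the runner and dind containers have to agree on it, and the socket only appears once dockerd has started listening. A minimal sketch of a wait loop that reads the socket path from `DOCKER_HOST` (assuming a `unix://` URL, defaulting to `/var/run/docker.sock`) and polls for it with a timeout; `docker_socket_path` and `wait_for_docker_socket` are hypothetical helpers, not part of the runner image.

```sh
#!/bin/sh
# docker_socket_path URL
# Hypothetical helper: strips the unix:// scheme from a DOCKER_HOST value.
docker_socket_path() {
  case "$1" in
    unix://*) printf '%s\n' "${1#unix://}" ;;
    *) echo "unsupported DOCKER_HOST: $1" >&2; return 1 ;;
  esac
}

# wait_for_docker_socket [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until the socket named by
# DOCKER_HOST exists, failing after TIMEOUT_SECONDS (default 60).
wait_for_docker_socket() {
  timeout="${1:-60}"
  path="$(docker_socket_path "${DOCKER_HOST:-unix:///var/run/docker.sock}")" || return 1
  elapsed=0
  while [ ! -S "$path" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

For example, `wait_for_docker_socket 120 && docker info` could run before `/home/runner/run.sh` in the runner container's command.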
> Are you on GKE COS nodes? I was able to get things started by building an Ubuntu node pool and pinning my containers there. [...]
Yes, we are using GKE COS and we have it working right now; it's interesting that you are running into issues despite the changes. We are on GKE version 1.28.7-gke.1026000, in case that matters.
> @dillon-cullinan I also don't believe that `echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd` is a typo of bash I believe this is an image without bash installed and it should be /bin/ash. [...]
Thank you for the correction, I've edited my previous comment.
From what I've experienced, the setup is much easier on RunnerDeployments: the runner spec has a field you set, `dockerdWithinRunnerContainer: true`.
For our containers we basically pulled bits and pieces from here: https://github.com/actions/actions-runner-controller/blob/master/runner/actions-runner-dind-rootless.ubuntu-20.04.dockerfile
We added the relevant lines from that Dockerfile to our custom image and it worked; you can probably use that image directly as a base if it fits your use case.
Snippet of the runner container values:
command:
  - bash
  - -c
  - "mkdir -p /home/runner/.docker/docker /home/runner/.local/share && ln -s /home/runner/.docker/docker /home/runner/.local/share/docker && /bin/bash /usr/bin/entrypoint-dind-rootless.sh"
securityContext:
  privileged: true
With `dockerdWithinRunnerContainer` set and the proper image, everything works with a single container in the pod: no dind container and no init containers. It is much cleaner in general.
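The `command` in that snippet prepares rootless dockerd's data directory before handing off to the image's entrypoint: it creates `~/.docker/docker` and `~/.local/share`, then symlinks `~/.local/share/docker` (rootless dockerd's default data root) to the former. Spelled out as a standalone function, assuming the home directory is passed in or taken from `HOME`, and using `ln -sfn` so a restart with a preserved home does not fail on an existing link. `prepare_rootless_docker_dirs` is a hypothetical name and it does not start dockerd.

```sh
#!/bin/sh
# prepare_rootless_docker_dirs [HOME_DIR]
# Hypothetical helper mirroring the one-liner in the runner `command`:
# makes ~/.local/share/docker (rootless dockerd's default data root)
# a symlink to ~/.docker/docker.
prepare_rootless_docker_dirs() {
  home="${1:-$HOME}"
  mkdir -p "$home/.docker/docker" "$home/.local/share" || return 1
  link="$home/.local/share/docker"
  # Refuse to clobber a real directory that may already hold image data.
  if [ -d "$link" ] && [ ! -L "$link" ]; then
    echo "$link exists and is not a symlink" >&2
    return 1
  fi
  # -f/-n: replace an existing symlink instead of failing or nesting.
  ln -sfn "$home/.docker/docker" "$link"
}
```

In the runner container this would be called as `prepare_rootless_docker_dirs /home/runner` before `/usr/bin/entrypoint-dind-rootless.sh`.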
We are currently on 1.26 because many developers won't move off deprecated API versions for a few things. I will see if we can get to 1.28 and try again.
> I ended up putting my socket in a volume and sharing it to /var/run/docker.sock.
What other adjustments did you need to make for this? I assumed that simply mounting the dind-sock emptyDir at /var/run in both containers would be enough, but obviously not.