Occasionally dind fails to start: "sed: write error"
The issue can be reproduced with the following script. It takes some time for the issue to appear.
#!/bin/bash
while true; do
  DIND_CONTAINER_ID=$(docker run -t --privileged -d docker:26.1.2-dind)
  echo "$DIND_CONTAINER_ID"
  # Poll with docker exec until the inner daemon reports its version.
  while ! docker exec "$DIND_CONTAINER_ID" docker info | grep "Server Version: 26.1.2"; do
    sleep 1
  done
  docker stop "$DIND_CONTAINER_ID"
  docker rm "$DIND_CONTAINER_ID"
done
.
.
.
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Server Version: 26.1.2
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
.
.
.
$ docker logs 4940ab34359e57a661
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
cat: can't open '/proc/net/ip6_tables_names': No such file or directory
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
sed: write error
Oh, your case is slightly different from the one I commented on in https://github.com/docker-library/docker/issues/308#issuecomment-2115722582 -- in your case, I think it might actually be your docker exec that's causing the problem: you're creating processes while the script is still trying to initialize, which exacerbates the inherent race between the lines of the dind script that set up the cgroup.
What I might suggest instead is putting /run in a shared volume and using docker run for your docker info checks instead of docker exec (connecting to the socket from a second container instead of going into the first).
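A minimal sketch of that suggestion, assuming a named volume mounted at /run in both containers (in the Alpine-based image, /var/run is expected to be a symlink to /run) and the docker:26.1.2-cli image for the check. The function names, the DOCKER, DIND_IMAGE, CLI_IMAGE and POLL_INTERVAL variables, and the volume name are illustrative and not from this thread:

```shell
#!/bin/bash
# Sketch: poll a dind daemon from a second container instead of `docker exec`.
# All names below are hypothetical; adjust images and volume to your setup.
DOCKER="${DOCKER:-docker}"
DIND_IMAGE="${DIND_IMAGE:-docker:26.1.2-dind}"
CLI_IMAGE="${CLI_IMAGE:-docker:26.1.2-cli}"

# Start dind with /run on a shared volume; prints the container ID.
start_dind() {
  local volume="$1"
  "$DOCKER" run --privileged -d -v "$volume:/run" "$DIND_IMAGE"
}

# Run `docker info` from a throwaway container that mounts the same volume.
# Returns 0 once the daemon answers, 1 after max_tries failed attempts.
wait_for_dind() {
  local volume="$1" max_tries="${2:-30}" tries=0
  until "$DOCKER" run --rm -v "$volume:/run" \
      -e DOCKER_HOST=unix:///run/docker.sock \
      "$CLI_IMAGE" docker info >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      return 1
    fi
    sleep "${POLL_INTERVAL:-1}"
  done
}
```

Usage might look like `start_dind dind-run && wait_for_dind dind-run 60`. Because the check runs in a separate container, no extra processes are created inside the dind container while its entrypoint is still setting up the cgroup.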
I seem to have a similar issue on cgroups v2 while looping on docker stats to wait for the daemon to start in a tinied container. Could docker stats trigger the same race condition that docker exec does here?
https://github.com/gocd/docker-gocd-agent-docker-dind
#!/bin/bash
$(which dind) dockerd --host=unix:///var/run/docker.sock ${DOCKERD_ADDITIONAL_ARGS:-'--host=tcp://localhost:2375'} > /var/log/dockerd.log 2>&1 &
waited=0
until [ $waited -ge ${DOCKERD_MAX_WAIT_SECS:-30} ] || docker stats --no-stream; do
  sleep 1
  ((waited++))
done
# shellcheck disable=SC2181
if ! docker stats --no-stream; then
  echo "dockerd startup failed..."
  cat /var/log/dockerd.log
  exit 1
fi
echo "dockerd started"
disown
[root@ip-172-31-45-33 bin]# docker run --privileged docker.io/gocddev/gocd-dev-build:dind-v3.19.11
$ sudo /run-docker-daemon.sh
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
dockerd startup failed...
echo: write error: No such process
sed: write error
/docker-entrypoint.sh: cannot sudo /run-docker-daemon.sh
I'm a bit confused, as it seems to work on a given host for a while and then fails persistently for any new dind containers created after that.
/ $ ps -ef
UID PID PPID C STIME TTY TIME CMD
go 1 0 0 04:42 ? 00:00:00 /bin/bash /docker-entrypoint.sh
root 7 1 0 04:42 ? 00:00:00 sudo /run-docker-daemon.sh
root 8 7 0 04:42 ? 00:00:00 /bin/bash /run-docker-daemon.sh
root 277 8 0 04:42 ? 00:00:00 sleep 1
Is this essentially the same issue mentioned here, potentially requiring a similar workaround using docker run to check the daemon is up, rather than docker stats? :-(
Adding a dodgy sleep 1 before forking any more "wait" processes, such as our docker stats calls, seems to give the code at https://github.com/moby/moby/blob/b249c5ebd214e2977d0fdb1e07d82366f5849cf9/hack/dind#L59-L69 enough time to do its thing with a stable number of processes in the container. It does seem a bit unreliable, though, so I should probably find a better way.
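One possible variation, sketched here as an assumption rather than a verified fix: wait for the daemon socket with the bash builtin `[ -S ... ]` test instead of repeatedly forking docker stats, and only run a docker command once the socket exists. The helper name wait_for_socket and the POLL_INTERVAL variable are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: wait until the daemon socket exists.
# `[ -S ... ]` is a bash builtin, so the check itself forks nothing;
# only `sleep` is still spawned once per iteration.
wait_for_socket() {
  local sock="${1:-/var/run/docker.sock}" max_wait="${2:-30}" waited=0
  until [ -S "$sock" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep "${POLL_INTERVAL:-1}"
    waited=$((waited + 1))
  done
}
```

In the script above this could replace the until loop, for example `wait_for_socket /var/run/docker.sock "${DOCKERD_MAX_WAIT_SECS:-30}" && docker stats --no-stream`. Since sleep is still a forked process, this likely only narrows the race window rather than closing it.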
Yep, that's probably the exact same race! However, see https://github.com/moby/moby/pull/48850 for a promising PR that might ~fix the race in the upstream dind script :eyes:
Ahh, thanks! Somehow didn't come across that PR in my searching 🫡 looks promising.