
Occasionally dind fails to start: "sed: write error"

Open rousku opened this issue 1 year ago • 2 comments

The issue can be reproduced with the following script. It takes some time for the issue to appear.

#!/bin/bash

while true; do
   DIND_CONTAINER_ID=$(docker run -t --privileged -d docker:26.1.2-dind)
   echo "$DIND_CONTAINER_ID"
   while ! docker exec "$DIND_CONTAINER_ID" docker info | grep "Server Version: 26.1.2"; do
      sleep 1
   done
   docker stop "$DIND_CONTAINER_ID"
   docker rm "$DIND_CONTAINER_ID"
done
.
.
.
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
 Server Version: 26.1.2
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
.
.
.
$ docker logs 4940ab34359e57a661
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
cat: can't open '/proc/net/ip6_tables_names': No such file or directory
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
sed: write error

rousku avatar May 16 '24 12:05 rousku

Oh, your case is slightly different from the one I commented on in https://github.com/docker-library/docker/issues/308#issuecomment-2115722582. In your case, I think it might actually be your docker exec that's causing the problem: you're creating processes while the script is still trying to initialize, which exacerbates the inherent race between the lines of the dind script that set up the cgroup appropriately.

tianon avatar May 16 '24 16:05 tianon

What I might suggest instead is putting /run in a shared volume and using docker run for your docker info checks instead of docker exec (connecting to the socket from a second container instead of going into the first).
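
A minimal sketch of what that suggestion could look like, under some assumptions: the per-run volume name, the docker:26.1.2 client image, the DIND_POLL_INTERVAL variable, and the retry count are all hypothetical, and the sketch assumes /var/run resolves to /run inside both images. The check runs in a second, short-lived container, so nothing is exec'd into the dind container while it initializes.

```shell
#!/bin/bash
# Sketch only. The volume name, the docker:26.1.2 client image, and the retry
# settings are assumptions, not something specified in this thread.

# Poll the daemon from a second, short-lived container that shares /run with
# the dind container, so no new process is started inside the dind container.
# Usage: wait_for_dind VOLUME [MAX_TRIES]
wait_for_dind() {
  local volume=$1 max_tries=${2:-30} tries=0 info
  while [ "$tries" -lt "$max_tries" ]; do
    # A failed "docker info" is expected until the daemon is up.
    info=$(docker run --rm -v "$volume":/run \
      -e DOCKER_HOST=unix:///var/run/docker.sock \
      docker:26.1.2 docker info 2>/dev/null) || true
    case $info in
      *"Server Version"*) return 0 ;;
    esac
    tries=$((tries + 1))
    sleep "${DIND_POLL_INTERVAL:-1}"
  done
  return 1
}

# One iteration of the reproduction loop, rewritten to use the helper above.
run_once() {
  local volume="dind-run-$$" id status=0
  docker volume create "$volume" >/dev/null
  id=$(docker run -t --privileged -d -v "$volume":/run docker:26.1.2-dind)
  wait_for_dind "$volume" || status=1
  docker stop "$id" >/dev/null
  docker rm "$id" >/dev/null
  docker volume rm "$volume" >/dev/null
  return "$status"
}
```

Calling run_once inside the original while loop would replace the docker exec polling. The volume is removed after each run so that the next daemon starts with an empty /run.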

tianon avatar May 16 '24 16:05 tianon

I seem to have a similar issue on cgroups v2 while looping on docker stats to wait for the daemon to start in a tini-based container. Could that hit the same race condition as docker exec does here?

https://github.com/gocd/docker-gocd-agent-docker-dind

#!/bin/bash
$(which dind) dockerd --host=unix:///var/run/docker.sock ${DOCKERD_ADDITIONAL_ARGS:-'--host=tcp://localhost:2375'} > /var/log/dockerd.log 2>&1 &

waited=0
until [ $waited -ge ${DOCKERD_MAX_WAIT_SECS:-30} ] || docker stats --no-stream; do
  sleep 1
  ((waited++))
done
# shellcheck disable=SC2181
if ! docker stats --no-stream; then
  echo "dockerd startup failed..."
  cat /var/log/dockerd.log
  exit 1
fi
echo "dockerd started"
disown
[root@ip-172-31-45-33 bin]# docker run --privileged docker.io/gocddev/gocd-dev-build:dind-v3.19.11
$ sudo /run-docker-daemon.sh
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
dockerd startup failed...
echo: write error: No such process
sed: write error
/docker-entrypoint.sh: cannot sudo /run-docker-daemon.sh

I'm a bit confused, as it seems to work on the host for a while and then fails persistently for new dind containers created after some time.

/ $ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
go             1       0  0 04:42 ?        00:00:00 /bin/bash /docker-entrypoint.sh
root           7       1  0 04:42 ?        00:00:00 sudo /run-docker-daemon.sh
root           8       7  0 04:42 ?        00:00:00 /bin/bash /run-docker-daemon.sh
root         277       8  0 04:42 ?        00:00:00 sleep 1

Is this essentially the same issue mentioned here, potentially requiring a similar workaround using docker run to check the daemon is up, rather than docker stats? :-(

chadlwilson avatar Dec 10 '24 04:12 chadlwilson

Adding a dodgy sleep 1 before forking any more "wait" processes such as our docker stats calls seems to work well enough: it gives the code at https://github.com/moby/moby/blob/b249c5ebd214e2977d0fdb1e07d82366f5849cf9/hack/dind#L59-L69 time to do its thing with a stable number of processes in the container. It does seem a bit unreliable, though, so we should probably find a better way.
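
Roughly, the workaround looks like the sketch below. DOCKERD_INITIAL_DELAY_SECS and DOCKERD_POLL_INTERVAL_SECS are hypothetical names introduced here for illustration; DOCKERD_MAX_WAIT_SECS is the variable from the script above. It only narrows the race window, it does not close it.

```shell
#!/bin/bash
# Sketch only. DOCKERD_INITIAL_DELAY_SECS and DOCKERD_POLL_INTERVAL_SECS are
# hypothetical; DOCKERD_MAX_WAIT_SECS comes from the script quoted earlier.

# Pause once before polling, so the dind script gets a chance to finish its
# cgroup setup before any docker client processes are forked, then poll
# "docker stats" until it succeeds or the wait budget is used up.
wait_for_dockerd() {
  local waited=0 max_wait=${DOCKERD_MAX_WAIT_SECS:-30}
  sleep "${DOCKERD_INITIAL_DELAY_SECS:-1}"
  until docker stats --no-stream >/dev/null 2>&1; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep "${DOCKERD_POLL_INTERVAL_SECS:-1}"
    waited=$((waited + 1))
  done
  return 0
}
```

In run-docker-daemon.sh this would be called right after dockerd is started in the background, with a non-zero return treated as a startup failure.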

chadlwilson avatar Dec 10 '24 06:12 chadlwilson

Yep, that's probably the exact same race! However, see https://github.com/moby/moby/pull/48850 for a promising PR that might ~fix the race in the upstream dind script 👀

tianon avatar Dec 10 '24 17:12 tianon

Ahh, thanks! Somehow didn't come across that PR in my searching 🫡 looks promising.

chadlwilson avatar Dec 10 '24 17:12 chadlwilson