
docker-ce 29.0.0 with nested overlayfs storage driver seems not to support whiteouts (deleting a file from a lower layer) in an improper dind setup

Open ny-a opened this issue 2 months ago • 10 comments

Description

With docker-ce_29.0.0-1~debian.11~bullseye_amd64.deb, I cannot build or load an image with the following Dockerfile:

FROM alpine
RUN rm /etc/hostname

The problem is that docker-ce 29.0.0-1 cannot delete files in the lower layer.
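For background (a hedged sketch, not part of the original report): overlayfs records the deletion of a lower-layer file as a "whiteout" entry in the upper layer, and creating whiteouts is what fails when overlayfs is nested on top of overlayfs. A quick way to check whether a directory is itself backed by overlayfs:

```shell
# Hedged helper: print the filesystem type backing a directory.
# Inside an unconfigured dind container, /var/lib/docker typically
# reports "overlay"/"overlayfs", meaning the daemon would be nesting
# overlayfs on overlayfs, and whiteouts (file deletions) would fail.
fs_type() {
  stat -f -c %T "$1"
}

# Fall back to "/" if docker's data root does not exist yet.
fs_type /var/lib/docker 2>/dev/null || fs_type /
```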

Reproduce

Run the following commands in docker run --rm -it --privileged debian:bullseye (a docker-in-docker setup):

  1. apt-get update && apt-get install -y ca-certificates curl
  2. install -m 0755 -d /etc/apt/keyrings
  3. curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
  4. chmod a+r /etc/apt/keyrings/docker.asc
  5. tee /etc/apt/sources.list.d/docker.sources <<EOF
     Types: deb
     URIs: https://download.docker.com/linux/debian
     Suites: bullseye
     Components: stable
     Signed-By: /etc/apt/keyrings/docker.asc
     EOF
  6. apt-get update
  7. curl -O https://download.docker.com/linux/debian/dists/bullseye/pool/stable/amd64/docker-ce_29.0.0-1~debian.11~bullseye_amd64.deb
  8. apt-get install -y ./docker-ce_29.0.0-1~debian.11~bullseye_amd64.deb
  9. dockerd &
  10. mkdir whiteout && cd whiteout
  11. (echo "FROM alpine"; echo "RUN rm /etc/hostname") | tee Dockerfile
  12. docker build .

Result:

[+] Building 0.9s (5/5)                                                                                                                                                                              docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                           0.0s
 => => transferring dockerfile: 70B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                                                                                               0.8s
[+] Building 1.0s (5/5) FINISHED                                                                                                                                                                     docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                           0.0s
 => => transferring dockerfile: 70B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                                                                                               0.8s
 => [internal] load .dockerignore                                                                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                                                0.0s
 => CACHED [1/2] FROM docker.io/library/alpine:latest@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412                                                                                  0.0s
 => => resolve docker.io/library/alpine:latest@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412                                                                                         0.0s
 => ERROR [2/2] RUN rm /etc/hostname                                                                                                                                                                           0.1s
------
 > [2/2] RUN rm /etc/hostname:
------
Dockerfile:2
--------------------
   1 |     FROM alpine
   2 | >>> RUN rm /etc/hostname
   3 |     
--------------------
ERROR: failed to build: failed to solve: process "/bin/sh -c rm /etc/hostname" did not complete successfully: mount source: "overlay", target: "/var/lib/docker/buildkit/containerd-overlayfs/cachemounts/buildkit4108886608", fstype: overlay, flags: 0, data: "workdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/7/work,upperdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/7/fs,lowerdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs,index=off,redirect_dir=off", err: invalid argument

Expected behavior

The image builds successfully with 28.5.2-1~debian.11~bullseye:

[+] Building 3.3s (7/7) FINISHED                                                                                                                                                                     docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                           0.0s
 => => transferring dockerfile: 70B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                                                                                               2.2s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                                                0.0s
 => [1/2] FROM docker.io/library/alpine:latest@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412                                                                                         0.5s
 => => resolve docker.io/library/alpine:latest@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412                                                                                         0.0s
 => => sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 9.22kB / 9.22kB                                                                                                                 0.0s
 => => sha256:85f2b723e106c34644cd5851d7e81ee87da98ac54672b29947c052a45d31dc2f 1.02kB / 1.02kB                                                                                                                 0.0s
 => => sha256:706db57fb2063f39f69632c5b5c9c439633fda35110e65587c5d85553fd1cc38 581B / 581B                                                                                                                     0.0s
 => => sha256:2d35ebdb57d9971fea0cac1582aa78935adf8058b2cc32db163c98822e5dfa1b 3.80MB / 3.80MB                                                                                                                 0.3s
 => => extracting sha256:2d35ebdb57d9971fea0cac1582aa78935adf8058b2cc32db163c98822e5dfa1b                                                                                                                      0.1s
 => [2/2] RUN rm /etc/hostname                                                                                                                                                                                 0.2s
 => exporting to image                                                                                                                                                                                         0.2s
 => => exporting layers                                                                                                                                                                                        0.1s
 => => writing image sha256:321c98552e5e5038b745530ffb8dd25ab3e297d59fabca4956520a7a9967aee8                                                                                                                   0.0s
 => => naming to docker.io/library/whiteout  

docker version

Client: Docker Engine - Community
 Version:           29.0.0
 API version:       1.52
 Go version:        go1.25.4
 Git commit:        3d4129b
 Built:             Mon Nov 10 21:47:13 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          29.0.0
  API version:      1.52 (minimum version 1.44)
  Go version:       go1.25.4
  Git commit:       d105562
  Built:            Mon Nov 10 21:47:13 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.1.5
  GitCommit:        fcd43222d6b07379a4be9786bda52438f0dd16a1
 runc:
  Version:          1.3.3
  GitCommit:        v1.3.3-0-gd842d771
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    29.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.29.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.40.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 29.0.0
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: fcd43222d6b07379a4be9786bda52438f0dd16a1
 runc version: v1.3.3-0-gd842d771
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.17.7-arch1-1
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 30.96GiB
 Name: eb09c1a85cc1
 ID: 8cad55b5-9fc1-47e4-97b8-2d7eeec169f8
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Firewall Backend: iptables

Additional Info

No response

ny-a avatar Nov 11 '25 16:11 ny-a

Thanks for reporting. This looks to only reproduce when running docker-in-docker without a volume attached for docker's storage (so overlayFS on top of overlayFS);

Docker 29 uses containerd snapshotters (storage drivers) by default, which (unlike the legacy "graph-driver" storage drivers) do not automatically fall back to an alternative storage driver if overlayFS cannot be supported.

When disabling the containerd snapshotter (using --feature containerd-snapshotter=false) and using the legacy graph-drivers to replicate the behavior of a v28.x docker engine, the daemon logs show that it detects that overlayFS cannot be supported and downgrades the storage driver to vfs as a fallback;

dockerd --feature containerd-snapshotter=false

INFO[2025-11-11T17:41:18.059950726Z] Loading containers: start.
ERRO[2025-11-11T17:41:18.064219439Z] failed to mount overlay: invalid argument     storage-driver=overlay2
ERRO[2025-11-11T17:41:18.065310488Z] exec: "fuse-overlayfs": executable file not found in $PATH  storage-driver=fuse-overlayfs
INFO[2025-11-11T17:41:18.072594682Z] Restoring containers: start.
INFO[2025-11-11T17:41:18.086983766Z] Deleting nftables IPv4 rules                  error="exit status 1"
INFO[2025-11-11T17:41:18.098105207Z] Deleting nftables IPv6 rules                  error="exit status 1"
INFO[2025-11-11T17:41:18.550636934Z] Loading containers: done.
INFO[2025-11-11T17:41:18.561385744Z] Docker daemon                                 commit=d105562 containerd-snapshotter=false storage-driver=vfs version=29.0.0
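The same opt-out can also be made persistent in /etc/docker/daemon.json; a hedged sketch of the equivalent of the --feature flag above:

```json
{
  "features": {
    "containerd-snapshotter": false
  }
}
```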

Generally, running overlayFS on top of overlayFS is not recommended, and may indeed cause issues like this, which is why the official docker-in-docker image defines a volume for the /var/lib/docker directory; https://github.com/docker-library/docker/blob/319e58aa0299128924649f0745054a1b8732545a/29/dind/Dockerfile#L104

There are some options:

First of all, when running docker-in-docker, you may also want to use the docker-in-docker script as entrypoint for your container (or use the official docker-in-docker image), as it handles setting up various things to make docker work correctly; https://github.com/docker-library/docker/blob/319e58aa0299128924649f0745054a1b8732545a/29/dind/Dockerfile#L98-L100

To work around the problem of nested overlayFS, you can attach a volume at /var/lib/docker;

docker run --rm -it --privileged -v /var/lib/docker debian:bullseye
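Alternatively, if you build your own dind-style image, the volume can be declared in the image itself (a hedged sketch along the lines of the official dind Dockerfile linked above):

```dockerfile
# Minimal sketch of a custom dind image; the VOLUME declaration ensures
# /var/lib/docker is backed by a volume rather than the outer overlayfs.
FROM debian:bullseye
# ... install docker-ce here, as in the reproduction steps above ...
VOLUME /var/lib/docker
```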

If attaching a volume is not an option for you, you can configure the daemon to use the native snapshotter (storage driver), which is the equivalent of the vfs graph-driver. HOWEVER, using native (or vfs) is really intended as a last resort; this driver is not optimal, as it creates a full copy of all files in all layers of the images used (and for every container, or build step in your Dockerfile).

You can start the daemon with the native storage driver (snapshotter) using dockerd --storage-driver=native (or the equivalent in /etc/docker/daemon.json); the docker info output should then show something like;

 Server Version: 29.0.0
 Storage Driver: native
  driver-type: io.containerd.snapshotter.v1
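The daemon.json equivalent of the --storage-driver flag would be along these lines (a hedged sketch):

```json
{
  "storage-driver": "native"
}
```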

thaJeztah avatar Nov 11 '25 18:11 thaJeztah

Thank you for your reply. To work around this issue, I will temporarily pin Docker CE to version 28. I will also attempt to properly configure docker-in-docker, or set up a volume for Amazon ECS in our production environment.

ny-a avatar Nov 12 '25 09:11 ny-a

Thanks for the details here.

I presume it's the same root cause when using buildx inside a container, since buildx creates its own buildkit builder container?

 > [internal] booting buildkit:
------
ERROR: Error response from daemon: failed to mount /tmp/containerd-mount538338422: mount source: "overlay", target: "/tmp/containerd-mount538338422", fstype: overlay, flags: 0, data: "workdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/8/work,upperdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/8/fs,lowerdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/6/fs:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/4/fs:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/3/fs:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/2/fs:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,index=off", err: invalid argument

To be fair, I am running an older buildkit version (0.19.0) due to various issues with qemu/buildx being broken when working with Ubuntu 22.04 images; so I may need to try upgrading that.

chadlwilson avatar Nov 12 '25 12:11 chadlwilson

To be fair, I am running an older buildkit version (0.19.0) due to https://github.com/docker/buildx/issues/3036 with qemu/buildx being broken when working with Ubuntu 22.04 images; so I may need to try upgrading that.

Having a quick look at what current versions of buildx set for these containers, I see it sets a volume at /var/lib/buildkit;

docker inspect --format '{{json .Config.Volumes}}' buildx_buildkit_eloquent_aryabhata0
{"/var/lib/buildkit":{}}

thaJeztah avatar Nov 12 '25 14:11 thaJeztah

Hmm, OK, not sure where this came from, but with 0.19.0:

$ docker inspect --format '{{json .Config.Volumes}}' buildx_buildkit_gocd-builder0
{"/home/user/.local/share/buildkit":{}}

chadlwilson avatar Nov 12 '25 14:11 chadlwilson

Hmm, OK, not sure where this came from, but with 0.19.0:

Not sure; wondering if that mount is for rootless? Having a quick look at the v0.19 code in buildx, I see it sets a mount here; https://github.com/docker/buildx/blob/v0.19.0/driver/docker-container/driver.go#L127-L133

Which gets mounted at /var/lib/buildkit AFAICS; https://github.com/docker/buildx/blob/v0.19.0/util/confutil/container.go#L14-L21

@crazy-max you probably have a better insight

thaJeztah avatar Nov 13 '25 12:11 thaJeztah

To be fair, I am running an older buildkit version (0.19.0) due to various issues with qemu/buildx being broken when working with Ubuntu 22.04 images; so I may need to try upgrading that.

@chadlwilson We bumped to latest QEMU 10.0.4 in BuildKit v0.25.0: https://github.com/moby/buildkit/commit/d5d5b082c7ee61a1b0141b8f66acb04dc357926c. Can you try with latest BuildKit v0.26.0?

Also what is your Buildx version?

crazy-max avatar Nov 13 '25 13:11 crazy-max

Also what is your Buildx version?

Latest available on your official RHEL 10 repository, so 0.29.1.

wondering if that mount is for rootless?

Ahh yes, sorry, it is buildkit rootless.

I have since updated to buildkit rootless 0.25.1 (buildx-stable-1-rootless, still with Docker 28.5.x) and the March-era binfmt/qemu and whatnot issues with Ubuntu 22 seem to have been resolved, so I am in a better position to try again with Docker 29 and sort out the VOLUME mounts at all DIND layers (probably an overdue performance improvement anyway).

I’ll share when I’ve tried that. In the meantime for any others:

  • Aside from my local testing, host docker is predominantly Docker 25 on AL2023 (kernel 6.12) VMs (I mention it as some workflows mount the host socket and some do not).
  • Next layer is Almalinux 10 containers with Docker/containerd/buildx installed via your official RPMs, but omitting all of the DIND script magic (for historical reasons prior to my time, which I need to review).
  • On these Alma containers I am building other multi-arch amd64+arm64 containers via buildx, including one container that has an official docker:dind base image; it is then loaded into the local daemon, and a hello-world container is run inside it to sanity-test the DIND image (temporarily 3 container layers).

This particular quagmire is almost certainly my/our fault, but either way, with the suggestions above I’m sure I’ll get through it, so appreciate the tips. 👍

chadlwilson avatar Nov 13 '25 13:11 chadlwilson

buildx now all works fine inside our Almalinux 10 custom DIND container after properly declaring a VOLUME /var/lib/docker on the image. Docker 29 with buildx 0.30., buildkit 0.25.1. It also works fine building a derived alpine:28-dind image (note: 28 for now, until we make some decisions on the default DOCKER_MIN_API_VERSION to ship with) and then running a hello-world container inside it (dind in dind).

As a bonus, builds are a lot faster now too!

Many thanks for the pointers here.

chadlwilson avatar Nov 15 '25 08:11 chadlwilson

Hi

I'm also having issues with the same error message:

##[error]ERROR: failed to build: failed to solve: process "/bin/sh -c echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes" did not complete successfully: mount source: "overlay", target: "/var/lib/docker/buildkit/containerd-overlayfs/cachemounts/buildkit2533687527", fstype: overlay, flags: 0, data: "workdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/5/work,upperdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs,lowerdir=/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,index=off,redirect_dir=off", err: invalid argument

Is there a fix for that?

ij-23 avatar Nov 23 '25 10:11 ij-23