
The cache export step hangs

Open Xplouder opened this issue 4 years ago • 18 comments

Hi,

first of all, sorry if this is a double post, but since the other reports I found are a bit old and have no recent activity, I decided to try to sum it all up here:

Looks like the "preparing build cache for the export" step is hanging, which points to some kind of bug here: [image] In my last attempts it sat there for more than 1h with no CPU usage, just stuck. This means that, besides the inline cache type, the other cache types are unusable.

How to reproduce:

  • dockerfile with multistage build
  • cache-to with local or registry type with mode default/max
  • the first export works well; however, the second run (maybe due to cache-from) hangs indefinitely

notes:

  • I also used a multi-platform build but I don't think it's related
  • tested in a GitLab CI pipeline

Samples

Here are the build commands that I used, with some parts redacted:

local type:

    - docker buildx build
      --cache-from=type=local,src=docker_cache/
      --cache-to=type=local,dest=docker_cache/
      --ssh default=...
      --output type=image,name=registry.dev:foo,push=true
      --platform=linux/amd64,linux/arm,linux/arm64
      .

registry type:

    - docker buildx build
      --cache-from=type=registry,ref=registry.dev/cache
      --cache-to=type=registry,ref=registry.dev/cache
      --ssh default=...
      --output type=image,name=registry:foo,push=true
      --platform=linux/amd64,linux/arm,linux/arm64
      .
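The reproduction steps above (same build run twice, second run hangs) can be wrapped so that a hang shows up as a failed job instead of a stuck one. This is a minimal sketch, not taken from the report: the helper names `run_with_deadline` and `build_twice` and the `BUILD_DEADLINE` variable are hypothetical, and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Sketch only: helper names and BUILD_DEADLINE are hypothetical.
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.

# run_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND, reports a hang on stderr, and returns COMMAND's exit status.
run_with_deadline() {
  deadline="$1"
  shift
  status=0
  timeout "$deadline" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung: no exit within ${deadline}s" >&2
  fi
  return "$status"
}

# build_twice COMMAND [ARGS...]
# Runs the same build twice; per the report, the second run is the one that hangs.
build_twice() {
  for run in first second; do
    echo "== $run run =="
    run_with_deadline "${BUILD_DEADLINE:-3600}" "$@" || return $?
  done
}

# Example (not executed here):
#   BUILD_DEADLINE=1800 build_twice docker buildx build \
#     --cache-from=type=local,src=docker_cache/ \
#     --cache-to=type=local,dest=docker_cache/ .
```

A status of 124 from `build_twice` then distinguishes "the build hung" from an ordinary build failure, which may help when attaching logs to a report.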

Other reports that might be related:

  • https://github.com/moby/buildkit/issues/1704
  • https://github.com/docker/buildx/issues/237

Xplouder avatar Feb 08 '21 18:02 Xplouder

please post a runnable reproducer

tonistiigi avatar Feb 09 '21 02:02 tonistiigi

Same here. There seem to be a lot of similar issues here.
I have been stuck with output=type=local,dest=path as well.
I can reproduce the issue but I don't know how to make a minimal reproducer. All I know is that it gets stuck at "copying file" after the image is built. BTW it is a multi-platform build: the arm64 image is exported successfully, but armv7 always gets stuck in the copying-to-output stage, after the image has been built successfully.

umonaca avatar Mar 17 '21 06:03 umonaca

I have been trying to figure out for the entire day why a simple docker buildx filesystem export hangs in GitHub Actions, while it works just fine locally in WSL2. Maybe this is the same issue? https://twitter.com/awakecoding/status/1430252223771054084

awakecoding avatar Aug 24 '21 21:08 awakecoding

opened https://github.com/grpc/grpc-go/issues/4722

tonistiigi avatar Aug 31 '21 01:08 tonistiigi

If someone can make a reproducer using --cache-to that fails on a reproducible system in a way similar to what @awakecoding saw with -o type=local, I could check whether it is the same problem. I still don't quite understand what the difference between local and tar output is if it breaks at the grpc level. It could be that local transfers files individually, while neither type=tar nor --cache-to does. @bendavies
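A minimal sketch of that comparison, not taken from the thread: it runs the same build context through the `local` and `tar` exporters, each bounded by a deadline, so the two can be compared on one system. The helper names `export_with` and `compare_exporters`, the `DOCKER` override, and `EXPORT_DEADLINE` are hypothetical; GNU coreutils `timeout` is assumed.

```shell
#!/bin/sh
# Sketch only: helper names, DOCKER and EXPORT_DEADLINE are hypothetical.
# Set DOCKER=echo for a dry run that prints the commands instead of building.
DOCKER="${DOCKER:-docker}"

# export_with TYPE DEST CONTEXT
# Runs one build with the given exporter, bounded by `timeout`.
export_with() {
  timeout "${EXPORT_DEADLINE:-600}" \
    "$DOCKER" buildx build --output "type=$1,dest=$2" "$3"
}

# compare_exporters [CONTEXT]
# Tries both exporters and keeps going if one of them fails or times out.
compare_exporters() {
  context="${1:-.}"
  export_with local out-local "$context" || echo "type=local failed or timed out ($?)" >&2
  export_with tar out.tar "$context" || echo "type=tar failed or timed out ($?)" >&2
}
```

Running `DOCKER=echo compare_exporters .` first shows the exact commands without touching the builder.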

tonistiigi avatar Sep 01 '21 18:09 tonistiigi

I have a similar issue. It doesn't happen all the time, and I'm not sure what triggers it so I can't give a reproducer. The build fully uses the --cache-from (all steps are marked as CACHED), which points to the same registry&image&tag as --cache-to, so I don't think anything actually needs to be pushed.

Deleting the --cache-to tag from the registry allows the next build to succeed.

Sorry, I don't have much more information.

hectorj-klaxoon avatar Jan 27 '22 13:01 hectorj-klaxoon

I saw this immediately after adding mode=max. I'm caching to/from registry.

worldspawn avatar Feb 23 '22 03:02 worldspawn

Any fix?

arikmaor avatar Aug 03 '22 08:08 arikmaor

@tonistiigi I created a reproducible example in the https://github.com/bbednarek/multiple-docker-build repo, along with the workaround that we have adopted (OCI layout).

I am basically building a Docker image in 2 steps, using 2 different Dockerfiles: Docker.builder and Docker. You can find 2 different workflow files which use 2 different ways to build the final Docker image (you can also run them manually and override the default target platform):

On top of that, you can find a Makefile which contains 3 self-descriptive jobs to run it locally:

  • buildx-docker -> make clean buildx-docker took around 50 min to complete
  • buildx-docker-oci -> make clean buildx-docker-oci took around 30 min to complete
  • build-docker -> make clean build-docker took around 5 min to complete

bbednarek avatar Nov 03 '23 00:11 bbednarek

> I saw this immediately after adding mode=max. I'm caching to/from registry,

This issue is only about type=local. It was traced to https://github.com/docker/buildx/issues/537#issuecomment-908826867

tonistiigi avatar Nov 12 '23 19:11 tonistiigi

@tonistiigi Unfortunately, grpc/grpc-go#4722 was closed as stale. At this point, our cache exports are so slow that we might as well not use docker caching at all.

jjhuff avatar Jan 18 '24 21:01 jjhuff