Disconnected moby.filesync spans with `docker buildx build`

Open kylelemons opened this issue 7 months ago • 2 comments

Well-formed report checklist

  • [x] I have found a bug, and the documentation does not mention anything about my problem
  • [x] I have found a bug, and there are no open or closed issues related to my problem
  • [x] I have provided version/information about my environment and done my best to provide a reproducer

Description of bug

OpenTelemetry traces for buildkit (from docker buildx build) come with a large number of "disconnected" spans (i.e. spans whose parent span IDs are invalid).

From my testing, I see disconnected spans with the following titles:

  • moby.filesync.v1.Auth/VerifyTokenAuthority
  • moby.filesync.v1.Auth/GetTokenAuthority
  • moby.filesync.v1.Auth/FetchToken
  • moby.filesync.v1.FileSync/DiffCopy

When viewed in Jaeger (see below), these spans come with a warning like the following:

invalid parent span IDs=9440ecdfe377a710

This seems to indicate that the parent spans for these gRPC calls may never be closed, and so are never sent to the collector.
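
To make "disconnected" concrete, here is a minimal sketch of the check a trace viewer effectively performs: a span is flagged when its parent ID is set but matches no span in the same trace. The record layout (span_id, parent_id, name) and the find_orphans helper are hypothetical and only for illustration. They are not the schema Jaeger or buildx actually uses, and the span IDs other than the one from the warning above are made up.

```python
from typing import Iterable

# Hypothetical, simplified span record. Assumed keys:
#   "span_id"   - this span's ID
#   "parent_id" - parent span's ID, or None for a root span
#   "name"      - operation name
Span = dict


def find_orphans(spans: Iterable[Span]) -> list[Span]:
    """Return spans whose parent_id is set but matches no span in the batch."""
    spans = list(spans)
    known_ids = {s["span_id"] for s in spans}
    return [
        s for s in spans
        if s.get("parent_id") is not None and s["parent_id"] not in known_ids
    ]


if __name__ == "__main__":
    spans = [
        {"span_id": "aaaa", "parent_id": None, "name": "buildx: build ."},
        {"span_id": "bbbb", "parent_id": "aaaa", "name": "child of the root"},
        # The parent 9440ecdfe377a710 never reached the collector,
        # so this span cannot be attached to the tree.
        {"span_id": "cccc", "parent_id": "9440ecdfe377a710",
         "name": "moby.filesync.v1.FileSync/DiffCopy"},
    ]
    for s in find_orphans(spans):
        print(f'{s["name"]}: invalid parent span ID {s["parent_id"]}')
```

Note that the check only looks at what was received. It cannot tell whether the parent was never ended or was exported somewhere else.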

Reproduction

Spin up a simple Jaeger container:

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Point the OTLP exporter at Jaeger:

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"

Make a dummy Dockerfile:

FROM alpine:latest

Build the image with docker buildx build:

❯ docker buildx build --platform=linux/amd64 -f ./Dockerfile .
[+] Building 1.9s (5/5) FINISHED                                                                                       docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                   0.0s
 => => transferring dockerfile: 56B                                                                                                    0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                       1.5s
 => [internal] load .dockerignore                                                                                                      0.0s
 => => transferring context: 457B                                                                                                      0.0s
 => [1/1] FROM docker.io/library/alpine:latest@sha256:4bcff63911fcb4448bd4fdacec207030997caf25e9bea4045fa6c8c44de311d1                 0.3s
 => => resolve docker.io/library/alpine:latest@sha256:4bcff63911fcb4448bd4fdacec207030997caf25e9bea4045fa6c8c44de311d1                 0.0s
 => => sha256:9824c27679d3b27c5e1cb00a73adb6f4f8d556994111c12db3c5d61a0c843df8 3.80MB / 3.80MB                                         0.3s
 => exporting to image                                                                                                                 0.3s
 => => exporting layers                                                                                                                0.0s
 => => exporting manifest sha256:8073c482cbdbbf650f70696ace78bf9d0d51f9b8ff0e035db089d039ca71fea4                                      0.0s
 => => exporting config sha256:9234e8fb04c47cfe0f49931e4ac7eb76fa904e33b7f8576aec0501c085f02516                                        0.0s
 => => exporting attestation manifest sha256:1e1ecd02ea98d4c3ba586b1017eddaf11a951c3c87ddaa535dc9d3b8f304ec92                          0.0s
 => => exporting manifest list sha256:84a725b7730b9139f9cdceb46906a00e89ab2cb2cd6735854867c329b45e41aa                                 0.0s
 => => naming to moby-dangling@sha256:84a725b7730b9139f9cdceb46906a00e89ab2cb2cd6735854867c329b45e41aa                                 0.0s

Open the Jaeger web UI (http://localhost:16686/) and search for the buildx service. The trace should show up as "buildx: build ."; click on it. Collapse the top-level "buildx: build ." span to reveal the remaining "disconnected" spans, then expand any of them to see the warning about an invalid parent span ID:

[Image: Jaeger trace view showing the disconnected spans and their warning]

Version information

❯ docker buildx version && docker buildx inspect

github.com/docker/buildx v0.26.1-desktop.1 532a478c2ea39e2d0eb40ad2e3f6bec57df4c8af
Name:          desktop-linux
Driver:        docker
Last Activity: 2025-09-06 00:19:59 +0000 UTC

Nodes:
Name:             desktop-linux
Endpoint:         desktop-linux
Status:           running
BuildKit version: v0.23.2
Platforms:        linux/arm64, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386
Labels:
 org.mobyproject.buildkit.worker.containerd.namespace: moby
 org.mobyproject.buildkit.worker.containerd.uuid:      b8a23e8f-84ec-4857-85c5-38219c01da79
 org.mobyproject.buildkit.worker.executor:             containerd
 org.mobyproject.buildkit.worker.hostname:             docker-desktop
 org.mobyproject.buildkit.worker.moby.host-gateway-ip: 192.168.65.254
 org.mobyproject.buildkit.worker.network:              host
 org.mobyproject.buildkit.worker.selinux.enabled:      false
 org.mobyproject.buildkit.worker.snapshotter:          overlayfs
Devices:
 Name:                  docker.com/gpu=webgpu
 Automatically allowed: false
GC Policy rule#0:
 All:            false
 Filters:        type==source.local,type==exec.cachemount,type==source.git.checkout
 Keep Duration:  48h0m0s
 Max Used Space: 2.764GiB
GC Policy rule#1:
 All:            false
 Keep Duration:  1440h0m0s
 Reserved Space: 20GiB
GC Policy rule#2:
 All:            false
 Reserved Space: 20GiB
GC Policy rule#3:
 All:            true
 Reserved Space: 20GiB

kylelemons, Sep 06 '25 00:09

Can you reproduce this with docker buildx history trace, or with docker buildx history trace > trace.json for raw data inspection?

tonistiigi, Sep 09 '25 16:09

When you set up the OTEL environment variables, are you doing that in the buildkit container or just on the buildx invocation?

Buildkit has some complicated gRPC interactions. I think you're missing the buildkit spans. If you set up OTEL within buildkit, then buildx will forward the traces to buildkit and everything will show up. If you send the traces directly from buildx, the buildkit spans will never get sent, and the result would look something like this.

jsternberg, Sep 25 '25 20:09