Image pull fails
finch pull --platform=amd64 xxx
FATA[1167] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount3705620677: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount3705620677/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount3705620677/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown
FATA[1168] exit status 1
Is this image public/shareable? This looks like an image that uses extremely large UIDs and/or GIDs. When running rootless (or via any runtime with user namespaces enabled), those IDs exhaust the standard 2^16 (~65k) range of UIDs/GIDs used to map filesystem ownership. I expect this image will not run on any rootless or user-namespace-enabled container runtime unless the /etc/sub{u,g}id files are set up to allow a significantly larger range of subordinate IDs within containers.
I'm not quite sure what the value of using IDs in such a high range is (that UID is somewhere above 2^24; the GID is even larger!), but if you own the image, I would be curious why such extremely large integers are needed for the owner and group.
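For background, the subordinate ID ranges live in /etc/subuid and /etc/subgid, one user:start:count entry per line. A minimal sketch of a default-sized entry (the user name is a placeholder; a count of 65536 only maps IDs below 2^16 inside the user namespace, so files owned by UID 29511686 / GID 1085706827 cannot be mapped):
# Format: <user>:<first subordinate ID>:<number of subordinate IDs>
cat /etc/subuid /etc/subgid
<user>:100000:65536
<user>:100000:65536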
The image ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01 is public; pulling it with docker pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01 works fine.
Reproduced in Finch.
FATA[0125] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount3084210000: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount3084210000/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount3084210000/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown
FATA[0114] exit status 1
However, it worked with the nerdctl built from the v1.0.0 tag, which is what we are using in Finch. Will continue investigating.
It's important to compare against nerdctl (or any other runtime tool) running the same way it runs inside Finch, which based on the output is inside a user namespace ("rootless" mode, specifically). The container shown will probably work on any container runtime that does not run containers within a user namespace (either "rootless" mode or simply a root-created user namespace with a specific range of subordinate UIDs and GIDs). If you use the nerdctl install that sets up rootless mode on a Linux system, you should be able to reproduce the same issue, unless you configure an extremely large subordinate mapping for the ID ranges.
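For anyone trying to reproduce this outside Finch, a rough sketch with rootless nerdctl on Linux (assuming the full nerdctl distribution, which ships containerd-rootless-setuptool.sh):
# Set up rootless containerd for the current non-root user
containerd-rootless-setuptool.sh install
# With default 65536-entry subuid/subgid ranges, this pull should fail with
# the same lchown "invalid argument" error
nerdctl pull --platform=amd64 ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01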
Reproduced with nerdctl in the Finch VM shell.
FATA[0139] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount1146161846: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount1146161846/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount1146161846/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown
Validated that the pull works after extending the subuid and subgid ranges.
[ningziwe@lima-finch ningziwe]$ cat /etc/subuid
ningziwe:100000:29700000
[ningziwe@lima-finch ningziwe]$ cat /etc/subgid
ningziwe:100000:1085800000
[ningziwe@lima-finch ningziwe]$
logout
➜ ~ finch pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01
...
elapsed: 339.7s total: 942.4 (2.8 MiB/s)
Workaround:
# Log in VM shell
LIMA_HOME=/Applications/Finch/lima/data /Applications/Finch/lima/bin/limactl shell finch
# In VM shell, increase the ranges in /etc/subuid and /etc/subgid (example entries below)
sudo vi /etc/subuid
sudo vi /etc/subgid
# Logout VM shell and restart finch VM
finch vm stop
finch vm start
# Try to pull the image again
finch pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01
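For reference, the entries that worked in the test above (replace ningziwe with your VM user; any range large enough to cover the image's highest UID and GID should do):
# /etc/subuid
ningziwe:100000:29700000
# /etc/subgid
ningziwe:100000:1085800000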
As @estesp mentioned, the root cause is that the image contains files with extremely large UIDs/GIDs, while the default subordinate ID range in Finch is 65536.
I found a relevant issue in Kubernetes. According to that issue, 65536 is the default subordinate UID/GID range on most distributions, and the fix there was to adjust the extremely large UIDs/GIDs on the image side.
I suggest referring to that issue and checking whether the UIDs/GIDs in your image should or could be adjusted.
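One way to confirm which files carry the large IDs (a hedged sketch, assuming the image includes GNU stat; the path /kubeflow/src comes from the error message above) is to inspect it with a rootful runtime such as Docker:
# Print the numeric owner and group of the path from the extraction error
docker run --rm --entrypoint stat ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01 -c '%u:%g %n' /kubeflow/src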
If you find it necessary to use images with extremely large UIDs/GIDs, please elaborate on the use case here. We can discuss making subuid/subgid configurable if the use case can be justified.
The large UID/GID issue was resolved by switching to a rootful container runtime inside the VM. https://github.com/runfinch/finch/issues/196