Cannot read mounts in rootless Podman
Description
I'm trying to run gVisor within rootless Podman. The aim is to use gVisor for its syscall interception capabilities. However, I get the following error:
$ runsc -debug -network none -rootless do echo ok
Error reading mounts file: error unmarshaling mounts: unexpected end of JSON input
JSON bytes:
creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
In a regular Docker (not rootless) container, this command works:
$ runsc -debug -network none -rootless do echo ok
ok
Steps to reproduce
- Install Podman in your system.
- Execute into an Alpine container with podman run -it --rm alpine:edge ash
- Install runsc based on the installation instructions
- Run runsc -debug -network none -rootless do echo ok
runsc version
runsc version release-20221107.0
spec: 1.0.2-dev
docker version (if using docker)
$ podman --version
podman version 4.3.1
uname
Linux 6.0.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 10 Nov 2022 21:14:24 +0000 x86_64 GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
Error reading mounts file: error unmarshaling mounts: unexpected end of JSON input
JSON bytes:
I used the -debug-log switch, and I'm now seeing more logs. The following logs from the gofer process provide more info:
I1116 12:05:48.419039 1 main.go:216] ***************************
I1116 12:05:48.419097 1 main.go:217] Args: [runsc-gofer --root=/tmp/runsc-do237025772 --debug=true --debug-log=log/ --overlay=true --network=none --strace=true --rootless=true --debug-log-fd=3 gofer --bundle /tmp/runsc-do237025772 --spec-fd=4 --mounts-fd=5 --io-fds=6]
I1116 12:05:48.419124 1 main.go:218] Version release-20221107.0
I1116 12:05:48.419139 1 main.go:219] GOOS: linux
I1116 12:05:48.419153 1 main.go:220] GOARCH: amd64
I1116 12:05:48.419168 1 main.go:221] PID: 1
I1116 12:05:48.419183 1 main.go:222] UID: 0, GID: 0
I1116 12:05:48.419198 1 main.go:223] Configuration:
I1116 12:05:48.419214 1 main.go:224] RootDir: /tmp/runsc-do237025772
I1116 12:05:48.419231 1 main.go:225] Platform: ptrace
I1116 12:05:48.419246 1 main.go:226] FileAccess: exclusive, overlay: true
I1116 12:05:48.419262 1 main.go:227] Network: none, logging: false
I1116 12:05:48.419279 1 main.go:228] Strace: true, max size: 1024, syscalls:
I1116 12:05:48.419293 1 main.go:229] LISAFS: true
I1116 12:05:48.419308 1 main.go:230] Debug: true
I1116 12:05:48.419322 1 main.go:231] Systemd: false
I1116 12:05:48.419336 1 main.go:232] ***************************
I1116 12:05:48.419352 1 main.go:254] Failed to set RLIMIT_MEMLOCK: operation not permitted
W1116 12:05:48.420557 1 specutils.go:113] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
W1116 12:05:48.420759 1 util.go:64] FATAL ERROR: error mounting proc: operation not permitted
error mounting proc: operation not permitted
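Since a -debug-log directory may hold more than one log file, the fatal line can be easy to miss. A minimal sketch for pulling it out, assuming the logs were written to a directory such as log/ as in the invocation above and that fatal lines carry the "FATAL ERROR: " prefix seen here. The function name fatal_lines is hypothetical and not part of runsc:

```shell
#!/bin/sh
# fatal_lines is a hypothetical helper, not a runsc command.
# It prints the message part of every "FATAL ERROR" line found
# in the log files under the given directory.
fatal_lines() {
    # $1: directory that was passed to -debug-log
    grep -rh 'FATAL ERROR' "$1" | sed 's/^.*FATAL ERROR: //'
}

# Example (assumes the logs live in ./log):
#   fatal_lines log/
```

For the gofer log above, this would print only the "error mounting proc: operation not permitted" message.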
I see that the relevant check in Gofer happens here:
https://github.com/google/gvisor/blob/d8aa09e04c4e38155dfcee6ed0495c2b29f604fc/runsc/cmd/gofer.go#L418
Since we are already running in rootless Podman, we may opt to reuse the existing /proc and / paths. My understanding is that this can happen via the -TESTONLY-unsafe-nonroot flag, and it works indeed:
$ runsc -network none -rootless -TESTONLY-unsafe-nonroot do echo ok
ok
Still, this is a pretty scary-looking argument. Is it safe to use in a sandbox-within-a-sandbox use case? If not, is there a better way to achieve the original goal?
The aim is to use gVisor for its syscall interception capabilities
A friendly reminder that this issue had no activity for 120 days.
@apyrgio have you found another solution that doesn't rely on -TESTONLY-unsafe-nonroot by now?
Unfortunately not. I haven't managed to work on this any further, actually.
I think this issue was fixed by https://github.com/google/gvisor/commit/c6a1db5baec7616983b14ac06e84bee45330a9d3. Can you please confirm?
Doesn't look like it, I'm afraid. The original invocation (runsc -debug -network none -rootless do echo ok) still fails with:
I1110 09:32:21.865885 1 main.go:189] ***************************
I1110 09:32:21.865967 1 main.go:190] Args: [runsc-gofer --root=/tmp/runsc-do4154063816 --debug-log=logs/ --overlay2=all:memory --network=none --rootless=true --debug-log-fd=3 gofer --bundle /tmp/runsc-do4154063816 --gofer-mount-confs=1 --spec-fd=4 --mounts-fd=5 --io-fds=6]
I1110 09:32:21.865995 1 main.go:191] Version release-20231106.0
I1110 09:32:21.866008 1 main.go:192] GOOS: linux
I1110 09:32:21.866020 1 main.go:193] GOARCH: amd64
I1110 09:32:21.866032 1 main.go:194] PID: 1
I1110 09:32:21.866045 1 main.go:195] UID: 0, GID: 0
I1110 09:32:21.866058 1 main.go:196] Configuration:
I1110 09:32:21.866070 1 main.go:197] RootDir: /tmp/runsc-do4154063816
I1110 09:32:21.866082 1 main.go:198] Platform: systrap
I1110 09:32:21.866095 1 main.go:199] FileAccess: exclusive
I1110 09:32:21.866110 1 main.go:200] Directfs: true
I1110 09:32:21.866123 1 main.go:201] Overlay: all:memory
I1110 09:32:21.866137 1 main.go:202] Network: none, logging: false
I1110 09:32:21.866152 1 main.go:203] Strace: false, max size: 1024, syscalls:
I1110 09:32:21.866165 1 main.go:204] IOURING: false
I1110 09:32:21.866178 1 main.go:205] Debug: false
I1110 09:32:21.866190 1 main.go:206] Systemd: false
I1110 09:32:21.866202 1 main.go:207] ***************************
W1110 09:32:21.867751 1 specutils.go:124] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
W1110 09:32:21.868016 1 util.go:64] FATAL ERROR: error mounting proc: operation not permitted
error mounting proc: operation not permitted
(note that the line Failed to set RLIMIT_MEMLOCK: operation not permitted is no longer present, in contrast with the previous logs)
Also, the altered invocation (runsc -network none -rootless -TESTONLY-unsafe-nonroot do echo ok) still works.
I think this issue was fixed by c6a1db5. Can you please confirm?
@ayushr2 The code changed by this commit runs after the point where the failure occurs, so it can't fix this issue.
@apyrgio is SELinux enabled? If so, could you try temporarily disabling it and reproducing the issue?
$ sestatus
$ sudo setenforce Permissive
@apyrgio @terenceli could you try out https://github.com/google/gvisor/pull/9798?
@avagin in my case it doesn't work.
I did some analysis, and the failure happens here: https://github.com/google/gvisor/blob/master/runsc/cmd/gofer.go#L394C21-L394C21
The code changed by your fix runs after that point.
This seems to be the general 'mount procfs from unprivileged container' problem, which is discussed in runc issue https://github.com/opencontainers/runc/issues/1658
I used Alban's workaround from https://github.com/opencontainers/runc/issues/1658#issuecomment-375750981
and it works.
The following is my test on Ubuntu.
The /mnt/proc volume comes from Alban's workaround.
# docker run -it --security-opt apparmor:unconfined --security-opt seccomp=unconfined -v /home:/home -v /mnt/proc:/newproc -v /tmp:/tmp ubuntu
root@561ca9b97a06:/# useradd test
root@561ca9b97a06:/# su test
$ cd /tmp
$ ./runsc -rootless do sh
*** Warning: sandbox network isn't supported with --rootless, switching to host ***
#
So my question is: could we find a more elegant fix or workaround so that gVisor can be used in an unprivileged Docker/Podman environment? I rely on this to build a process-level sandbox.
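As the linked runc issue discusses, a fresh procfs mount from an unprivileged container tends to be refused when the container's existing /proc has other mounts stacked on parts of it, which container runtimes commonly do to mask paths. A small sketch for listing such submounts, assuming the standard mountinfo layout in which the fifth field is the mount point. The helper name proc_submounts is hypothetical:

```shell
#!/bin/sh
# proc_submounts is a hypothetical helper, not part of runsc or runc.
# It prints every mount point located below /proc, as recorded in a
# mountinfo-format file (field 5 of each line is the mount point).
proc_submounts() {
    # $1: path to a mountinfo file; defaults to the current process's
    awk '$5 ~ /^\/proc\// { print $5 }' "${1:-/proc/self/mountinfo}"
}

# Example, run inside the container:
#   proc_submounts
```

Non-empty output inside the container would be consistent with the masked-/proc explanation, though it does not prove that this is the cause of the gofer failure.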
Your fix code is after it.
You are right. I created a fedora vm to reproduce the issue, but runsc failed differently there.
This issue seems to be a general issue 'mount procfs from unprivileged container'. It is discussed in runc issue opencontainers/runc#1658
Thank you for researching the problem. I think https://github.com/google/gvisor/commit/063ee51c57f6cd5c64aa0d115396941dce455b8b should fix the issue. Could you try it out?
@avagin Yes, after applying this patch, it works! Thanks.
Oh, that's great! Thanks a lot for the progress on this front.
@terenceli, just to be sure, did this fix work within a rootless Podman container, as described at the top of the issue? If not, I can give it a go as well.
@apyrgio Sorry, I haven't tried that yet. I have only tested it on Ubuntu with Docker. You can test it in Podman.
@apyrgio I have now tried this patch using Podman. With SELinux set to Permissive, it works; with SELinux enforcing, it doesn't.
@avagin Could you submit a PR upstream?
[test@debug010000002015 ~]$ podman --version
podman version 4.8.1
[test@debug010000002015 ~]$ podman run -it --rm alpine ash
/ # cd /tmp
/tmp # ls
runsc
/tmp # ./runsc do ls
Error setting up network: failed to run "/sbin/ip link add ve-runsc-387325 type veth peer name vp-runsc-387325": exit status 2
/tmp # ./runsc --rootless do ls
*** Warning: sandbox network isn't supported with --rootless, switching to host ***
runsc runsc-do3020874715
Thanks a lot @terenceli. Unfortunately I didn't have time to test it on my own, but the results look promising! For our use case (Dangerzone) we are spinning up containers that don't have a network device, so it could very well be that nested gVisor will work, even with SELinux enabled. That's a great development!