Add network sandbox passthrough for rootless/pre-setup applications, decoupling functionality from OCI.
Description
I need a rootless equivalent of network=host that engages the gvisor netstack.
Let's call it network=sandbox-passthrough
One use case is to benefit from gvisor sandbox isolation while setting up the environment with standard Linux namespace tooling such as bubblewrap/bwrap:
bwrap --args 20 -- runsc -platform=kvm -rootless -ignore-cgroups -network=host -host-uds=all -host-fifo=open -file-access=shared -overlay2=none do -force-overlay=false -- bash -li
(20 is an FD containing args for bwrap, which can do much of what OCI does, but is far more readable and easily configurable)
The above works well, but it doesn't provide the benefit of gvisor's netstack isolation.
Introducing this functionality would allow decoupling gvisor from the OCI spec and would make the 'do' command useful for more than testing. It might also partially or fully obviate https://github.com/google/gvisor/issues/5440
gvisor's netstack is already used to provide an isolated netstack for containers and VMs while rootless:
https://github.com/containers/gvisor-tap-vsock
A replacement for libslirp and VPNKit, written in pure Go. It is based on the network stack of gVisor.
While this could be used with gvisor, it would be hacky, and if I recall correctly it did not work with network=none.
Another potential approach:
https://github.com/containers/libkrun
Transparent Socket Impersonation which allows the VM to have network connectivity without a virtual interface. This technique supports both outgoing and incoming connections. It's possible for userspace applications running in the VM to transparently connect to endpoints outside the VM and receive connections from the outside to ports listening inside the VM. Requires a custom kernel (like the one bundled in libkrunfw) and it's limited to AF_INET SOCK_DGRAM and SOCK_STREAM sockets.
They do it by patching the guest kernel; gvisor, being in charge of executing syscalls, could do something similar.
Depending on the approach, it would look like a lone interface inside the gvisor sandbox, giving it a network link to the netns it was launched from. Or, as in the case of libkrun, no interfaces would be visible in the sandbox, but packets sent inside it would appear in the host netns as coming from the gvisor netstack.
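To make the second (libkrun-like) approach a bit more concrete, here is a minimal Go sketch of the per-connection relay such a mode would need: a connection terminated on the sandbox side is paired with an ordinary unprivileged socket opened in the host netns, and bytes are copied both ways. This uses only the Go standard library. It is not gVisor code, and `relay`, `forward` and `roundTrip` are hypothetical names; a plain TCP listener stands in for the netstack endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// relay copies bytes in both directions between the sandbox-side connection
// and the host-side connection, then closes both.
func relay(sandboxSide, hostSide net.Conn) {
	done := make(chan struct{}, 2)
	pipe := func(dst, src net.Conn) {
		io.Copy(dst, src)
		// Propagate end-of-stream without tearing down the other direction.
		if tc, ok := dst.(*net.TCPConn); ok {
			tc.CloseWrite()
		}
		done <- struct{}{}
	}
	go pipe(hostSide, sandboxSide)
	go pipe(sandboxSide, hostSide)
	<-done
	<-done
	sandboxSide.Close()
	hostSide.Close()
}

// forward accepts connections on ln (a stand-in for the netstack endpoint)
// and, for each one, dials target with a normal unprivileged host socket.
func forward(ln net.Listener, target string) {
	for {
		c, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func() {
			h, err := net.Dial("tcp", target)
			if err != nil {
				c.Close()
				return
			}
			relay(c, h)
		}()
	}
}

// roundTrip starts a local echo service (the "host" side) and a forwarder,
// sends msg through the forwarder, and returns what comes back.
func roundTrip(msg string) (string, error) {
	echo, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer echo.Close()
	go func() {
		for {
			c, err := echo.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go forward(front, echo.Addr().String())

	conn, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("ping")
	fmt.Println(got, err)
}
```

In a real implementation the sandbox side would be a netstack endpoint rather than a loopback listener, and UDP and incoming connections would need equivalent handling.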
The primary requirement here is to enable network setup and isolation in rootless mode. The existing modes in netstack:
- network=host : will just be a passthrough to the host netns without any isolation.
- network=sandbox: requires root permissions to set up the network and veth devices.
- network=none: does not require root, but provides only loopback interface.
The new mode proposed here, network=sandbox-passthrough, should provide a way to enable some network setup without root while still using the gvisor netstack for communication with the host netns.
While this is a good use case that would allow the gVisor netstack to support other tools (such as bwrap, mentioned in this issue), implementing it is not a high priority for the gVisor team for now. We are happy to accept PRs with the proposed solution of having a specific interface that communicates with the host netns (with a good amount of testing :-) ).