
IpVlan support with sandboxed networking

cyclimse opened this issue on Jun 26, 2025

Description

Hello team 👋

Problem

We've recently started experimenting with IPVlan NICs on Kubernetes. Our use case is to bring private network connectivity to running containers. It works well with various container runtimes (runc, Kata Containers, etc.), but we've been struggling to make it work on gVisor when using --network=sandbox (the default).

Overall, it looks like IPVlan connectivity is not supported. From what we understand, this is related to the way gVisor removes the IP from all NICs within the container namespace when using --network=sandbox. This is similar to the issue described in https://github.com/google/gvisor/issues/6549, but I think the incompatibility with IPVlan is a somewhat undocumented/unknown side effect of this behavior.
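
The symptom is visible from the host once the sandbox is up. A minimal check, using the namespace and interface names from the reproduction scripts below:

# With --network=sandbox, runsc takes the address into its own network
# stack, so the host-side view of the container namespace shows the
# IPVlan slave without its 192.168.100.2/24 address:
ip netns exec container-ns ip addr show ipvlan0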

Steps to reproduce

Here are two scripts to reproduce the issue with gVisor + containerd (you can also find them in this gist):

Scripts to reproduce:
#!/bin/bash
# run.sh

set -eu -o pipefail

function echoblue() {
	echo -e "\033[1;34m$1\033[0m"
}

if [ "$#" -ne 1 ]; then
	echo "Error: Invalid number of arguments."
	echo "Usage: $0 <runtime>"
	exit 1
fi

RUNTIME="$1"
SUPPORTED_RUNTIMES=("runsc" "runc" "runsc-network-host")

if [[ ! " ${SUPPORTED_RUNTIMES[@]} " =~ " $RUNTIME " ]]; then
	echo "Unsupported runtime: $RUNTIME"
	echo "Supported runtimes: ${SUPPORTED_RUNTIMES[*]}"
	exit 1
fi

# Create host networking setup using a veth pair and a network namespace.
# This allows the host to communicate with the ipvlan container, which is not
# possible with a dummy interface in L2 mode.
HOST_NS="host-ns"
VETH_HOST="veth-host"
VETH_NS="veth-ns-peer"
SERVER_IP="192.168.100.1"
SERVER_PORT="12345"

if ! ip netns list | grep -q $HOST_NS; then
	echo "Creating host network namespace $HOST_NS..."
	ip netns add $HOST_NS
fi

if ! ip link show $VETH_HOST >/dev/null 2>&1; then
	echo "Creating veth pair $VETH_HOST <-> $VETH_NS..."
	ip link add $VETH_HOST type veth peer name $VETH_NS
	ip link set $VETH_NS netns $HOST_NS

	echo "Configuring host side of veth..."
	ip link set $VETH_HOST up

	echo "Configuring namespace side of veth..."
	ip netns exec $HOST_NS ip addr add $SERVER_IP/24 dev $VETH_NS
	ip netns exec $HOST_NS ip link set $VETH_NS up
	ip netns exec $HOST_NS ip link set lo up
else
	echo "Veth pair $VETH_HOST already exists."
fi

# Our target will be a simple TCP server that listens on the host namespace.
# This will allow us to test connectivity from the container to the host.
ip netns exec $HOST_NS socat -v tcp-listen:$SERVER_PORT,reuseaddr,fork,bind=$SERVER_IP \
	exec:"echo 'Hello from host namespace!'" &

# Create a dummy container to test networking
CONTAINER_NAME="ipvlan-test-dummy"

# Ensure the container name is unique
if nerdctl ps -a --filter "name=$CONTAINER_NAME" --format '{{.Names}}' | grep -q "$CONTAINER_NAME"; then
	echo "Removing existing container $CONTAINER_NAME..."
	nerdctl rm -f $CONTAINER_NAME || true
	sleep 2 # Wait for the container to be removed
	nerdctl rm $CONTAINER_NAME || true
fi

# Because of the way gVisor works, we need to create the IPVlan interface before we run the container.
# As such, we start by creating a network namespace for the container and setting it up with the IPVlan interface.
CONTAINER_NS="container-ns"
IPVLAN_NAME="ipvlan0"
IPVLAN_IP="192.168.100.2"

if ip netns list | grep -q $CONTAINER_NS; then
	echo "Removing existing network namespace $CONTAINER_NS..."
	ip netns del $CONTAINER_NS || true
	sleep 2 # Wait for the namespace to be removed
fi
ip netns add $CONTAINER_NS

# Set up container networking using CNI bridge plugin
CNI_PATH="/opt/cni/bin"
NETNS_PATH="/run/netns/$CONTAINER_NS"
CONTAINER_ID="test-container-$$"

# Create the CNI configuration
cat >/tmp/bridge-config.json <<EOF
{
  "cniVersion": "1.0.0",
  "name": "mybridge",
  "type": "bridge",
  "bridge": "cni-bridge0",
  "isGateway": true,
  "ipMasq": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "subnet": "172.19.0.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
EOF

# Execute the CNI plugin
echo "Setting up bridge networking with CNI..."
CNI_COMMAND=ADD CNI_CONTAINERID=$CONTAINER_ID CNI_NETNS=$NETNS_PATH CNI_IFNAME=eth0 \
	CNI_PATH=$CNI_PATH $CNI_PATH/bridge </tmp/bridge-config.json

# Explicitly set up NAT for the container subnet to your public interface
PUBLIC_INTERFACE="ens2"
iptables -t nat -A POSTROUTING -s 172.19.0.0/24 -o $PUBLIC_INTERFACE -j MASQUERADE
iptables -A FORWARD -i cni-bridge0 -o $PUBLIC_INTERFACE -j ACCEPT
iptables -A FORWARD -i $PUBLIC_INTERFACE -o cni-bridge0 -m state --state RELATED,ESTABLISHED -j ACCEPT

ip link add link $VETH_HOST $IPVLAN_NAME netns $CONTAINER_NS type ipvlan mode l2
ip netns exec $CONTAINER_NS ip addr add $IPVLAN_IP/24 dev $IPVLAN_NAME
ip netns exec $CONTAINER_NS ip link set $IPVLAN_NAME up

# If it already exists, remove it
if nerdctl ps -a --filter "name=$CONTAINER_NAME" --format '{{.Names}}' | grep -q "$CONTAINER_NAME"; then
	echo "Removing existing container $CONTAINER_NAME..."
	nerdctl rm -f $CONTAINER_NAME || true
	sleep 2 # Wait for the container to be removed
	nerdctl rm $CONTAINER_NAME || true
fi

# See: https://github.com/containerd/nerdctl/pull/3538
nerdctl run -d --name $CONTAINER_NAME \
	--runtime $RUNTIME \
	--net=ns:/run/netns/$CONTAINER_NS \
	alpine:3.22 sleep infinity

# Install telnet in the container to test connectivity
nerdctl exec -it $CONTAINER_NAME apk add --no-cache busybox-extras

echo "Container $CONTAINER_NAME is running with network namespace $CONTAINER_NS."
echoblue "You can test connectivity to the host namespace using:"
echoblue "nerdctl exec -it $CONTAINER_NAME telnet $SERVER_IP $SERVER_PORT"

if [ "$RUNTIME" == "runsc" ]; then
	echoblue "On gVisor, this won't work because gVisor steals the IP from the IpVlan interface."

	nerdctl exec -it $CONTAINER_NAME ip addr show $IPVLAN_NAME

	# Prompt user if script should add the IP back to the IpVlan interface
	echoblue "Do you want to add the IP back to the IpVlan interface in $CONTAINER_NS? (yes/no)"

	read -r answer
	if [[ "$answer" == "yes" ]]; then
		echo "Adding IP $IPVLAN_IP back to $IPVLAN_NAME in $CONTAINER_NS..."
		ip netns exec $CONTAINER_NS ip addr add $IPVLAN_IP dev $IPVLAN_NAME
		echo "IP added. You can now test connectivity again."
	fi
fi

Script to install the prerequisites:

#!/bin/bash
# setup.sh

set -eu -o pipefail

RUNC_IS_INSTALLED=$(command -v runc || true)
SOCAT_IS_INSTALLED=$(command -v socat || true)

GVISOR_IS_INSTALLED=$(command -v runsc || true)

CONTAINERD_IS_INSTALLED=$(command -v containerd || true)
CONTAINERD_VERSION="1.7.25"

NERDCTL_IS_INSTALLED=$(command -v nerdctl || true)
NERDCTL_VERSION="2.1.2" # Need at least 2.0.0 for --net=ns:/run/netns/<namespace> support

CNI_VERSION="1.6.2"

# Ensure necessary packages are installed
if [ -z "$SOCAT_IS_INSTALLED" ] || [ -z "$RUNC_IS_INSTALLED" ]; then
	echo "Installing required packages: socat, runc, wget, iproute2, iptables"
	sudo apt-get update
	sudo apt-get install -y socat runc
else
	echo "Required packages are already installed."
fi

# Install gVisor
(
	if [ -z "$GVISOR_IS_INSTALLED" ]; then
		(
			set -e
			ARCH=$(uname -m)
			URL=https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}
			wget ${URL}/runsc ${URL}/runsc.sha512 \
				${URL}/containerd-shim-runsc-v1 ${URL}/containerd-shim-runsc-v1.sha512
			sha512sum -c runsc.sha512 \
				-c containerd-shim-runsc-v1.sha512
			rm -f *.sha512
			chmod a+rx runsc containerd-shim-runsc-v1
			sudo mv runsc containerd-shim-runsc-v1 /usr/local/bin

			cat >/usr/local/bin/runsc-network-host <<EOF
#!/bin/sh
exec /usr/local/bin/runsc --network=host "\$@"
EOF
			chmod a+rx /usr/local/bin/runsc-network-host
		)
	else
		echo "gVisor is already installed."
	fi
)

# Install containerd
(
	if [ -z "$CONTAINERD_IS_INSTALLED" ]; then
		(
			# containerd
			wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD_VERSION/containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
			tar Cxzvf /usr/local containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
			wget -P /usr/local/lib/systemd/system/ https://raw.githubusercontent.com/containerd/containerd/v$CONTAINERD_VERSION/containerd.service
			rm containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
		)
	else
		echo "containerd is already installed."
	fi
)

# Install nerdctl
(
	if [ -z "$NERDCTL_IS_INSTALLED" ]; then
		(
			wget https://github.com/containerd/nerdctl/releases/download/v$NERDCTL_VERSION/nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
			tar Cxzvvf /usr/local/bin nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
			rm nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
		)
	else
		echo "nerdctl is already installed."
	fi
)

# Install cni
(
	if [ ! -d /opt/cni/bin ]; then
		(
			wget https://github.com/containernetworking/plugins/releases/download/v$CNI_VERSION/cni-plugins-linux-amd64-v$CNI_VERSION.tgz
			mkdir -p /opt/cni/bin
			tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v$CNI_VERSION.tgz
			chown -R root:root /opt/cni/bin
			rm cni-plugins-linux-amd64-v$CNI_VERSION.tgz
		)
	else
		echo "CNI plugins are already installed."
	fi
)

# Start containerd service
if ! systemctl is-active --quiet containerd; then
	echo "Starting containerd service..."
	systemctl enable containerd
	systemctl start containerd
fi

⚠️ Word of caution: this is intended to run on an ephemeral Ubuntu VM. Specifically, this was tested on Ubuntu 24.04.2. ⚠️

Usage:

  1. Install the prerequisites:
     ./setup.sh
  2. Run the script:
     ./run.sh
  3. Check connectivity:

Expected working example ✅

$ nerdctl exec -it ipvlan-test-dummy telnet 192.168.100.1 12345
2025/06/26 13:53:00.000574202  length=27 from=0 to=26
Hello from host namespace!
Connected to 192.168.100.1
Hello from host namespace!
Connection closed by foreign host

(The timestamped line and the first "Hello" come from the backgrounded socat -v trace, interleaved with telnet's own output.)

Oh no ❌

$ nerdctl exec -it ipvlan-test-dummy telnet 192.168.100.1 12345
telnet: can't connect to remote host (192.168.100.1): Host is unreachable
FATA[0003] exec failed with exit code 1   

Results:

runtime              can connect?
runc                 ✅
runsc                ❌
runsc-network-host   ✅

The run.sh script also showcases that adding the IP back on the IPVlan NIC after gVisor has created the sandbox fixes the connectivity issue. This likely isn't a proper fix, but, to be blunt, we don't know enough about the internals of gVisor networking to understand why IPVlan is not supported without this small hack.
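
For reference, the interactive step at the end of run.sh boils down to a single command (same names as above):

# Re-add the address on the IPVlan slave after runsc has created the
# sandbox. With no explicit prefix length, ip defaults to /32, which is
# also what the patch below adds from inside runsc.
ip netns exec container-ns ip addr add 192.168.100.2 dev ipvlan0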

Is this feature related to a specific bug?

I'm not sure if this is a bug: the removal of the IP is intended behavior (documented in https://github.com/google/gvisor/issues/6549#issuecomment-914880555), but the lack of support for IPVlan might be an unintended side effect.

Do you have a specific solution in mind?

Here are some solutions we had in mind:

  • Support IPVlan when using sandboxed networking.

  • Skip IP removal from IPVlan NICs. Here's a patch that implements this behavior:


Patch to apply on release https://github.com/google/gvisor/blob/release-20250616.0:

diff --git a/runsc/sandbox/network.go b/runsc/sandbox/network.go
index e8c804893..963b8b9d4 100644
--- a/runsc/sandbox/network.go
+++ b/runsc/sandbox/network.go
@@ -252,11 +252,14 @@ func createInterfacesAndRoutesFromNS(conn *urpc.Client, nsPath string, conf *con
                        return fmt.Errorf("getting link for interface %q: %w", iface.Name, err)
                }
                linkAddress := ifaceLink.Attrs().HardwareAddr
+               isIPVlan := ifaceLink.Type() == "ipvlan" // maybe just check if it has parent index == hostDev.Attrs().Index

                // Collect the addresses for the interface, enable forwarding,
                // and remove them from the host.
                var addresses []boot.IPWithPrefix
                for _, addr := range ipAddrs {
+                       log.Debugf("interface %s has address %s", iface.Name, addr.String())
+
                        prefix, _ := addr.Mask.Size()
                        addresses = append(addresses, boot.IPWithPrefix{Address: addr.IP, PrefixLen: prefix})

@@ -271,6 +274,20 @@ func createInterfacesAndRoutesFromNS(conn *urpc.Client, nsPath string, conf *con
                                }
                                return fmt.Errorf("removing address %v from device %q: %w", addr, iface.Name, err)
                        }
+
+                       if isIPVlan && addr.IP.To4() != nil {
+                               log.Debugf("interface %s is an ipv4 with ipvlan: %s, adding address /32", iface.Name, addr.IP.String())
+
+                               newAddr32 := fmt.Sprintf("%s/32", addr.IP.String())
+                               newAddr, err := netlink.ParseAddr(newAddr32)
+                               if err != nil {
+                                       return fmt.Errorf("cannot parse addr: %w", err)
+                               }
+                               err = netlink.AddrAdd(ifaceLink, newAddr)
+                               if err != nil {
+                                       return fmt.Errorf("cannot add ipvlan addr %s: %w", newAddr32, err)
+                               }
+                       }
                }

                if conf.XDP.Mode == config.XDPModeNS { 
  • Add a new flag that, when used in combination with --network=sandbox, skips the removal of the IP from specific NICs in the given network namespace, e.g. --do-not-remove-ip-from="ipvlan0" (see the sketch below). This is a similar proposal to https://github.com/google/gvisor/issues/6549, but scoped to specific NICs.
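
To illustrate that last proposal, an invocation could look something like this (the flag is hypothetical and does not exist in runsc today; the container name is a placeholder):

# Hypothetical flag: keep sandboxed networking, but leave the address on
# the named NIC(s) so the ipvlan driver can still deliver packets to them.
runsc --network=sandbox --do-not-remove-ip-from=ipvlan0 run mycontainer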

Thanks a lot and have a lovely day!
