IPVlan support with sandboxed networking
Description
Hello team 👋
Problem
We've recently started playing around with IPVlan NICs on Kubernetes. Our use case is to bring private network connectivity to running containers. It works pretty well with various container runtimes (runc, Kata Containers, etc.), but we've been struggling to make it work on gVisor when using --network=sandbox (the default).
Overall, it looks like IPVlan connectivity is not supported. From what we understand, this seems related to the way gVisor removes the IP from all NICs within the container namespace when using --network=sandbox. This is similar to the issue described in https://github.com/google/gvisor/issues/6549, but I think the incompatibility with IPVlan is a somewhat undocumented/unknown side effect of this behavior.
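A quick way to see the symptom is to inspect the IPVlan NIC before and after the sandbox starts. A minimal sketch, assuming a namespace and interface named as in the scripts below:

# Before starting the container, the IPVlan NIC holds its address:
ip netns exec container-ns ip addr show ipvlan0
# After a container is started with --runtime runsc attached to this namespace,
# the same command shows that the address has been removed from the NIC:
ip netns exec container-ns ip addr show ipvlan0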
Steps to reproduce
Here are two scripts to reproduce the issue with gVisor + containerd (you can also find them in this gist).
Scripts to reproduce:
# run.sh
#!/bin/bash
set -eu -o pipefail

function echoblue() {
  echo -e "\033[1;34m$1\033[0m"
}

if [ "$#" -ne 1 ]; then
  echo "Error: Invalid number of arguments."
  echo "Usage: $0 <runtime>"
  exit 1
fi

RUNTIME="$1"
SUPPORTED_RUNTIMES=("runsc" "runc" "runsc-network-host")
if [[ ! " ${SUPPORTED_RUNTIMES[@]} " =~ " $RUNTIME " ]]; then
  echo "Unsupported runtime: $RUNTIME"
  echo "Supported runtimes: ${SUPPORTED_RUNTIMES[*]}"
  exit 1
fi

# Create host networking setup using a veth pair and a network namespace.
# This allows the host to communicate with the ipvlan container, which is not
# possible with a dummy interface in L2 mode.
HOST_NS="host-ns"
VETH_HOST="veth-host"
VETH_NS="veth-ns-peer"
SERVER_IP="192.168.100.1"
SERVER_PORT="12345"

if ! ip netns list | grep -q $HOST_NS; then
  echo "Creating host network namespace $HOST_NS..."
  ip netns add $HOST_NS
fi

if ! ip link show $VETH_HOST >/dev/null 2>&1; then
  echo "Creating veth pair $VETH_HOST <-> $VETH_NS..."
  ip link add $VETH_HOST type veth peer name $VETH_NS
  ip link set $VETH_NS netns $HOST_NS
  echo "Configuring host side of veth..."
  ip link set $VETH_HOST up
  echo "Configuring namespace side of veth..."
  ip netns exec $HOST_NS ip addr add $SERVER_IP/24 dev $VETH_NS
  ip netns exec $HOST_NS ip link set $VETH_NS up
  ip netns exec $HOST_NS ip link set lo up
else
  echo "Veth pair $VETH_HOST already exists."
fi

# Our target will be a simple TCP server that listens in the host namespace.
# This will allow us to test connectivity from the container to the host.
ip netns exec $HOST_NS socat -v tcp-listen:$SERVER_PORT,reuseaddr,fork,bind=$SERVER_IP \
  exec:"echo 'Hello from host namespace!'" &

# Create a dummy container to test networking
CONTAINER_NAME="ipvlan-test-dummy"

# Ensure the container name is unique
if nerdctl ps -a --filter "name=$CONTAINER_NAME" --format '{{.Names}}' | grep -q "$CONTAINER_NAME"; then
  echo "Removing existing container $CONTAINER_NAME..."
  nerdctl rm -f $CONTAINER_NAME || true
  sleep 2 # Wait for the container to be removed
  nerdctl rm $CONTAINER_NAME || true
fi

# Because of the way gVisor works, we need to create the IPVlan interface before we run the container.
# As such, we start by creating a network namespace for the container and setting it up with the IPVlan interface.
CONTAINER_NS="container-ns"
IPVLAN_NAME="ipvlan0"
IPVLAN_IP="192.168.100.2"

if ip netns list | grep -q $CONTAINER_NS; then
  echo "Removing existing network namespace $CONTAINER_NS..."
  ip netns del $CONTAINER_NS || true
  sleep 2 # Wait for the namespace to be removed
fi
ip netns add $CONTAINER_NS

# Set up container networking using the CNI bridge plugin
CNI_PATH="/opt/cni/bin"
NETNS_PATH="/run/netns/$CONTAINER_NS"
CONTAINER_ID="test-container-$$"

# Create the CNI configuration
cat >/tmp/bridge-config.json <<EOF
{
  "cniVersion": "1.0.0",
  "name": "mybridge",
  "type": "bridge",
  "bridge": "cni-bridge0",
  "isGateway": true,
  "ipMasq": true,
  "hairpinMode": true,
  "ipam": {
    "type": "host-local",
    "subnet": "172.19.0.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
EOF

# Execute the CNI plugin
echo "Setting up bridge networking with CNI..."
CNI_COMMAND=ADD CNI_CONTAINERID=$CONTAINER_ID CNI_NETNS=$NETNS_PATH CNI_IFNAME=eth0 \
  CNI_PATH=$CNI_PATH $CNI_PATH/bridge </tmp/bridge-config.json

# Explicitly set up NAT for the container subnet to your public interface
PUBLIC_INTERFACE="ens2"
iptables -t nat -A POSTROUTING -s 172.19.0.0/24 -o $PUBLIC_INTERFACE -j MASQUERADE
iptables -A FORWARD -i cni-bridge0 -o $PUBLIC_INTERFACE -j ACCEPT
iptables -A FORWARD -i $PUBLIC_INTERFACE -o cni-bridge0 -m state --state RELATED,ESTABLISHED -j ACCEPT

ip link add link $VETH_HOST $IPVLAN_NAME netns $CONTAINER_NS type ipvlan mode l2
ip netns exec $CONTAINER_NS ip addr add $IPVLAN_IP/24 dev $IPVLAN_NAME
ip netns exec $CONTAINER_NS ip link set $IPVLAN_NAME up

# If the container already exists, remove it
if nerdctl ps -a --filter "name=$CONTAINER_NAME" --format '{{.Names}}' | grep -q "$CONTAINER_NAME"; then
  echo "Removing existing container $CONTAINER_NAME..."
  nerdctl rm -f $CONTAINER_NAME || true
  sleep 2 # Wait for the container to be removed
  nerdctl rm $CONTAINER_NAME || true
fi

# See: https://github.com/containerd/nerdctl/pull/3538
nerdctl run -d --name $CONTAINER_NAME \
  --runtime $RUNTIME \
  --net=ns:/run/netns/$CONTAINER_NS \
  alpine:3.22 sleep infinity

# Install telnet in the container to test connectivity
nerdctl exec -it $CONTAINER_NAME apk add --no-cache busybox-extras

echo "Container $CONTAINER_NAME is running with network namespace $CONTAINER_NS."
echoblue "You can test connectivity to the host namespace using:"
echoblue "nerdctl exec -it $CONTAINER_NAME telnet $SERVER_IP $SERVER_PORT"

if [ "$RUNTIME" == "runsc" ]; then
  echoblue "On gVisor, this won't work because gVisor steals the IP from the IPVlan interface."
  nerdctl exec -it $CONTAINER_NAME ip addr show $IPVLAN_NAME
  # Ask the user whether the script should add the IP back to the IPVlan interface
  echoblue "Do you want to add the IP back to the IPVlan interface in $CONTAINER_NS? (yes/no)"
  read -r answer
  if [[ "$answer" == "yes" ]]; then
    echo "Adding IP $IPVLAN_IP back to $IPVLAN_NAME in $CONTAINER_NS..."
    ip netns exec $CONTAINER_NS ip addr add $IPVLAN_IP dev $IPVLAN_NAME
    echo "IP added. You can now test connectivity again."
  fi
fi
Script to install the prerequisites:
# setup.sh
#!/bin/bash
set -eu -o pipefail

RUNC_IS_INSTALLED=$(command -v runc || true)
SOCAT_IS_INSTALLED=$(command -v socat || true)
GVISOR_IS_INSTALLED=$(command -v runsc || true)
CONTAINERD_IS_INSTALLED=$(command -v containerd || true)
CONTAINERD_VERSION="1.7.25"
NERDCTL_IS_INSTALLED=$(command -v nerdctl || true)
NERDCTL_VERSION="2.1.2" # Need at least 2.0.0 for --net=ns:/run/netns/<namespace> support
CNI_VERSION="1.6.2"

# Ensure necessary packages are installed
if [ -z "$SOCAT_IS_INSTALLED" ] || [ -z "$RUNC_IS_INSTALLED" ]; then
  echo "Installing required packages: socat, runc, wget, iproute2, iptables"
  sudo apt-get update
  sudo apt-get install -y socat runc
else
  echo "Required packages are already installed."
fi

# Install gVisor
(
  if [ -z "$GVISOR_IS_INSTALLED" ]; then
    (
      set -e
      ARCH=$(uname -m)
      URL=https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}
      wget ${URL}/runsc ${URL}/runsc.sha512 \
        ${URL}/containerd-shim-runsc-v1 ${URL}/containerd-shim-runsc-v1.sha512
      sha512sum -c runsc.sha512 \
        -c containerd-shim-runsc-v1.sha512
      rm -f *.sha512
      chmod a+rx runsc containerd-shim-runsc-v1
      sudo mv runsc containerd-shim-runsc-v1 /usr/local/bin
      cat >/usr/local/bin/runsc-network-host <<EOF
#!/bin/sh
exec /usr/local/bin/runsc --network=host "\$@"
EOF
      chmod a+rx /usr/local/bin/runsc-network-host
    )
  else
    echo "gVisor is already installed."
  fi
)

# Install containerd
(
  if [ -z "$CONTAINERD_IS_INSTALLED" ]; then
    (
      # containerd
      wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD_VERSION/containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
      tar Cxzvf /usr/local containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
      wget -P /usr/local/lib/systemd/system/ https://raw.githubusercontent.com/containerd/containerd/v$CONTAINERD_VERSION/containerd.service
      rm containerd-$CONTAINERD_VERSION-linux-amd64.tar.gz
    )
  else
    echo "containerd is already installed."
  fi
)

# Install nerdctl
(
  if [ -z "$NERDCTL_IS_INSTALLED" ]; then
    (
      wget https://github.com/containerd/nerdctl/releases/download/v$NERDCTL_VERSION/nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
      tar Cxzvvf /usr/local/bin nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
      rm nerdctl-$NERDCTL_VERSION-linux-amd64.tar.gz
    )
  else
    echo "nerdctl is already installed."
  fi
)

# Install the CNI plugins
(
  if [ ! -d /opt/cni/bin ]; then
    (
      wget https://github.com/containernetworking/plugins/releases/download/v$CNI_VERSION/cni-plugins-linux-amd64-v$CNI_VERSION.tgz
      mkdir -p /opt/cni/bin
      tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v$CNI_VERSION.tgz
      chown -R root:root /opt/cni/bin
      rm cni-plugins-linux-amd64-v$CNI_VERSION.tgz
    )
  else
    echo "CNI plugins are already installed."
  fi
)

# Start the containerd service
if ! systemctl is-active --quiet containerd; then
  echo "Starting containerd service..."
  systemctl enable containerd
  systemctl start containerd
fi
:warning: Word of caution: these scripts are intended to run on an ephemeral Ubuntu VM. Specifically, this was tested on Ubuntu 24.04.2. :warning:
Usage:
- Install the prerequisites:
  ./setup.sh
- Run the script:
  ./run.sh
- Check connectivity:
Expected working example ✅
$ nerdctl exec -it ipvlan-test-dummy telnet 192.168.100.1 12345
2025/06/26 13:53:00.000574202 length=27 from=0 to=26
Hello from host namespace!
Connected to 192.168.100.1
Hello from host namespace!
Connection closed by foreign host
Oh no ❌
$ nerdctl exec -it ipvlan-test-dummy telnet 192.168.100.1 12345
telnet: can't connect to remote host (192.168.100.1): Host is unreachable
FATA[0003] exec failed with exit code 1
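To confirm the cause, you can inspect the NIC from inside the failing container (run.sh does this automatically when the runtime is runsc); under runsc the IPVlan interface shows up without its address:

nerdctl exec -it ipvlan-test-dummy ip addr show ipvlan0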
Results:
| runtime | can connect? |
|---|---|
| runc | ✅ |
| runsc | ❌ |
| runsc-network-host | ✅ |
The run.sh script also demonstrates that adding the IP back on the IPVlan NIC after gVisor has created the sandbox fixes the connectivity issue. This likely isn't a proper fix, but, to be blunt, we don't know enough about the internals of gVisor networking to understand why IPVlan is not supported without this small hack.
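For reference, the hack from run.sh boils down to a single command once the sandbox is up (ip addr add without an explicit prefix length defaults to /32 for IPv4):

# Re-add the address that gVisor removed from the IPVlan NIC inside the container namespace.
ip netns exec container-ns ip addr add 192.168.100.2 dev ipvlan0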
Is this feature related to a specific bug?
I'm not sure if this is a bug: the removal of the IP is intended behavior (documented in https://github.com/google/gvisor/issues/6549#issuecomment-914880555), but the lack of support for IPVlan might be an unintended side effect.
Do you have a specific solution in mind?
Here are some solutions we had in mind:
- Support IPVlan when using sandboxed networking.
- Skip IP removal from IPVlan NICs. Here's a patch that implements this behavior:
Patch to apply against release https://github.com/google/gvisor/blob/release-20250616.0:
diff --git a/runsc/sandbox/network.go b/runsc/sandbox/network.go
index e8c804893..963b8b9d4 100644
--- a/runsc/sandbox/network.go
+++ b/runsc/sandbox/network.go
@@ -252,11 +252,14 @@ func createInterfacesAndRoutesFromNS(conn *urpc.Client, nsPath string, conf *con
 			return fmt.Errorf("getting link for interface %q: %w", iface.Name, err)
 		}
 		linkAddress := ifaceLink.Attrs().HardwareAddr
+		isIPVlan := ifaceLink.Type() == "ipvlan" // maybe just check if it has parent index == hostDev.Attrs().Index
 		// Collect the addresses for the interface, enable forwarding,
 		// and remove them from the host.
 		var addresses []boot.IPWithPrefix
 		for _, addr := range ipAddrs {
+			log.Debugf("interface %s has address %s", iface.Name, addr.String())
+
 			prefix, _ := addr.Mask.Size()
 			addresses = append(addresses, boot.IPWithPrefix{Address: addr.IP, PrefixLen: prefix})
@@ -271,6 +274,20 @@ func createInterfacesAndRoutesFromNS(conn *urpc.Client, nsPath string, conf *con
 				}
 				return fmt.Errorf("removing address %v from device %q: %w", addr, iface.Name, err)
 			}
+
+			if isIPVlan && addr.IP.To4() != nil {
+				log.Debugf("interface %s is an ipv4 with ipvlan: %s, adding address /32", iface.Name, addr.IP.String())
+
+				newAddr32 := fmt.Sprintf("%s/32", addr.IP.String())
+				newAddr, err := netlink.ParseAddr(newAddr32)
+				if err != nil {
+					return fmt.Errorf("cannot parse addr: %w", err)
+				}
+				err = netlink.AddrAdd(ifaceLink, newAddr)
+				if err != nil {
+					return fmt.Errorf("cannot add ipvlan addr %s: %w", newAddr32, err)
+				}
+			}
 		}
 		if conf.XDP.Mode == config.XDPModeNS {
- Add a new flag that, when used in combination with --network=sandbox, skips the removal of the IP from specific NICs in the specified network namespace, e.g. --do-not-remove-ip-from="ipvlan0". This is a similar proposal to https://github.com/google/gvisor/issues/6549 but scoped to specific NICs; a sketch of what this could look like follows below.
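To illustrate the last option, here is a purely hypothetical invocation (the --do-not-remove-ip-from flag does not exist today; the name and syntax are only a suggestion):

# Hypothetical flag: keep the address on ipvlan0 while sandboxing everything else.
runsc --network=sandbox --do-not-remove-ip-from="ipvlan0" ...

In practice this could be wired up the same way setup.sh's runsc-network-host wrapper passes --network=host.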
Thanks a lot and have a lovely day!