Virtio-net: implement support for VIRTIO_NET_F_MRG_RXBUF
Per the comment at https://github.com/firecracker-microvm/firecracker/blob/99a026756c406d1c7f60c5c9f6685adcab1229e7/devices/src/virtio/net.rs#L153-L154, the virtio-net implementation in Firecracker does not support VIRTIO_NET_F_MRG_RXBUF. VIRTIO_NET_F_MRG_RXBUF allows for much finer-grained memory management than the existing scheme, which relies on humongous 68K (17-page) buffers on the guest side. As I understand it, the 68K buffers also have to be contiguous in physical memory, which may be hard to achieve under memory pressure.
Please see the relevant snippet from the virtio spec: "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers. If VIRTIO_NET_F_MRG_RXBUF is not negotiated: if VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate the receive queue(s) with buffers of at least 65562 bytes. Otherwise, the driver SHOULD populate the receive queue(s) with buffers of at least 1526 bytes. If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at least the size of the struct virtio_net_hdr."
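For reference, here is a minimal sketch of the receive header the spec quote refers to. The field names and widths follow the spec's struct virtio_net_hdr; the Rust rendering is only illustrative and is not Firecracker's actual code:

```rust
// Illustrative layout of the virtio-net receive header (per the virtio spec,
// not Firecracker source). With VIRTIO_NET_F_MRG_RXBUF negotiated, the device
// sets `num_buffers` to tell the driver how many descriptor chains one
// received packet was spread over, so RX buffers can be small (e.g. page-sized)
// and merged on demand instead of each having to be >= 65562 bytes.
#[repr(C)]
#[allow(dead_code)]
struct VirtioNetHdrMrgRxbuf {
    flags: u8,
    gso_type: u8,
    hdr_len: u16,     // little-endian on the wire
    gso_size: u16,
    csum_start: u16,
    csum_offset: u16,
    num_buffers: u16, // only meaningful when VIRTIO_NET_F_MRG_RXBUF is negotiated
}
```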
Hi @wkozaczuk, thank you for reporting this and for the thorough investigation! It looks like this is a pre-existing functionality gap that we haven't stumbled across until now. We will look into it as soon as we have bandwidth - in the meantime, if you have a fix ready, PRs are welcome!
@wkozaczuk this is not implemented in Firecracker because we are not following the virtio 1.0 specification, but the 0.95 one.
As I see it now, the likelihood of this particular issue being fixed before we completely switch to a virtio specification >= 1.0 is low. We are working on some improvements on the virtio devices in Firecracker, and we should have some updates in a couple of months.
Is this issue blocking you in any way? If yes, we can see what steps we can take to address it faster.
@andreeaflorescu what do you mean when you say we are not following the virtio 1.0 specification? We do negotiate the VIRTIO_F_VERSION_1. We even have a link in our net implementation that points to the v1.0 spec.
I cannot speak to whether the virtio spec mandates this feature or not. I am also not sure how it affects memory utilization in a Linux guest, which clearly handles this case, but I wonder at what cost.
On the OSv side, for now, I came up with a patch that makes it use large (68K) sg buffers for the RX ring if VIRTIO_NET_F_MRG_RXBUF is not negotiated. Unfortunately, this leads to much higher memory utilization - 17 MiB vs 1 MiB (256 68K buffers vs 256 4K buffers) - but at least it works. As I understand it, each packet, whether small (<4K) or large, needs a single 68K buffer, which is obviously not very efficient.
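To make the comparison concrete, a rough sketch of the arithmetic behind those figures (the 256-entry ring and the 68K/4K buffer sizes come from the paragraph above; the constant names are just for illustration):

```rust
// Back-of-the-envelope memory cost of the RX ring, per the numbers above.
const RING_ENTRIES: usize = 256;
const BIG_BUF: usize = 68 * 1024;  // ~65562 bytes rounded up to 17 pages
const SMALL_BUF: usize = 4 * 1024; // one page, possible once MRG_RXBUF is negotiated

fn main() {
    // Without MRG_RXBUF: 256 * 68 KiB = 17 MiB pinned for receive buffers.
    println!("{} MiB", RING_ENTRIES * BIG_BUF / (1024 * 1024));
    // With MRG_RXBUF and page-sized buffers: 256 * 4 KiB = 1 MiB.
    println!("{} MiB", RING_ENTRIES * SMALL_BUF / (1024 * 1024));
}
```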
So I must say this shortcoming is not that urgent for me to fix, but I would love to have it addressed eventually (QEMU supports it by default).
@wkozaczuk your memory math is wrong. But I also think VIRTIO_NET_F_MRG_RXBUF is important.
Without MRG_RXBUF we have the following cases:
- offloads enabled
- offloads disabled
Offloads disabled is "easy": the driver allocates ~1500-byte buffers and puts up to the default 256 of them into the RX ring (technically the driver allocates vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom bytes).
Offloads enabled means the driver needs to put 64KiB buffers into the RX ring. However, notice this also consumes descriptor-ring slots. For completeness: each large packet on the RX path occupies... 19 descriptor slots.
So with offloads enabled we're limited by descriptor slots, which are also capped at 256. Without VIRTIO_RING_F_INDIRECT_DESC this is a problem: you can have at most 256/19 = 13 large packets in the RX ring.
In other words, with offloads, without MRG_RXBUF and without INDIRECT_DESC, our RX ring is effectively... 13 packets deep. That sounds rather bad.
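A quick sketch of that slot math (the 256-slot ring and the 19-descriptors-per-large-packet figure are taken from the comment above; constant names are illustrative only):

```rust
// Without VIRTIO_RING_F_INDIRECT_DESC, every fragment of a large receive buffer
// consumes one slot in the descriptor ring, so in-flight large packets are
// limited by the ring size rather than by buffer memory.
const RING_SLOTS: usize = 256;
const DESCS_PER_LARGE_PACKET: usize = 19; // ~64 KiB split into page-sized fragments

fn main() {
    // 256 / 19 = 13 large packets fit in the RX ring at once.
    println!("{}", RING_SLOTS / DESCS_PER_LARGE_PACKET);
}
```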
@majek I trust your math is better than mine :-) I wrote it over 3 years ago so I am not 100% confident I follow all the details now. In any case, it would be nice if https://github.com/firecracker-microvm/firecracker/pull/2672 got resurrected.
Relevant to #4364.