Add SO_ZEROCOPY and MSG_ZEROCOPY support for UDP and TCP
-
Version of iperf3 (or development branch, such as
masteror3.1-STABLE) to which this pull request applies: master -
Issues fixed (if any): #1678
-
Brief description of code changes (suitable for use as a commit message):
Add support for zero-copy for both UDP and TCP, using SO_ZEROCOPY/MSG_ZEROCOPY. Although it is not clear when and in what environments the added functionality improve performance, support was added to allow testing this functionality. The assumption is that different environments may have different levels of support for this functionality.
(Note that running bootstrap.sh; configure is required before make to support the new features.)
The added support is for MSG_ZEROCOPY. When used, socket option SO_ZEROCOPY is set and send(...., MSG_ZEROCOPY) is used instead of write(). MSG_ZEROCOPY is used in the following cases:
- UDP: when
-Z/--zerocopyoption is set. - TCP: when
--zerocopy=zis set. Otherwise,sendfile()continue to be used for TCP zero copy.
This PR is a rework of the original suggested support in #1690, as it was too naive and did not include support for notifications.
The main implementation is based on the msg_zerocopy.c example for supporting MSG_ZEROCOPY.
Note that per the Zero-copy networking discussion about the patch that added MSG_ZEROCOPY support, the expected performance improvement is expected to be less than 10% and only for large packets. Therefor, it would be helpful if this PR will be tested on different environments to see whether and when it improves performance.
Closing, as it is not clear that this functionality is valuable, as I wrote in the description. (Code is still available for testing or coming up with enhanced solution.)
MSG_ZEROCOPY with MTU 9K and optmem_max 1MB does improve performance a lot for large TCP flows, specially when used with pacing (up to 35%, from 37G to 50G). Brian Tiernery and I, made the tests and the results can be checker in [1, 2]. This results were run using an older patch [3], this current patch seems to have some big performance regressions. I'll run new UDP tests to provide evidence if there are improvements and I hope that this patch can be upstreamed.
[1] https://github.com/marcosfsch/INDIS-2024/blob/main/AmLight/ubuntu22-6.8/summary.txt [2] https://github.com/marcosfsch/INDIS-2024/blob/main/ESnet/ubuntu22-6.8/summary.txt [3] https://github.com/esnet/iperf/pull/1690
For UDP, this patch is making performance worse, from 36G to 5.5G. But using a previous MSG_ZEROCOPY patch [1] I was able to get 18% better performance for UDP. From 36G to 42G for single UDP flow.
[1] https://github.com/esnet/iperf/pull/1690
Also, to get improvements with MSG_ZEROCOPY, you have to set this in /etc/sysctl.conf:
net.core.optmem_max = 1048576
But we are finding MSG_ZEROCOPY helps a lot on 100G hosts.
OK. Thanks for the input. @bmah888, can you please re-open this PR?
I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.
I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.
That is interesting to know. As far as I remember that should not matter. The reason I split #1960 to this PR and #1717 (SKIP-RX-COPY) is that per my internal tests I assumed there is more chance that SKIP-RX-COPY will be merged into the mainline. Per this discussion I now understand this may not be the case, but still I assume it is easier to merge less changes per each PT than merge all.