Add per-stream/thread multi-CPU affinity option for Linux
PLEASE NOTE the following text from the iperf3 license. Submitting a pull request to the iperf3 repository constitutes "[making] Enhancements available...publicly":
You are under no obligation whatsoever to provide any bug fixes, patches, or
upgrades to the features, functionality or performance of the source code
("Enhancements") to anyone; however, if you choose to make your Enhancements
available either publicly, or directly to Lawrence Berkeley National
Laboratory, without imposing a separate written license agreement for such
Enhancements, then you hereby grant the following license: a non-exclusive,
royalty-free perpetual license to install, use, modify, prepare derivative
works, incorporate into other computer software, distribute, and sublicense
such enhancements or derivative works thereof, in binary and source code form.
The complete iperf3 license is available in the LICENSE file in the
top directory of the iperf3 source tree.
- Version of iperf3 (or development branch, such as master or 3.1-STABLE) to which this pull request applies: master
- Issues fixed (if any): #1738
- Brief description of code changes (suitable for use as a commit message):
This is an idea that expands on the enhancement request for supporting a list of CPUs to pin the iperf3 process to, now that multithreading is supported. At the time of raising these changes, this is not a complete feature, and the PR has been marked as a draft. There is already an open PR, iperf/#1778, which seeks to implement the enhancement as requested. These code changes take a different approach: when the user uses the --parallel flag to create multiple streams for the test, they can additionally use the --stream-affinity flag to pin each stream (thread) to a target CPU. This enables me, the user, to create 8 streams/threads on my 8-core system and pin each stream/thread to one core. I'll leave the code here for reference, but continue the conversation in the feature request.
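For reference, the mechanism behind this kind of per-stream pinning on Linux is pthread_setaffinity_np(3). A rough sketch of the idea (illustrative only, not the exact code in this PR; the helper name is made up):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin the calling stream thread to a single CPU.
 * Returns 0 on success, or an errno value on failure. */
static int pin_stream_thread_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);          /* start with an empty CPU mask */
    CPU_SET(cpu, &set);      /* allow only the target CPU */

    /* pthread_setaffinity_np(3) is Linux-specific (the _np suffix
     * means non-portable), hence the "for Linux" in the PR title. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

Each stream's worker thread would call something like this with the CPU it was assigned from the --stream-affinity list.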
Thanks for the PR! So, when we were testing the multi-threading functionality, we used numactl(8) quite extensively. We were thinking that a better way to do CPU pinning, etc. would be to deprecate the -A option and just recommend using numactl, given that it's more flexible and featureful than anything iperf3 could (should?) be trying to implement. I think there's some published research that also uses numactl instead of -A. Any thoughts on this? Could you tell us more about your use case and why this pull request might be better suited?
Admittedly, these changes were part of a learning exercise to better understand XPS (Transmit Packet Steering) in Linux. The idea was to explore what happens when each CPU is assigned to its own tx-n queue: if I had 4 CPUs and 4 transmit queues, then I'd try iperf3 --client my.iperf3-server.com --parallel 4 --stream-affinity 0,1,2,3 so that each thread runs on a target CPU that is assigned its own transmit queue. I've never used numactl, so I don't know whether it can help achieve this. I also don't know if there's any meaningful benefit to doing it the way I'm thinking; I'd have to think of ways to test it.
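For context on the XPS side: XPS is configured per transmit queue through sysfs, by writing (as root) a CPU bitmask to each queue's xps_cpus file. For the 4-CPU / 4-queue layout above, it would look something like this (the device name eth0 is just for illustration):

% echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
% echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
% echo 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
% echo 8 > /sys/class/net/eth0/queues/tx-3/xps_cpus

The masks 1, 2, 4, 8 map CPU 0 to tx-0, CPU 1 to tx-1, and so on, which is the one-CPU-per-queue layout I'm trying to line the iperf3 threads up with.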
Thanks! I admittedly had never heard of XPS (Transmit Packet Steering); reading the reference now.
We generally will use numactl on multi-CPU NUMA-architecture hosts to ensure that the iperf3 processes/threads handling network I/O are pinned to the cores on the CPU that's bound to the NIC being used. It's actually pretty clever about figuring this stuff out. So I'd do something like this:
% numactl -N netdev:eth0 iperf3 --client foo.example.com
% numactl -N ip:192.168.1.1 iperf3 --client foo.example.com
That said, numactl is specific to Linux (as far as I know). FreeBSD has cpuset but I confess I haven't had the chance to play with it. iperf3 -A seems less powerful but is more universal.
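For a rough comparison (untested sketches off the top of my head; adjust host names and CPU numbers as appropriate), pinning an iperf3 client to CPUs 0-3, or to a single core, might look like:

% cpuset -l 0-3 iperf3 --client foo.example.com     (FreeBSD)
% numactl -C 0-3 iperf3 --client foo.example.com    (Linux)
% iperf3 --client foo.example.com -A 0              (iperf3's own -A, single core)

-A takes a single core number (or the n,m form to also set the server's core), which is part of why it's less expressive than the external tools.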
I'm going to have to spend some time learning more about how numactl works. How would this work for applications that use libiperf, though?
I think that whoever's running the other application would use numactl to invoke that application, and then the CPU affinity / binding settings apply to whatever threads invoke (or are run by) functions in libiperf (or indeed other parts of the original application).
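For example (the application name here is hypothetical), since the CPU/memory policy set by numactl is inherited by the process and every thread it creates, including those calling into libiperf:

% numactl -N netdev:eth0 ./my-libiperf-app

No changes to the application itself would be needed for the pinning to take effect.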