[--rate_limit] A stable relative error (~7.5%) observed

Open doctormin opened this issue 4 years ago • 1 comments

Hi all:

I'm very interested in your work. I tested --rate_limit with a Python script between two machines on an IB network, with rate limits ranging from 10 MBps to over 5000 MBps. Whether with ib_send_bw, ib_read_bw, or ib_write_bw, a stable relative error was observed. The rate_limit_type was SW. [image] I wonder what mechanism causes this stable relative error. Thanks!

doctormin avatar Sep 06 '21 06:09 doctormin

doctormin avatar Sep 06 '21 10:09 doctormin

Hi doctormin, thanks for the info! Can you please help us reproduce it?

HassanKhadour avatar Nov 10 '22 11:11 HassanKhadour

Hi Hassan! I'm afraid I cannot share the .py script right away, because I ran this test while working for an enterprise and no longer have access to it. But I can explain more about what my test script did:

  • In a for loop, the argument to --rate_limit was taken from range(5, 5750, 5), and ib_send_bw (or another command) was executed each time
    • then the "actual rate achieved" was read from perftest's terminal output each time it finished.

So the script collects multiple <input: rate_limit>, <output: actual_rate> data pairs. That's how I drew the diagram above.
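For anyone trying to reproduce this, the loop described above could be sketched roughly as follows. This is not the original script (which is unavailable), only a minimal reconstruction under stated assumptions: the helper names (`build_command`, `parse_bw_average`, `relative_error`, `sweep`) are made up for illustration, the result row is assumed to have five numeric columns with "BW average" as the fourth, the server side is assumed to be started separately for every run, and the exact flag spellings should be checked against the tool's `--help` output.

```python
import re
import subprocess

# Assumption: a perftest result row is five numeric columns in this order:
# #bytes, #iterations, BW peak, BW average, MsgRate.
RESULT_ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def build_command(tool, server, rate_limit, size=None):
    """Build the client-side command line for one data point."""
    cmd = [tool, f"--rate_limit={rate_limit}", "--rate_limit_type=SW"]
    if size is not None:
        # Keep the message size fixed across the whole sweep.
        cmd.append(f"--size={size}")
    cmd.append(server)
    return cmd


def parse_bw_average(output):
    """Return the 'BW average' value from the first result row found."""
    for line in output.splitlines():
        match = RESULT_ROW.match(line)
        if match:
            return float(match.group(4))
    raise ValueError("no result row found in perftest output")


def relative_error(requested, achieved):
    """Signed relative error of the achieved rate vs. the requested rate."""
    return (achieved - requested) / requested


def sweep(tool, server, rates, size=None):
    """Run the tool once per requested rate; return (requested, achieved) pairs."""
    pairs = []
    for rate in rates:
        result = subprocess.run(
            build_command(tool, server, rate, size),
            capture_output=True, text=True, check=True,
        )
        pairs.append((rate, parse_bw_average(result.stdout)))
    return pairs
```

A call such as `sweep("ib_send_bw", "<server-host>", range(5, 5750, 5))` would then yield the data pairs, and `relative_error` can be applied to each pair before plotting.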

This relative error was stable and reproducible in my experience when I was working on it. Could it be that the input (args --rate_limit) is throughput and the output of perftest's terminal report is goodput?

doctormin avatar Nov 10 '22 12:11 doctormin

Hi, perftest output is goodput. I'm trying to reproduce this issue, but I can't get those relative errors; I get barely ~1.5% at most.

HassanKhadour avatar Nov 10 '22 12:11 HassanKhadour

I see, thanks for the information. Maybe the packet size setting influences the gap between throughput and goodput (but I am not sure). In my previous test (the ~7.5% result), I selected the packet size that maximized throughput on my testbed (by running a -a test first), but I don't remember its exact value. I am sure that I used the same packet size for all the data points in the diagram above.

doctormin avatar Nov 10 '22 13:11 doctormin

Hi @doctormin, we investigated the issue and a fix will be merged soon. The problem was that in some formulas we were dividing integers, so the fractional parts were discarded. We have now changed the types of those variables to "double". After the fix the results will be much more accurate.
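As a generic illustration of why a truncated integer division can show up as a constant relative error, here is a small sketch. It is not perftest's actual formula: the scale factor `num / den` and the values 9 and 8 are arbitrary, chosen only to make the effect visible, and Python's `//` stands in for C integer division.

```python
import math


def scaled_truncated(value, num, den):
    """Scale by num/den using integer division: the fraction is dropped first."""
    return value * (num // den)


def scaled_exact(value, num, den):
    """Scale by num/den using floating-point division (like switching to double)."""
    return value * (num / den)


def relative_error(measured, expected):
    """Signed relative error of measured vs. expected."""
    return (measured - expected) / expected


# Arbitrary illustrative scale factor 9/8 = 1.125, which truncates to 1.
NUM, DEN = 9, 8

errors = [
    relative_error(scaled_truncated(rate, NUM, DEN), scaled_exact(rate, NUM, DEN))
    for rate in (10, 500, 5000)
]

# The lost fraction is proportional to the value being scaled, so the
# relative error does not depend on the requested rate.
assert all(math.isclose(e, errors[0]) for e in errors)
```

Because the dropped fraction scales with the input, the relative error stays the same across the whole range of rates, which matches the shape of the symptom reported here.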

HassanKhadour avatar Dec 01 '22 13:12 HassanKhadour

Happy to hear about that :) Thank you guys for a great job

doctormin avatar Dec 02 '22 18:12 doctormin

Our pleasure! Please check it if you can https://github.com/linux-rdma/perftest/commit/1ef49c4c3c63b25b61a8eda9c3065f6a5f386aee

HassanKhadour avatar Dec 07 '22 12:12 HassanKhadour

Hi @doctormin, can you please update us if you have checked it?

HassanKhadour avatar Dec 19 '22 13:12 HassanKhadour

Hi @HassanKhadour, I am sorry that I cannot check this in the near future. I discovered this issue when I was an intern at an HPC-related enterprise, and I currently don't have access to an IB cluster any more.

doctormin avatar Dec 19 '22 15:12 doctormin

No worries, thanks for reporting the issue. Closing.

HassanKhadour avatar Dec 21 '22 09:12 HassanKhadour