
mhp exclusive scan much slower than inclusive (exclusive scan perf analysis)

Open haichangsi opened this issue 2 years ago • 1 comments

haichangsi Oct 19 '23 21:10

Looking at the benchmarks run on borealis: https://github.com/intel-sandbox/libraries.runtimes.hpc.dds.dr-ci/actions/runs/6717195674

For inclusive scan, the times are as follows:

Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       4083 ms         4081 ms            1 bytes_per_second=182.481G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       2501 ms         2498 ms            1 bytes_per_second=297.951G/s footprint=7.45058G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       1654 ms         1651 ms            1 bytes_per_second=450.547G/s footprint=4.96705G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       1272 ms         1270 ms            1 bytes_per_second=585.591G/s footprint=3.72529G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       1045 ms         1043 ms            1 bytes_per_second=712.966G/s footprint=2.98023G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        863 ms          862 ms            1 bytes_per_second=862.869G/s footprint=2.48353G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        737 ms          735 ms            1 bytes_per_second=1011.18G/s footprint=2.12874G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        645 ms          643 ms            1 bytes_per_second=1.1285T/s footprint=1.86265G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        573 ms          571 ms            1 bytes_per_second=1.26983T/s footprint=1.65568G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        518 ms          516 ms            1 bytes_per_second=1.40465T/s footprint=1.49012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        472 ms          470 ms            1 bytes_per_second=1.54308T/s footprint=1.35465G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time        426 ms          425 ms            1 bytes_per_second=1.70636T/s footprint=1.24176G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       4066 ms         4064 ms            1 bytes_per_second=183.241G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5099 ms         5093 ms            1 bytes_per_second=292.251G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5105 ms         5100 ms            1 bytes_per_second=437.868G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5251 ms         5246 ms            1 bytes_per_second=567.564G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5256 ms         5251 ms            1 bytes_per_second=708.757G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5360 ms         5355 ms            1 bytes_per_second=833.972G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5362 ms         5357 ms            1 bytes_per_second=972.734G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5358 ms         5354 ms            1 bytes_per_second=1112.36G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5412 ms         4908 ms            1 bytes_per_second=1.20993T/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       5340 ms         5335 ms            1 bytes_per_second=1.36253T/s footprint=14.9012G

For exclusive scan, on the other hand, they are:

Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       7160 ms         7156 ms            1 bytes_per_second=104.056G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      25702 ms        25685 ms            1 bytes_per_second=28.9883G/s footprint=7.45058G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      23682 ms        23662 ms            1 bytes_per_second=31.4607G/s footprint=4.96705G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      31002 ms        30978 ms            1 bytes_per_second=24.0323G/s footprint=3.72529G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      27332 ms        27309 ms            1 bytes_per_second=27.2598G/s footprint=2.98023G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      24186 ms        24164 ms            1 bytes_per_second=30.8055G/s footprint=2.48353G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      25131 ms        25108 ms            1 bytes_per_second=29.6468G/s footprint=2.12874G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      25281 ms        25250 ms            1 bytes_per_second=29.4713G/s footprint=1.86265G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      27502 ms        27455 ms            1 bytes_per_second=27.0913G/s footprint=1.65568G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      27991 ms        27933 ms            1 bytes_per_second=26.6181G/s footprint=1.49012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      29720 ms        29667 ms            1 bytes_per_second=25.0693G/s footprint=1.35465G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      30183 ms        30103 ms            1 bytes_per_second=24.6849G/s footprint=1.24176G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time       7112 ms         7108 ms            1 bytes_per_second=104.766G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      54683 ms        54654 ms            1 bytes_per_second=27.2502G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time      87625 ms        87567 ms            1 bytes_per_second=25.5085G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     148550 ms       148468 ms            1 bytes_per_second=20.0622G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     191582 ms       191448 ms            1 bytes_per_second=19.4449G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     220317 ms       220133 ms            1 bytes_per_second=20.2905G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     265100 ms       264817 ms            1 bytes_per_second=19.6733G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     341283 ms       324785 ms            1 bytes_per_second=17.4649G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time     346785 ms       346147 ms            1 bytes_per_second=19.3362G/s footprint=14.9012G

E.g. for the last benchmark it is about 5 s (inclusive scan) vs. about 347 s (exclusive scan).
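The gap can be checked row by row from the real_time columns. This is a minimal sketch, assuming the fixed-footprint (14.9012G) rows of the two listings line up by position, which the logs do not state explicitly; `slowdowns` is a hypothetical helper name, and the numbers are copied from the listings above.

```python
# Real-time columns (ms) copied from the fixed-footprint (14.9012G) rows above.
inclusive_ms = [4066, 5099, 5105, 5251, 5256, 5360, 5362, 5358, 5412, 5340]
exclusive_ms = [7112, 54683, 87625, 148550, 191582, 220317, 265100, 341283, 346785]


def slowdowns(inclusive, exclusive):
    """Return exclusive/inclusive time ratios for rows paired by position.

    Assumption: row i of both listings is the same configuration.
    """
    # zip() stops at the shorter list; the exclusive listing has one row fewer.
    return [e / i for i, e in zip(inclusive, exclusive)]


ratios = slowdowns(inclusive_ms, exclusive_ms)
for row, ratio in enumerate(ratios, start=1):
    print(f"row {row}: exclusive scan takes {ratio:.1f}x the inclusive time")
```

Under that pairing the ratio is below 2x in the first row and grows to roughly 64x in the last one, so the slowdown worsens with each row instead of staying constant.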

We need to check why this is so. For now I am also disabling the mhp benchmarks.
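As a reference point for the investigation, an exclusive scan is by definition an inclusive scan shifted right by one position, with the initial value in front, so the two would normally be expected to cost about the same. The sketch below only illustrates that relationship using the Python standard library; it is not the mhp implementation, and `inclusive_scan` and `exclusive_scan` here are hypothetical helpers, not distributed-ranges APIs.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, op=operator.add):
    """Output i combines inputs 0..i (input i included)."""
    return list(accumulate(values, op))


def exclusive_scan(values, init, op=operator.add):
    """Output i combines init with inputs 0..i-1 (input i excluded)."""
    # accumulate(..., initial=init) yields len(values) + 1 items; drop the last.
    return list(accumulate(values, op, initial=init))[:-1]


data = [3, 1, 4, 1, 5]
inc = inclusive_scan(data)
exc = exclusive_scan(data, 0)

# The exclusive result is the inclusive result shifted right by one element.
shifted = [0] + inc[:-1]
print(inc, exc, shifted == exc)
```

If the two operations differ only by this shift, a slowdown that grows from row to row more plausibly comes from how the result is shifted or communicated than from the scan arithmetic itself; that is a hypothesis to test, not a finding.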

lslusarczyk Nov 02 '23 10:11