distributed-ranges
mhp exclusive scan much slower than inclusive (exclusive scan perf analysis)
Looking at the benchmarks run on borealis: https://github.com/intel-sandbox/libraries.runtimes.hpc.dds.dr-ci/actions/runs/6717195674
For inclusive scan, the times are as follows:
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 4083 ms 4081 ms 1 bytes_per_second=182.481G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 2501 ms 2498 ms 1 bytes_per_second=297.951G/s footprint=7.45058G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 1654 ms 1651 ms 1 bytes_per_second=450.547G/s footprint=4.96705G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 1272 ms 1270 ms 1 bytes_per_second=585.591G/s footprint=3.72529G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 1045 ms 1043 ms 1 bytes_per_second=712.966G/s footprint=2.98023G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 863 ms 862 ms 1 bytes_per_second=862.869G/s footprint=2.48353G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 737 ms 735 ms 1 bytes_per_second=1011.18G/s footprint=2.12874G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 645 ms 643 ms 1 bytes_per_second=1.1285T/s footprint=1.86265G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 573 ms 571 ms 1 bytes_per_second=1.26983T/s footprint=1.65568G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 518 ms 516 ms 1 bytes_per_second=1.40465T/s footprint=1.49012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 472 ms 470 ms 1 bytes_per_second=1.54308T/s footprint=1.35465G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 426 ms 425 ms 1 bytes_per_second=1.70636T/s footprint=1.24176G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 4066 ms 4064 ms 1 bytes_per_second=183.241G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5099 ms 5093 ms 1 bytes_per_second=292.251G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5105 ms 5100 ms 1 bytes_per_second=437.868G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5251 ms 5246 ms 1 bytes_per_second=567.564G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5256 ms 5251 ms 1 bytes_per_second=708.757G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5360 ms 5355 ms 1 bytes_per_second=833.972G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5362 ms 5357 ms 1 bytes_per_second=972.734G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5358 ms 5354 ms 1 bytes_per_second=1112.36G/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5412 ms 4908 ms 1 bytes_per_second=1.20993T/s footprint=14.9012G
Inclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 5340 ms 5335 ms 1 bytes_per_second=1.36253T/s footprint=14.9012G
For exclusive scan, on the other hand, the times are:
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 7160 ms 7156 ms 1 bytes_per_second=104.056G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 25702 ms 25685 ms 1 bytes_per_second=28.9883G/s footprint=7.45058G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 23682 ms 23662 ms 1 bytes_per_second=31.4607G/s footprint=4.96705G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 31002 ms 30978 ms 1 bytes_per_second=24.0323G/s footprint=3.72529G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 27332 ms 27309 ms 1 bytes_per_second=27.2598G/s footprint=2.98023G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 24186 ms 24164 ms 1 bytes_per_second=30.8055G/s footprint=2.48353G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 25131 ms 25108 ms 1 bytes_per_second=29.6468G/s footprint=2.12874G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 25281 ms 25250 ms 1 bytes_per_second=29.4713G/s footprint=1.86265G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 27502 ms 27455 ms 1 bytes_per_second=27.0913G/s footprint=1.65568G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 27991 ms 27933 ms 1 bytes_per_second=26.6181G/s footprint=1.49012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 29720 ms 29667 ms 1 bytes_per_second=25.0693G/s footprint=1.35465G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 30183 ms 30103 ms 1 bytes_per_second=24.6849G/s footprint=1.24176G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 7112 ms 7108 ms 1 bytes_per_second=104.766G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 54683 ms 54654 ms 1 bytes_per_second=27.2502G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 87625 ms 87567 ms 1 bytes_per_second=25.5085G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 148550 ms 148468 ms 1 bytes_per_second=20.0622G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 191582 ms 191448 ms 1 bytes_per_second=19.4449G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 220317 ms 220133 ms 1 bytes_per_second=20.2905G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 265100 ms 264817 ms 1 bytes_per_second=19.6733G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 341283 ms 324785 ms 1 bytes_per_second=17.4649G/s footprint=14.9012G
Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time 346785 ms 346147 ms 1 bytes_per_second=19.3362G/s footprint=14.9012G
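E.g. for the last benchmark it is about 5 s for inclusive scan vs. about 346 s for exclusive scan, i.e. more than 60 times slower.
We need to find out why this happens. For now I am also disabling the mhp benchmarks.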
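For context on why the gap is surprising: semantically, an exclusive scan is just an inclusive scan shifted right by one element, with the initial value placed in front. The sketch below illustrates this with the C++17 standard library only. It is not the DR/mhp implementation, and the helper name `exclusive_from_inclusive` is made up for this illustration. It does not explain the slowdown; it only shows that the two operations do essentially the same amount of work on a single node.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of distributed-ranges): builds an exclusive
// scan out of std::inclusive_scan by shifting the output by one element.
std::vector<long> exclusive_from_inclusive(const std::vector<long>& in,
                                           long init) {
  std::vector<long> out(in.size());
  if (in.empty()) {
    return out;
  }
  // Inclusive scan over all but the last input element, seeded with init
  // and written starting at out[1]: out[i + 1] = init + in[0] + ... + in[i].
  std::inclusive_scan(in.begin(), in.end() - 1, out.begin() + 1,
                      std::plus<>{}, init);
  // The first element of an exclusive scan is the initial value itself.
  out[0] = init;
  return out;
}
```

Comparing the result against `std::exclusive_scan` on a small input is a quick sanity check that both formulations agree.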