shp benchmarks in borealis failed

Open lslusarczyk opened this issue 2 years ago • 2 comments

https://github.com/intel-sandbox/libraries.runtimes.hpc.dds.dr-ci/actions/runs/6637463788

Check it. If the newly enabled benchmarks revealed a problem, fix it if that is easy, or at least comment out the affected benchmark in shp with a comment pointing to this issue.

lslusarczyk Oct 25 '23 14:10

Analysing the failure. Currently shp-bench times out on Borealis, my devcloud account has expired, and running locally runs out of memory (it seems shp-bench ignores vector-size in some cases; fixing it...).

In progress...

lslusarczyk Oct 27 '23 11:10

The ExclusiveScan benchmark in shp fails. See: https://github.com/intel-sandbox/libraries.runtimes.hpc.dds.dr-ci/actions/runs/6703645628

Exact command:

ONEAPI_DEVICE_SELECTOR='level_zero:gpu;ext_oneapi_cuda:gpu' \
KMP_AFFINITY=compact shp/shp-bench --vector-size 2000000000 --reps 50\
 --benchmark_out_format=json --context device:GPU --context model:SHP --context runtime:SYCL\
 --context target:SHP_SYCL_GPU --v=3 --benchmark_out=dr-bench-adc021a8e9a64a6c86da243e79fcb338.json\
 --benchmark_filter=.*Sort_DR\|Gemm_DR\|^DotProduct_DR\|^Exclusive_Scan_DR\|^Inclusive_Scan_DR\|^Reduce_DR --num-devices 6

Output with failure:

- LOG(2): Running Exclusive_Scan_DR/min_time:0.100/min_warmup_time:0.100/real_time for 1
free(): invalid next size (fast)

lslusarczyk Oct 31 '23 10:10