unified-runtime

Benchmark updates for faster run and more reliable results

Open mateuszpn opened this issue 1 year ago • 78 comments

mateuszpn avatar Oct 02 '24 13:10 mateuszpn

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890

github-actions[bot] avatar Oct 02 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890 Job status: failure. Test status: skipped.

github-actions[bot] avatar Oct 02 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890

github-actions[bot] avatar Oct 02 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890 Job status: failure. Test status: skipped.

github-actions[bot] avatar Oct 02 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11145360258

github-actions[bot] avatar Oct 02 '24 14:10 github-actions[bot]

Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11145360258 Job status: failure. Test status: skipped.

github-actions[bot] avatar Oct 02 '24 14:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11146412593

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11146412593 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11146871014

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11146871014 Job status: cancelled. Test status: cancelled.

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11147212802

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11147212802 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 02 '24 15:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11158482367

github-actions[bot] avatar Oct 03 '24 09:10 github-actions[bot]

Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11158482367 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 03 '24 09:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11160336350

github-actions[bot] avatar Oct 03 '24 10:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11160336350 Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group Runtime (8): cannot calculate
Benchmark This PR Relative perf Change -
Runtime_BlockedTransform_iter_512_blocksize_2048 0.072000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048 0.071000 ms
Runtime_BlockedTransform_iter_256_blocksize_1024 0.081000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048 0.166000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024 0.174000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024 0.174000 ms
Runtime_BlockedTransform_iter_64_blocksize_1024 0.079000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048 0.168000 ms
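The "cannot calculate" entries above appear because this run had no saved baseline to compare against. As a rough sketch of what a relative-performance column could compute once a baseline exists (the function name, the lower-is-better convention, and the formula are assumptions, not the benchmark scripts' actual code):

```python
from typing import Optional


def relative_perf(this_pr: float, baseline: Optional[float],
                  lower_is_better: bool = True) -> Optional[float]:
    """Return a ratio above 1.0 when this PR is better than the baseline.

    Hypothetical helper: returns None when no usable baseline exists,
    which corresponds to the "cannot calculate" case in the report.
    """
    if baseline is None or baseline <= 0 or this_pr <= 0:
        return None
    return baseline / this_pr if lower_is_better else this_pr / baseline


# No baseline saved yet: nothing to compare against.
no_baseline = relative_perf(0.081, None)

# Purely illustrative: treat 0.076 ms (the value reported for
# Runtime_BlockedTransform_iter_256_blocksize_1024 in the later
# "--save baseline" run) as the baseline for the 0.081 ms seen here.
with_baseline = relative_perf(0.081, 0.076)

print(no_baseline)
print(f"{with_baseline:.3f}")
```

A ratio below 1.0 in this sketch would mean the PR result is slower than the baseline for a time-based metric.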

Details

Benchmark details - environment, command, output...
Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000109', '0.000072', '0.000065', '0.000065 0.000066 0.000072 0.000169 0.000171', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000107', '0.000071', '0.000060', '0.000060 0.000060 0.000071 0.000165 0.000180', '0.000060', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000124', '0.000081', '0.000066', '0.000066 0.000069 0.000081 0.000174 0.000231', '0.000075', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000138', '0.000166', '0.000058', '0.000058 0.000059 0.000166 0.000171 0.000236', '0.000078', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000141', '0.000174', '0.000082', '0.000082 0.000086 0.000174 0.000175 0.000188', '0.000052', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000143', '0.000174', '0.000069', '0.000069 0.000091 0.000174 0.000183 0.000195', '0.000058', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000437', '0.000079', '0.000064', '0.000064 0.000076 0.000079 0.000201 0.001762', '0.000743', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000145', '0.000168', '0.000086', '0.000086 0.000109 0.000168 0.000168 0.000194', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
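The values in the summary table appear to be the median of the five timed runs, converted from seconds to milliseconds. The sketch below checks this against the first row of the details above (Runtime_BlockedTransform_iter_512_blocksize_2048). The field meanings (mean, median, min, samples, standard deviation) are inferred from the numbers themselves and are an assumption, not taken from sycl-bench documentation.

```python
import statistics

# Fields copied from the Runtime_BlockedTransform_iter_512_blocksize_2048 row.
# The key names are labels chosen for this sketch (assumed meanings).
row = {
    "mean_s": "0.000109",
    "median_s": "0.000072",
    "min_s": "0.000065",
    "samples_s": "0.000065 0.000066 0.000072 0.000169 0.000171",
    "stddev_s": "0.000056",
}

samples = [float(s) for s in row["samples_s"].split()]

mean_s = statistics.mean(samples)
median_s = statistics.median(samples)
stdev_s = statistics.stdev(samples)  # sample (n - 1) standard deviation

median_ms = median_s * 1000.0  # the summary table reports milliseconds

print(f"median: {median_ms:.6f} ms")
print(f"mean:   {mean_s:.6f} s, stdev: {stdev_s:.6f} s")
```

With only five samples split between roughly 0.00007 s and 0.00017 s, the median can jump between the two clusters from one run to the next, which is consistent with the same benchmark reporting 0.072 ms in one run and 0.169 ms in another.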

github-actions[bot] avatar Oct 03 '24 10:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11160575425

github-actions[bot] avatar Oct 03 '24 10:10 github-actions[bot]

Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11160575425 Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate
Benchmark This PR Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 25.430000 μs
api_overhead_benchmark_sycl SubmitKernel in order 25.333000 μs
api_overhead_benchmark_ur SubmitKernel out of order 17.647000 μs
api_overhead_benchmark_ur SubmitKernel in order 13.226000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.157000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.663000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 226.426000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 113.628000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.745000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.233000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 858.160000 GB/s
Relative perf in group Velocity-Bench (5): cannot calculate
Benchmark This PR Relative perf Change -
Velocity-Bench Hashtable 361.439400 M keys/sec
Velocity-Bench Bitcracker 35.562800 s
Velocity-Bench CudaSift 218.822000 ms
Velocity-Bench QuickSilver 118.210000 MMS/CTT
Velocity-Bench Sobel Filter 551.852000 ms
Relative perf in group Runtime (16): cannot calculate
Benchmark This PR Relative perf Change -
Runtime_BlockedTransform_iter_256_blocksize_1024 0.076000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048 0.072000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024 0.173000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048 0.061000 ms
Runtime_BlockedTransform_iter_512_blocksize_2048 0.169000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024 0.088000 ms
Runtime_BlockedTransform_iter_64_blocksize_1024 0.241000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048 0.062000 ms
Runtime_IndependentDAGTaskThroughput_SingleTask 271.280000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 273.134000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 275.897000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 276.025000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor 1794.983000 ms
Runtime_DAGTaskThroughput_BasicParallelFor 1703.181000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1696.627000 ms
Runtime_DAGTaskThroughput_SingleTask 1649.814000 ms
Relative perf in group MicroBench (16): cannot calculate
Benchmark This PR Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.693000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 4.874000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 5.001000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 4.917000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.648000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.126000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.903000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.440000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 5.022000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 4.906000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.361000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.374000 ms
MicroBench_LocalMem_fp32_4096 30.433000 ms
MicroBench_LocalMem_int32_4096 30.379000 ms
MicroBench_Arith_fp32_512 0.019000 ms
MicroBench_Arith_int32_512 0.037000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR Relative perf Change -
Pattern_Reduction_NDRange_int32 16.333000 ms
Pattern_Reduction_Hierarchical_int32 16.204000 ms
Pattern_SegmentedReduction_Hierarchical_int16 12.216000 ms
Pattern_SegmentedReduction_NDRange_fp32 5.715000 ms
Pattern_SegmentedReduction_NDRange_int64 6.193000 ms
Pattern_SegmentedReduction_Hierarchical_int64 12.256000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 12.051000 ms
Pattern_SegmentedReduction_NDRange_int32 5.720000 ms
Pattern_SegmentedReduction_NDRange_int16 6.078000 ms
Pattern_SegmentedReduction_Hierarchical_int32 12.056000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR Relative perf Change -
ScalarProduct_NDRange_fp32 6.352000 ms
ScalarProduct_NDRange_int64 8.233000 ms
ScalarProduct_Hierarchical_int64 11.557000 ms
ScalarProduct_NDRange_int32 6.330000 ms
ScalarProduct_Hierarchical_fp32 10.263000 ms
ScalarProduct_Hierarchical_int32 10.595000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR Relative perf Change -
USM_Allocation_latency_fp32_shared 0.137000 ms
USM_Allocation_latency_fp32_host 37.346000 ms
USM_Allocation_latency_fp32_device 0.145000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.801000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.649000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.192000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.035000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR Relative perf Change -
VectorAddition_int32 1.447000 ms
VectorAddition_fp32 1.449000 ms
VectorAddition_int64 3.075000 ms
Relative perf in group Polybench (4): cannot calculate
Benchmark This PR Relative perf Change -
Polybench_2DConvolution 0.194000 ms
Polybench_2mm 1.223000 ms
Polybench_3mm 1.725000 ms
Polybench_Atax 6.736000 ms
Relative perf in group ReductionAtomic (4): cannot calculate
Benchmark This PR Relative perf Change -
ReductionAtomic_fp64 0.020000 ms
ReductionAtomic_int32 0.012000 ms
ReductionAtomic_int64 0.010000 ms
ReductionAtomic_fp32 0.020000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR Relative perf Change -
Kmeans_fp32 16.170000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR Relative perf Change -
LinearRegressionCoeff_fp32 966.801000 ms
Relative perf in group LinearRegression (1): cannot calculate
Benchmark This PR Relative perf Change -
LinearRegression_fp32 0.427000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR Relative perf Change -
MolecularDynamics 0.027000 ms

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.430,25.726,5.87%,21.891,357.199,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.333,25.319,3.41%,23.969,268.200,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),17.647,17.625,4.63%,16.433,250.023,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.226,13.215,1.82%,12.533,48.001,[CPU],[us]
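The compute-benchmarks output above is a CSV header followed by one data row, with the unit appended as an extra trailing field. A minimal sketch of parsing such rows and comparing the SYCL and UR in-order SubmitKernel means follows; the helper name `parse_row` and the dictionary layout are hypothetical, and the rows are shortened copies of the output shown above.

```python
import csv
import io

# Data rows copied from the in-order SubmitKernel results above
# (test-case parameters shortened for readability).
ROWS = [
    "SubmitKernel(api=sycl Ioq=1),25.333,25.319,3.41%,23.969,268.200,[CPU],[us]",
    "SubmitKernel(api=ur Ioq=1),13.226,13.215,1.82%,12.533,48.001,[CPU],[us]",
]


def parse_row(line: str) -> dict:
    """Split one result row into typed fields (hypothetical helper)."""
    name, mean, median, stddev, lo, hi, kind, unit = next(csv.reader(io.StringIO(line)))
    return {
        "name": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),
        "min": float(lo),
        "max": float(hi),
        "type": kind.strip("[]"),
        "unit": unit.strip("[]"),
    }


sycl, ur = (parse_row(r) for r in ROWS)

# Ratio of mean submission times; both rows use the same unit.
assert sycl["unit"] == ur["unit"]
ratio = sycl["mean"] / ur["mean"]
print(f"SYCL / UR mean submit time: {ratio:.2f}x ({sycl['unit']})")
```

Reading the unit from the row rather than assuming microseconds matters here, because some tests in this suite report bandwidth in GB/s instead of time.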

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),226.426,226.335,1.07%,220.607,435.450,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),113.628,113.585,0.85%,111.127,171.830,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.745,5.565,11.05%,5.185,34.508,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.233,3.254,3.57%,0.516,3.464,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.157,2.152,6.26%,1.958,33.815,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.663,1.657,5.71%,1.568,23.445,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.160,858.609,0.39%,807.892,867.787,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.371342 s 361.439400 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

Attack
================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00421934 s
bitcracker - total time for whole calculation: 35.5628 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1272 33.5868% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1208 1264 32.7993% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1257 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1239 1275 33.6411% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1266 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1257 33.2338% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1264 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1262 33.3424% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1223 1266 33.2066% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1089 1255 29.5683% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1265 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1262 33.2881% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1214 1261 32.9623% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1274 33.2338% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1266 33.5324% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1268 33.5868% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1266 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1072 1257 29.1067% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1123 1259 30.4914% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1240 1274 33.6682% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1249 33.0437% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1169 1266 31.7404% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1266 33.3424% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1028 1265 27.912% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1137 1262 30.8716% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1270 33.5324% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1252 33.0437% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1252 33.0166% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1212 1265 32.908% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1241 1275 33.6954% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1100 1262 29.867% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1261 33.3424% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1263 33.4238% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1095 1272 29.7312% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1271 33.5053% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1105 1260 30.0027% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1099 1259 29.8398% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1213 1252 32.9351% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1271 33.6139% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1272 33.6139% 1 2

Performing data verification Data verification is SUCCESSFUL.

Avg workload time = 218.822 ms
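The percentage printed on each "Number of matching features" line appears to be the first matching count divided by the first original count (for example 1235 of 3683). The sketch below rechecks that relationship for two of the runs above. The function name is hypothetical, and the formula is inferred from the logged numbers, not taken from the benchmark source.

```python
def match_percentage(original_features: int, matching_features: int) -> float:
    """Share of original features that were matched, in percent (inferred formula)."""
    return 100.0 * matching_features / original_features


# (original, matching, percentage as printed in the log)
logged_runs = [
    (3683, 1235, 33.5324),
    (3683, 1100, 29.867),
]

for original, matching, logged in logged_runs:
    computed = match_percentage(original, matching)
    print(f"{matching}/{original}: computed {computed:.4f}%, logged {logged}%")
```

Runs that land near 30% rather than 33% stand out quickly when the ratio is recomputed this way.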

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016 Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:

Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0

Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1

CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.303970e-01 6.148200e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.638310e-01 7.487510e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.329380e-01 7.640800e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.678010e-01 8.284720e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.590150e-01 7.972970e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.596570e-01 7.672700e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.289830e-01 7.648450e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.299110e-01 7.897290e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.296550e-01 7.852080e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.305270e-01 7.601730e-01 0.000000e+00

Timer                     Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                      number      microSecs   microSecs   microSecs   microSecs   Efficiency
                          of calls    min         avg         max         stddev      Rating
main                      1           1.115e+07   1.115e+07   1.115e+07   0.000e+00   100.00
cycleInit                 10          3.533e+06   3.533e+06   3.533e+06   0.000e+00   100.00
cycleTracking             10          7.621e+06   7.621e+06   7.621e+06   0.000e+00   100.00
cycleTracking_Kernel      104         4.935e+06   4.935e+06   4.935e+06   0.000e+00   100.00
cycleTracking_MPI         117         2.123e+05   2.123e+05   2.123e+05   0.000e+00   100.00
cycleTracking_Test_Done   0           0.000e+00   0.000e+00   0.000e+00   0.000e+00   0.00
cycleFinalize             20          4.800e+02   4.800e+02   4.800e+02   0.000e+00   100.00
Figure Of Merit 118.21 [Num Mega Segments / Cycle Tracking Time]
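The Figure Of Merit is labelled "Num Mega Segments / Cycle Tracking Time". As a rough cross-check, the sketch below sums the `num_seg` column of the ten cycles and divides by the cumulative `cycleTracking` timer (7.621e+06 microseconds). The variable names are hypothetical, and the assumption that the timer is converted to seconds is inferred from the numbers, not confirmed from the Quicksilver source.

```python
# num_seg column for cycles 0..9, copied from the QuickSilver output above.
num_seg_per_cycle = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer table, in microseconds.
# The log prints it with four significant digits, so the result is approximate.
cycle_tracking_us = 7.621e6

mega_segments = sum(num_seg_per_cycle) / 1e6
cycle_tracking_s = cycle_tracking_us / 1e6
figure_of_merit = mega_segments / cycle_tracking_s

print(f"Mega segments:   {mega_segments:.3f}")
print(f"Tracking time:   {cycle_tracking_s:.3f} s")
print(f"Figure of merit: {figure_of_merit:.2f}")
```

Under these assumptions the result agrees with the logged 118.21 to within the rounding of the timer value.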

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload. SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png SYMN: Launching SYCL kernel with # of iterations: 5 time to subtract from total: 7.47102 s sobelfilter - total time for whole calculation: 0.551852 s

Runtime_BlockedTransform_iter_256_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000092', '0.000076', '0.000068', '0.000068 0.000069 0.000076 0.000078 0.000167', '0.000042', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
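Each sycl-bench result is printed as a Python list literal. Judging by the values, fields 11 to 15 look like the mean, median, minimum, raw samples, and sample standard deviation of the run time in seconds. The sketch below parses the row above and recomputes those statistics. The field positions and their meaning are assumptions inferred from this log, not taken from the sycl-bench documentation.

```python
import ast
import statistics

# Result row copied verbatim from the output above.
ROW_TEXT = (
    "['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'1024', '2049', '0.000092', '0.000076', '0.000068', "
    "'0.000068 0.000069 0.000076 0.000078 0.000167', '0.000042', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

row = ast.literal_eval(ROW_TEXT)

# Assumed field positions (inferred from the values, not documented here).
logged_mean = float(row[11])
logged_median = float(row[12])
logged_min = float(row[13])
samples = [float(s) for s in row[14].split()]
logged_stddev = float(row[15])

recomputed = {
    "mean": statistics.mean(samples),
    "median": statistics.median(samples),
    "min": min(samples),
    "stddev": statistics.stdev(samples),  # sample (n-1) standard deviation
}

for name, value in recomputed.items():
    print(f"{name:>6}: {value:.6f}")
```

With only five samples, one slow run (0.000167 s here) pulls the mean well above the median, which is one reason to compare medians when judging result stability.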

Runtime_BlockedTransform_iter_64_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000117', '0.000072', '0.000059', '0.000059 0.000072 0.000072 0.000118 0.000266', '0.000086', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000159', '0.000173', '0.000092', '0.000092 0.000172 0.000173 0.000176 0.000181', '0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_256_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000083', '0.000061', '0.000058', '0.000058 0.000058 0.000061 0.000071 0.000166', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_512_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000168', '0.000169', '0.000162', '0.000162 0.000166 0.000169 0.000169 0.000172', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000116', '0.000088', '0.000067', '0.000067 0.000069 0.000088 0.000173 0.000183', '0.000057', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_64_blocksize_1024

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000491', '0.000241', '0.000065', '0.000065 0.000071 0.000241 0.000245 0.001830', '0.000754', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_BlockedTransform_iter_128_blocksize_2048

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024

Output:

['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000088', '0.000062', '0.000059', '0.000059 0.000062 0.000062 0.000069 0.000187', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.274622', '0.271280', '0.262243', '0.262243 0.270004 0.271280 0.274935 0.294648', '0.012113', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272927', '0.273134', '0.271921', '0.271921 0.272357 0.273134 0.273243 0.273982', '0.000805', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.280174', '0.275897', '0.273980', '0.273980 0.275569 0.275897 0.276162 0.299265', '0.010706', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.278067', '0.276025', '0.269861', '0.269861 0.271613 0.276025 0.279537 0.293297', '0.009317', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.793923', '1.794983', '1.790988', '1.790988 1.791860 1.794983 1.795719 1.796067', '0.002335', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.704656', '1.703181', '1.701674', '1.701674 1.701734 1.703181 1.706069 1.710621', '0.003781', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.732877', '1.696627', '1.692526', '1.692526 1.692941 1.696627 1.768344 1.813945', '0.055603', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.654001', '1.649814', '1.648181', '1.648181 1.649490 1.649814 1.655479 1.667043', '0.007811', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004731', '0.004693', '0.004669', '0.004669 0.004684 0.004693 0.004799 0.004808', '0.000067', '26.772001', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
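For rows that report a throughput, the logged value appears to equal the last field of the row divided by the minimum run time. The sketch below checks that relationship against three rows from this log. The function name is hypothetical, the interpretation of the last field as a data size is an assumption, and the log does not state its unit.

```python
def throughput(size: float, min_runtime_s: float) -> float:
    """Assumed relationship: reported throughput = size / fastest run time."""
    return size / min_runtime_s


# (benchmark, last field of the row, minimum run time in s, logged throughput)
rows = [
    ("MicroBench_HostDeviceBandwidth_1D_H2D_Strided", 0.125, 0.004669, 26.772001),
    ("MicroBench_HostDeviceBandwidth_3D_D2H_Strided", 0.125, 0.617522, 0.202422),
    ("MicroBench_LocalMem_fp32_4096", 312.0, 0.030367, 10274.406342),
]

for name, size, min_runtime_s, logged in rows:
    computed = throughput(size, min_runtime_s)
    rel_err = abs(computed - logged) / logged
    print(f"{name}: computed {computed:.3f}, logged {logged:.3f}, rel. error {rel_err:.1e}")
```

The small residual differences are consistent with the run times being printed to six decimal places.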

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004956', '0.004874', '0.004778', '0.004778 0.004831 0.004874 0.005104 0.005191', '0.000181', '26.160128', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004993', '0.005001', '0.004937', '0.004937 0.004963 0.005001 0.005021 0.005042', '0.000043', '25.320107', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004854', '0.004917', '0.004562', '0.004562 0.004853 0.004917 0.004962 0.004976', '0.000170', '27.401428', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617645', '0.617648', '0.617522', '0.617522 0.617612 0.617648 0.617711 0.617731', '0.000084', '0.202422', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005112', '0.005126', '0.005030', '0.005030 0.005078 0.005126 0.005136 0.005191', '0.000061', '24.850786', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005249', '0.004903', '0.004793', '0.004793 0.004836 0.004903 0.004955 0.006757', '0.000846', '26.077627', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617479', '0.617440', '0.617293', '0.617293 0.617417 0.617440 0.617519 0.617725', '0.000160', '0.202497', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005027', '0.005022', '0.004937', '0.004937 0.004962 0.005022 0.005092 0.005123', '0.000080', '25.318332', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004842', '0.004906', '0.004729', '0.004729 0.004731 0.004906 0.004913 0.004932', '0.000103', '26.434858', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618351', '0.618361', '0.618271', '0.618271 0.618352 0.618361 0.618370 0.618400', '0.000048', '0.202177', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618328', '0.618374', '0.618116', '0.618116 0.618316 0.618374 0.618415 0.618421', '0.000126', '0.202227', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030430', '0.030433', '0.030367', '0.030367 0.030421 0.030433 0.030451 0.030480', '0.000042', '10274.406342', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030385', '0.030379', '0.030320', '0.030320 0.030373 0.030379 0.030393 0.030462', '0.000051', '10290.324690', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016354', '0.016333', '0.016189', '0.016189 0.016260 0.016333 0.016423 0.016565', '0.000146', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016367', '0.016204', '0.016132', '0.016132 0.016174 0.016204 0.016507 0.016818', '0.000292', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006377', '0.006352', '0.006335', '0.006335 0.006349 0.006352 0.006374 0.006474', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008235', '0.008233', '0.008216', '0.008216 0.008227 0.008233 0.008246 0.008255', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011566', '0.011557', '0.011543', '0.011543 0.011550 0.011557 0.011590 0.011591', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006349', '0.006330', '0.006310', '0.006310 0.006330 0.006330 0.006336 0.006438', '0.000051', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010249', '0.010263', '0.010203', '0.010203 0.010242 0.010263 0.010263 0.010273', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010595', '0.010595', '0.010531', '0.010531 0.010569 0.010595 0.010630 0.010649', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012220', '0.012216', '0.012213', '0.012213 0.012215 0.012216 0.012222 0.012235', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005715', '0.005712', '0.005712 0.005714 0.005715 0.005718 0.005722', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006193', '0.006193', '0.006182', '0.006182 0.006192 0.006193 0.006197 0.006202', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012259', '0.012256', '0.012235', '0.012235 0.012247 0.012256 0.012259 0.012297', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012053', '0.012051', '0.012034', '0.012034 0.012046 0.012051 0.012052 0.012083', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005720', '0.005720', '0.005713', '0.005713 0.005714 0.005720 0.005725 0.005730', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006082', '0.006078', '0.006076', '0.006076 0.006077 0.006078 0.006083 0.006097', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012057', '0.012056', '0.012049', '0.012049 0.012050 0.012056 0.012057 0.012070', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000138', '0.000137', '0.000136', '0.000136 0.000137 0.000137 0.000137 0.000142', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037358', '0.037346', '0.037234', '0.037234 0.037322 0.037346 0.037386 0.037501', '0.000097', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000183', '0.000145', '0.000047', '0.000047 0.000138 0.000145 0.000169 0.000415', '0.000138', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
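
The five samples in this row are spread much more widely than in the neighbouring rows. A rough sketch of how that spread could be quantified is the coefficient of variation (sample standard deviation divided by mean). The 10% cut-off and the helper names below are purely illustrative assumptions, not something the benchmark scripts are known to use.

```python
import statistics

# Samples (seconds) copied from two rows of this log.
USM_DEVICE = [0.000047, 0.000138, 0.000145, 0.000169, 0.000415]
SCALAR_PRODUCT = [0.010531, 0.010569, 0.010595, 0.010630, 0.010649]

# Hypothetical cut-off: flag a result when stddev exceeds 10% of the mean.
NOISE_LIMIT = 0.10


def coefficient_of_variation(samples):
    """Sample standard deviation relative to the mean (unitless)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def is_noisy(samples, limit=NOISE_LIMIT):
    return coefficient_of_variation(samples) > limit


cv_usm = coefficient_of_variation(USM_DEVICE)
cv_scalar = coefficient_of_variation(SCALAR_PRODUCT)

print(f"USM_Allocation_latency_fp32_device: CV = {cv_usm:.3f}, noisy = {is_noisy(USM_DEVICE)}")
print(f"ScalarProduct_Hierarchical_int32:   CV = {cv_scalar:.4f}, noisy = {is_noisy(SCALAR_PRODUCT)}")
```

Under this illustrative limit the device-allocation latency result would be flagged and the scalar product result would not.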

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001827', '0.001801', '0.001797', '0.001797 0.001799 0.001801 0.001858 0.001882', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001974', '0.001649', '0.001638', '0.001638 0.001642 0.001649 0.001651 0.003289', '0.000736', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001194', '0.001192', '0.001189', '0.001189 0.001192 0.001192 0.001193 0.001202', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001039', '0.001035', '0.001033', '0.001033 0.001033 0.001035 0.001039 0.001057', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001472', '0.001447', '0.001429', '0.001429 0.001443 0.001447 0.001507 0.001534', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001461', '0.001449', '0.001432', '0.001432 0.001448 0.001449 0.001450 0.001525', '0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003069', '0.003075', '0.003049', '0.003049 0.003063 0.003075 0.003079 0.003080', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2DConvolution

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2DConvolution.csv

Output:

['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000194', '0.000194', '0.000183', '0.000183 0.000183 0.000194 0.000199 0.000209', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001221', '0.001223', '0.001208', '0.001208 0.001212 0.001223 0.001228 0.001232', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001728', '0.001725', '0.001722', '0.001722 0.001723 0.001725 0.001727 0.001741', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_Arith_fp32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000021', '0.000019', '0.000019', '0.000019 0.000019 0.000019 0.000020 0.000030', '0.000005', '1658.528819', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

MicroBench_Arith_int32_512

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384

Output:

['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000053', '0.000037', '0.000037', '0.000037 0.000037 0.000037 0.000042 0.000113', '0.000033', '852.939571', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006771', '0.006736', '0.006711', '0.006711 0.006732 0.006736 0.006798 0.006877', '0.000068', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000050', '0.000020', '0.000019', '0.000019 0.000019 0.000020 0.000022 0.000171', '0.000067', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000019', '0.000012', '0.000010', '0.000010 0.000010 0.000012 0.000013 0.000049', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000046', '0.000010', '0.000009', '0.000009 0.000010 0.000010 0.000012 0.000190', '0.000081', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ReductionAtomic_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv

Output:

['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000020', '0.000020', '0.000020 0.000020 0.000020 0.000020 0.000025', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016169', '0.016170', '0.016159', '0.016159 0.016160 0.016170 0.016173 0.016182', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966762', '0.966801', '0.966601', '0.966601 0.966659 0.966801 0.966835 0.966914', '0.000129', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegression_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=4096

Output:

['LinearRegression_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.000432', '0.000427', '0.000420', '0.000420 0.000425 0.000427 0.000430 0.000459', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000027', '0.000025', '0.000025 0.000026 0.000027 0.000029 0.000059', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

github-actions[bot] Oct 03 '24 11:10

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11161331244

github-actions[bot] Oct 03 '24 11:10

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11161331244 Job status: failure. Test status: failure.

github-actions[bot] Oct 03 '24 11:10

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11161403971

github-actions[bot] Oct 03 '24 11:10

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11161403971 Job status: success. Test status: success.

Summary

Total 70 benchmarks in mean. Geomean 100.399%. Improved 12 Regressed 24 (threshold 0.50%)

(In each row, the better of the two results is the one printed with six decimal places.)

Performance change in benchmark groups

Relative perf in group api (6): 100.313%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_ur SubmitKernel out of order 14.440000 μs 17.647 μs 122.21% 22.21% ++
api_overhead_benchmark_sycl SubmitKernel in order 24.939000 μs 25.333 μs 101.58% 1.58% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.139000 μs 2.157 μs 100.84% 0.84% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.673 μs 1.663000 μs 99.40% -0.60% .
api_overhead_benchmark_sycl SubmitKernel out of order 26.076 μs 25.430000 μs 97.52% -2.48% .
api_overhead_benchmark_ur SubmitKernel in order 15.752 μs 13.226000 μs 83.96% -16.04% --
Relative perf in group memory (4): 96.421%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.195000 μs 3.233 μs 101.19% 1.19% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.790 μs 5.745000 μs 99.22% -0.78% .
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 119.515 μs 113.628000 μs 95.07% -4.93% .
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 250.058 μs 226.426000 μs 90.55% -9.45% -
Relative perf in group miscellaneous (1): 99.497%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 862.496 μs 858.160000 μs 99.50% -0.50% .
Relative perf in group Velocity-Bench (5): 99.878%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Bitcracker 35.424300 s 35.563 s 100.39% 0.39% .
Velocity-Bench Hashtable 361.727553 M keys/sec 361.439 M keys/sec 100.08% 0.08% .
Velocity-Bench QuickSilver 118.170 MMS/CTT 118.210000 MMS/CTT 99.97% -0.03% .
Velocity-Bench CudaSift 219.701 ms 218.822000 ms 99.60% -0.40% .
Velocity-Bench Sobel Filter 555.433 ms 551.852000 ms 99.36% -0.64% .
Relative perf in group Runtime (16): 97.040%
Benchmark This PR baseline Relative perf Change -
Runtime_DAGTaskThroughput_NDRangeParallelFor 1778.195000 ms 1794.983 ms 100.94% 0.94% .
Runtime_IndependentDAGTaskThroughput_SingleTask 270.339000 ms 271.280 ms 100.35% 0.35% .
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 277.160 ms 273.134000 ms 98.55% -1.45% .
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 280.121 ms 275.897000 ms 98.49% -1.51% .
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 285.659 ms 276.025000 ms 96.63% -3.37% .
Runtime_DAGTaskThroughput_SingleTask 1749.395 ms 1649.814000 ms 94.31% -5.69% -
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1802.311 ms 1696.627000 ms 94.14% -5.86% -
Runtime_DAGTaskThroughput_BasicParallelFor 1826.810 ms 1703.181000 ms 93.23% -6.77% -
Runtime_BlockedTransform_iter_256_blocksize_1024 - 0.076000 ms
Runtime_BlockedTransform_iter_64_blocksize_2048 - 0.072000 ms
Runtime_BlockedTransform_iter_512_blocksize_1024 - 0.173000 ms
Runtime_BlockedTransform_iter_256_blocksize_2048 - 0.061000 ms
Runtime_BlockedTransform_iter_512_blocksize_2048 - 0.169000 ms
Runtime_BlockedTransform_iter_128_blocksize_1024 - 0.088000 ms
Runtime_BlockedTransform_iter_64_blocksize_1024 - 0.241000 ms
Runtime_BlockedTransform_iter_128_blocksize_2048 - 0.062000 ms
Relative perf in group MicroBench (16): 99.592%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.006000 ms 5.126 ms 102.40% 2.40% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.932000 ms 5.001 ms 101.40% 1.40% .
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.996000 ms 5.022 ms 100.52% 0.52% .
MicroBench_LocalMem_int32_4096 30.346000 ms 30.379 ms 100.11% 0.11% .
MicroBench_LocalMem_fp32_4096 30.416000 ms 30.433 ms 100.06% 0.06% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.337000 ms 618.361 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.388 ms 618.374000 ms 100.00% -0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.690 ms 617.648000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.668 ms 617.440000 ms 99.96% -0.04% .
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.917 ms 4.903000 ms 99.72% -0.28% .
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 4.960 ms 4.906000 ms 98.91% -1.09% .
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.778 ms 4.693000 ms 98.22% -1.78% .
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.011 ms 4.917000 ms 98.12% -1.88% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.127 ms 4.874000 ms 95.07% -4.93% .
MicroBench_Arith_fp32_512 - 0.019000 ms
MicroBench_Arith_int32_512 - 0.037000 ms
Relative perf in group Pattern (10): 99.742%
Benchmark This PR baseline Relative perf Change -
Pattern_SegmentedReduction_NDRange_int32 5.714000 ms 5.720 ms 100.11% 0.11% .
Pattern_SegmentedReduction_NDRange_int64 6.188000 ms 6.193 ms 100.08% 0.08% .
Pattern_SegmentedReduction_Hierarchical_int16 12.214000 ms 12.216 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int32 12.055000 ms 12.056 ms 100.01% 0.01% .
Pattern_SegmentedReduction_Hierarchical_int64 12.256000 ms 12.256 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_fp32 12.054 ms 12.051000 ms 99.98% -0.02% .
Pattern_SegmentedReduction_NDRange_int16 6.083 ms 6.078000 ms 99.92% -0.08% .
Pattern_SegmentedReduction_NDRange_fp32 5.720 ms 5.715000 ms 99.91% -0.09% .
Pattern_Reduction_NDRange_int32 16.443 ms 16.333000 ms 99.33% -0.67% .
Pattern_Reduction_Hierarchical_int32 16.519 ms 16.204000 ms 98.09% -1.91% .
Relative perf in group ScalarProduct (6): 100.022%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_fp32 6.322000 ms 6.352 ms 100.47% 0.47% .
ScalarProduct_Hierarchical_fp32 10.239000 ms 10.263 ms 100.23% 0.23% .
ScalarProduct_NDRange_int64 8.222000 ms 8.233 ms 100.13% 0.13% .
ScalarProduct_Hierarchical_int32 10.605 ms 10.595000 ms 99.91% -0.09% .
ScalarProduct_NDRange_int32 6.340 ms 6.330000 ms 99.84% -0.16% .
ScalarProduct_Hierarchical_int64 11.610 ms 11.557000 ms 99.54% -0.46% .
Relative perf in group USM (7): 110.796%
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device 0.071000 ms 0.145 ms 204.23% 104.23% ++++++++++
USM_Allocation_latency_fp32_shared 0.134000 ms 0.137 ms 102.24% 2.24% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.650 ms 1.649000 ms 99.94% -0.06% .
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.193 ms 1.192000 ms 99.92% -0.08% .
USM_Allocation_latency_fp32_host 37.444 ms 37.346000 ms 99.74% -0.26% .
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.042 ms 1.035000 ms 99.33% -0.67% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.815 ms 1.801000 ms 99.23% -0.77% .
Relative perf in group VectorAddition (3): 99.926%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int64 3.067000 ms 3.075 ms 100.26% 0.26% .
VectorAddition_fp32 1.447000 ms 1.449 ms 100.14% 0.14% .
VectorAddition_int32 1.456 ms 1.447000 ms 99.38% -0.62% .
Relative perf in group Polybench (4): 99.520%
Benchmark This PR baseline Relative perf Change -
Polybench_2mm 1.215000 ms 1.223 ms 100.66% 0.66% .
Polybench_3mm 1.729 ms 1.725000 ms 99.77% -0.23% .
Polybench_Atax 6.863 ms 6.736000 ms 98.15% -1.85% .
Polybench_2DConvolution - 0.194000 ms
Relative perf in group Kmeans (1): 100.000%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 16.170000 ms 16.170 ms 100.00% 0.00% .
Relative perf in group LinearRegressionCoeff (1): 100.006%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 966.740000 ms 966.801 ms 100.01% 0.01% .
Relative perf in group MolecularDynamics (1): 103.846%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.026000 ms 0.027 ms 103.85% 3.85% .
Relative perf in group ReductionAtomic (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
ReductionAtomic_fp64 - 0.020000 ms
ReductionAtomic_int32 - 0.012000 ms
ReductionAtomic_int64 - 0.010000 ms
ReductionAtomic_fp32 - 0.020000 ms
Relative perf in group LinearRegression (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegression_fp32 - 0.427000 ms
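
The percentages in the tables above appear to follow a simple rule, sketched here under stated assumptions: for lower-is-better metrics the relative perf is baseline divided by this PR, for higher-is-better metrics it is this PR divided by baseline, and the group figure is the geometric mean of its rows. This is inferred from the reported numbers rather than taken from the benchmark scripts. The function names and the exact way the 0.50% threshold is applied are hypothetical.

```python
import math


def relative_perf(pr, baseline, lower_is_better=True):
    """Relative performance in percent; above 100 means this PR is better."""
    ratio = baseline / pr if lower_is_better else pr / baseline
    return ratio * 100.0


def geomean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def classify(rel_pct, threshold_pct=0.5):
    """Assumed rule: a change beyond the threshold counts as improved/regressed."""
    change = rel_pct - 100.0
    if change > threshold_pct:
        return "improved"
    if change < -threshold_pct:
        return "regressed"
    return "unchanged"


# (this PR, baseline) pairs in microseconds from the "api" group above.
API_GROUP = [
    (14.440, 17.647),
    (24.939, 25.333),
    (2.139, 2.157),
    (1.673, 1.663),
    (26.076, 25.430),
    (15.752, 13.226),
]

api_rel = [relative_perf(pr, base) for pr, base in API_GROUP]
api_geomean = geomean(api_rel)

# Hashtable reports throughput (M keys/sec), so higher is better.
hashtable_rel = relative_perf(361.727553, 361.439, lower_is_better=False)

for (pr, base), rel in zip(API_GROUP, api_rel):
    print(f"{pr:>8.3f} vs {base:>8.3f}: {rel:7.2f}%  {classify(rel)}")
print(f"api group geomean: {api_geomean:.3f}%")
```

Run on the six api rows, this lands on the reported group value of 100.313% to within rounding, which is the main reason for believing the assumed formula.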

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.076,26.058,3.34%,24.392,276.210,[CPU],[us]
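
A small sketch of how the CSV output above could be turned into named fields. The data row carries one more value than the header names (the trailing unit), so the extra column is labelled `Unit` here; that label and the `parse_result` helper are assumptions made for illustration.

```python
import csv
import io

# Header and data row copied from the output above.
HEADER = "TestCase,Mean,Median,StdDev,Min,Max,Type"
DATA = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),26.076,26.058,3.34%,24.392,276.210,[CPU],[us]"
)


def bracketed(text):
    """Return the text between the first '[' and the last ']'."""
    return text[text.find("[") + 1 : text.rfind("]")]


def parse_result(header, data):
    names = next(csv.reader(io.StringIO(header)))
    values = next(csv.reader(io.StringIO(data)))
    # Assumed label for the unnamed trailing column.
    names = names + ["Unit"]
    row = dict(zip(names, values))
    return {
        "test_case": row["TestCase"],
        "mean": float(row["Mean"]),
        "median": float(row["Median"]),
        "stddev_pct": float(row["StdDev"].rstrip("%")),
        "min": float(row["Min"]),
        "max": float(row["Max"]),
        "type": bracketed(row["Type"]),
        "unit": bracketed(row["Unit"]),
    }


result = parse_result(HEADER, DATA)
print(result["mean"], result["unit"], f"(stddev {result['stddev_pct']}%)")
```

The parsed mean of 26.076 is the value that shows up in the "This PR" column of the summary for this benchmark.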

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.939,25.113,4.41%,21.793,274.068,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.440,14.347,3.36%,13.726,32.785,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.752,15.817,7.30%,12.526,243.328,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),250.058,252.295,3.03%,220.259,508.610,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),119.515,112.294,25.82%,109.653,302.517,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.790,5.483,12.24%,5.061,39.117,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.195,3.203,2.88%,0.374,3.427,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.139,2.136,4.07%,1.942,9.274,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.673,1.667,5.87%,1.584,26.452,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),862.496,863.026,0.48%,814.955,873.207,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.371046 s
361.727553 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

================================== Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000 Kernel execution: Effective passwords: 60000 Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00432567 s
bitcracker - total time for whole calculation: 35.4243 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1250 33.0166% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1244 1279 33.7768% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1239 1274 33.6411% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1257 33.2881% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1242 1275 33.7225% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1095 1252 29.7312% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1264 33.3424% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1103 1266 29.9484% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1090 1255 29.5954% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1220 1262 33.1252% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1175 1271 31.9033% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1260 33.2338% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1208 1261 32.7993% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1120 1258 30.41% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1267 33.5596% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1170 1265 31.7676% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1088 1268 29.5411% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1252 33.0166% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1097 1241 29.7855% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1267 33.4238% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1214 1247 32.9623% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1205 1271 32.7179% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1111 1268 30.1656% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1261 33.2609% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1273 33.2881% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1263 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1262 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1109 1267 30.1113% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1122 1258 30.4643% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1176 1258 31.9305% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1272 33.5324% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1263 33.2338% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1085 1269 29.4597% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1260 33.0709% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1270 33.3695% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1068 1259 28.9981% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1268 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1263 33.0709% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1187 1250 32.2292% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1105 1263 30.0027% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1268 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1110 1265 30.1385% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1087 1252 29.514% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1197 1272 32.5007% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1261 33.2609% 1 2

Performing data verification Data verification is SUCCESSFUL.

Avg workload time = 219.701 ms
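The percentage printed on each CudaSift line appears to be the first matching-feature count divided by the first original-feature count (for example 1225 / 3683). That reading is inferred from the numbers in this log, not from the CudaSift sources. A minimal Python sketch that checks it for one line, with `matching_rate` as a hypothetical helper name:

```python
import re

# One line copied from the CudaSift output above.
LINE = ("Image size = (1920,1080) Initializing data... "
        "Number of original features: 3683 3933 "
        "Number of matching features: 1225 1260 33.2609% 1 2")

PATTERN = re.compile(
    r"Number of original features: (\d+) (\d+) "
    r"Number of matching features: (\d+) (\d+) ([\d.]+)%"
)


def matching_rate(line: str) -> tuple[float, float]:
    """Return (recomputed rate, reported rate), both in percent."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like CudaSift output")
    original_first, _, matching_first, _, reported = match.groups()
    # Assumption: the rate uses the first count of each pair.
    computed = 100.0 * int(matching_first) / int(original_first)
    return computed, float(reported)


computed, reported = matching_rate(LINE)
print(f"computed {computed:.4f}%, reported {reported:.4f}%")
```

Running the same function over every iteration would give the spread of matching rates, which ranges from roughly 29% to 34% in this run.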

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016 Lawrence Livermore National Security, LLC All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:

Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0

Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1

CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4

setting GPU setting parameters Building partition 0 Building partition 1 Building partition 2 Building partition 3 Building MC_Domain 0 Building MC_Domain 1 Building MC_Domain 2 Building MC_Domain 3 Starting Consistency Check Finished Consistency Check Finished initMesh Started copyMaterialDatabase_device Finished copyMaterialDatabase_device Finished copyNuclearData_device Finished copyDomainDevice

cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.327580e-01 6.169830e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.678780e-01 7.578090e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.640840e-01 7.712350e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.956240e-01 8.197900e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.341700e-01 7.922870e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.332440e-01 7.666780e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.321390e-01 7.659400e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.326710e-01 7.865830e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.321060e-01 7.858210e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.324400e-01 7.605160e-01 0.000000e+00

Timer                     Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                          number   microSecs   microSecs   microSecs   microSecs  Efficiency
                            of calls         min         avg         max      stddev      Rating
main                               1   1.118e+07   1.118e+07   1.118e+07   0.000e+00      100.00
cycleInit                         10   3.557e+06   3.557e+06   3.557e+06   0.000e+00      100.00
cycleTracking                     10   7.624e+06   7.624e+06   7.624e+06   0.000e+00      100.00
cycleTracking_Kernel             104   4.940e+06   4.940e+06   4.940e+06   0.000e+00      100.00
cycleTracking_MPI                117   2.192e+05   2.192e+05   2.192e+05   0.000e+00      100.00
cycleTracking_Test_Done            0   0.000e+00   0.000e+00   0.000e+00   0.000e+00        0.00
cycleFinalize                     20   4.080e+02   4.080e+02   4.080e+02   0.000e+00      100.00

Figure Of Merit 118.17 [Num Mega Segments / Cycle Tracking Time]
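The Figure Of Merit can be reproduced from the per-cycle table in the QuickSilver output. The sketch below sums the `num_seg` column, divides by the summed `cycleTracking` column, and scales to mega segments. It assumes the `cycleTracking` column is in seconds, which is inferred from its sum (about 7.62) matching the 7.624e+06 microseconds in the timer report. The function name is hypothetical.

```python
# (num_seg, cycleTracking) for cycles 0..9, copied from the QuickSilver
# per-cycle output in this log.
CYCLES = [
    (55527935, 6.169830e-01),
    (94633679, 7.578090e-01),
    (95010375, 7.712350e-01),
    (94953591, 8.197900e-01),
    (94599337, 7.922870e-01),
    (94148236, 7.666780e-01),
    (93689264, 7.659400e-01),
    (93216931, 7.865830e-01),
    (92768273, 7.858210e-01),
    (92324678, 7.605160e-01),
]


def figure_of_merit(cycles: list[tuple[int, float]]) -> float:
    """Mega segments per second of cycle tracking time."""
    total_segments = sum(segments for segments, _ in cycles)
    # Assumption: the cycleTracking column is reported in seconds.
    tracking_seconds = sum(seconds for _, seconds in cycles)
    return total_segments / 1e6 / tracking_seconds


fom = figure_of_merit(CYCLES)
print(f"Figure Of Merit: {fom:.2f}")
```

With the values from this run the result rounds to the 118.17 reported by the benchmark.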

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.4896 s
sobelfilter - total time for whole calculation: 0.555433 s
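The Velocity-Bench workloads in this log (hashtable, bitcracker, sobelfilter) end with a line of the form `<name> - total time for whole calculation: <seconds> s`. A minimal Python sketch for pulling that value out of captured output follows. The regular expressions are written against the lines seen in this log only, and `extract_times` is a hypothetical helper, not part of the benchmark scripts.

```python
import re

# Tail of the Sobel filter output from this log.
OUTPUT = ("time to subtract from total: 7.4896 s "
          "sobelfilter - total time for whole calculation: 0.555433 s")

TOTAL_RE = re.compile(r"(\w+) - total time for whole calculation: ([0-9.]+) s")
SUBTRACT_RE = re.compile(r"time to subtract from total: ([0-9.]+) s")


def extract_times(output: str) -> dict:
    """Return the workload name and the two timings found in the output."""
    total = TOTAL_RE.search(output)
    if total is None:
        raise ValueError("no 'total time for whole calculation' line found")
    subtract = SUBTRACT_RE.search(output)
    return {
        "name": total.group(1),
        "total_s": float(total.group(2)),
        # Not every workload prints this line (hashtable does not).
        "subtract_s": float(subtract.group(1)) if subtract else None,
    }


times = extract_times(OUTPUT)
print(times)
```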

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.279397', '0.277160', '0.275892', '0.275892 0.276733 0.277160 0.280282 0.286919', '0.004521', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
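The sycl-bench rows do not label their columns, but the numbers suggest that the three values before the space-separated sample list are the mean, median and minimum of the five runs, and the value after it is the sample standard deviation. This is an inference from the data, not from the sycl-bench documentation. A short Python sketch that recomputes them for the row above:

```python
import statistics

# Sample column and neighbouring values copied from the
# Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor row.
SAMPLES = "0.275892 0.276733 0.277160 0.280282 0.286919"
REPORTED = {
    "mean": 0.279397,
    "median": 0.277160,
    "min": 0.275892,
    "stddev": 0.004521,
}

runs = [float(value) for value in SAMPLES.split()]
recomputed = {
    "mean": statistics.mean(runs),
    "median": statistics.median(runs),
    "min": min(runs),
    # Sample (n - 1) standard deviation; the population form gives ~0.00404.
    "stddev": statistics.stdev(runs),
}

for key, value in recomputed.items():
    print(f"{key}: recomputed {value:.6f}, reported {REPORTED[key]:.6f}")
```

The match with the `n - 1` form is useful to know when comparing these rows against tools that default to the population standard deviation.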

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272788', '0.270339', '0.262477', '0.262477 0.268819 0.270339 0.275869 0.286433', '0.008996', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.293907', '0.285659', '0.280444', '0.280444 0.281217 0.285659 0.305152 0.317063', '0.016378', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.280812', '0.280121', '0.276607', '0.276607 0.279503 0.280121 0.280673 0.287157', '0.003878', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.827242', '1.826810', '1.826309', '1.826309 1.826335 1.826810 1.827326 1.829432', '0.001293', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.745794', '1.749395', '1.731069', '1.731069 1.739506 1.749395 1.753183 1.755815', '0.010300', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.803061', '1.802311', '1.800942', '1.800942 1.801165 1.802311 1.802992 1.807894', '0.002829', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.778789', '1.778195', '1.777463', '1.777463 1.777877 1.778195 1.779784 1.780624', '0.001351', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617635', '0.617690', '0.617454', '0.617454 0.617636 0.617690 0.617691 0.617705', '0.000104', '0.202444', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
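In the HostDeviceBandwidth rows, the throughput value (`0.202444` here) appears to equal the last column (`0.125000`) divided by the minimum run time. This relationship and the meaning of the last column are inferred from the numbers in this log only; the units are not stated in the output. A minimal Python check, with `throughput` as a hypothetical helper:

```python
# Values copied from the MicroBench_HostDeviceBandwidth_3D_D2H_Strided row.
MIN_RUNTIME = 0.617454       # third timing value, the minimum of the samples
REPORTED_THROUGHPUT = 0.202444
LAST_COLUMN = 0.125000       # assumption: amount of data moved per run


def throughput(amount: float, runtime: float) -> float:
    """Amount transferred per unit of run time (units as in the log)."""
    return amount / runtime


computed = throughput(LAST_COLUMN, MIN_RUNTIME)
print(f"computed {computed:.6f}, reported {REPORTED_THROUGHPUT:.6f}")
```

If this reading is right, the throughput column reflects the best run rather than the mean, so it will look slightly more favourable than a value derived from the mean run time.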

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617669', '0.617668', '0.617628', '0.617628 0.617667 0.617668 0.617677 0.617703', '0.000027', '0.202387', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618356', '0.618388', '0.618162', '0.618162 0.618369 0.618388 0.618395 0.618465', '0.000114', '0.202212', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005022', '0.005011', '0.004993', '0.004993 0.005006 0.005011 0.005035 0.005063', '0.000028', '25.037185', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005027', '0.005006', '0.004972', '0.004972 0.004974 0.005006 0.005061 0.005122', '0.000064', '25.140733', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618361', '0.618337', '0.618297', '0.618297 0.618333 0.618337 0.618413 0.618424', '0.000055', '0.202168', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004976', '0.004960', '0.004918', '0.004918 0.004951 0.004960 0.004984 0.005067', '0.000056', '25.416025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005037', '0.004996', '0.004926', '0.004926 0.004951 0.004996 0.005101 0.005212', '0.000118', '25.375316', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005131', '0.005127', '0.005077', '0.005077 0.005119 0.005127 0.005137 0.005194', '0.000042', '24.621639', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005287', '0.004917', '0.004864', '0.004864 0.004883 0.004917 0.004943 0.006828', '0.000862', '25.696525', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004918', '0.004932', '0.004837', '0.004837 0.004924 0.004932 0.004946 0.004951', '0.000047', '25.841455', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004800', '0.004778', '0.004734', '0.004734 0.004773 0.004778 0.004793 0.004924', '0.000073', '26.404330', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030403', '0.030416', '0.030339', '0.030339 0.030404 0.030416 0.030425 0.030433', '0.000038', '10283.962283', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030375', '0.030346', '0.030327', '0.030327 0.030340 0.030346 0.030400 0.030461', '0.000056', '10288.023099', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016395', '0.016443', '0.016182', '0.016182 0.016413 0.016443 0.016466 0.016470', '0.000121', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016538', '0.016519', '0.016490', '0.016490 0.016495 0.016519 0.016540 0.016648', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008246', '0.008222', '0.008210', '0.008210 0.008215 0.008222 0.008237 0.008344', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010237', '0.010239', '0.010205', '0.010205 0.010234 0.010239 0.010243 0.010263', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006349', '0.006340', '0.006326', '0.006326 0.006332 0.006340 0.006362 0.006386', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011613', '0.011610', '0.011576', '0.011576 0.011590 0.011610 0.011618 0.011669', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010601', '0.010605', '0.010561', '0.010561 0.010592 0.010605 0.010611 0.010636', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006316', '0.006322', '0.006298', '0.006298 0.006299 0.006322 0.006324 0.006334', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012218', '0.012214', '0.012211', '0.012211 0.012211 0.012214 0.012220 0.012234', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005720', '0.005699', '0.005699 0.005706 0.005720 0.005720 0.005738', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006190', '0.006188', '0.006183', '0.006183 0.006188 0.006188 0.006192 0.006201', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012260', '0.012256', '0.012234', '0.012234 0.012236 0.012256 0.012274 0.012301', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012063', '0.012054', '0.012045', '0.012045 0.012050 0.012054 0.012061 0.012106', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012057', '0.012055', '0.012035', '0.012035 0.012049 0.012055 0.012071 0.012076', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005714', '0.005709', '0.005709 0.005712 0.005714 0.005717 0.005727', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006085', '0.006083', '0.006079', '0.006079 0.006081 0.006083 0.006083 0.006098', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000122', '0.000134', '0.000073', '0.000073 0.000125 0.000134 0.000137 0.000141', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000201', '0.000071', '0.000047', '0.000047 0.000062 0.000071 0.000404 0.000420', '0.000193', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037429', '0.037444', '0.037221', '0.037221 0.037369 0.037444 0.037519 0.037593', '0.000143', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001195', '0.001193', '0.001188', '0.001188 0.001191 0.001193 0.001198 0.001205', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001829', '0.001815', '0.001805', '0.001805 0.001809 0.001815 0.001827 0.001889', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002297', '0.001650', '0.001647', '0.001647 0.001647 0.001650 0.001664 0.004876', '0.001442', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
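
The row above has one slow sample (0.004876) that pulls the first reported value (0.002297) well above the second (0.001650). A minimal sketch of how such a run could be flagged is given here. It assumes the space-separated field holds the individual timed runs; the `max_over_median` measure and the 1.5 threshold are hypothetical choices for illustration and are not part of the benchmark scripts.

```python
import statistics

# Samples copied from the USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch row.
SAMPLES = "0.001647 0.001647 0.001650 0.001664 0.004876"


def spread(samples_text):
    """Summarise a space-separated list of run times."""
    values = sorted(float(s) for s in samples_text.split())
    median = statistics.median(values)
    return {
        "median": median,
        "mean": statistics.mean(values),
        # How far the slowest run sits above the typical run.
        "max_over_median": values[-1] / median,
    }


def is_noisy(samples_text, threshold=1.5):
    """Flag a result whose slowest run exceeds threshold x median (hypothetical rule)."""
    return spread(samples_text)["max_over_median"] > threshold


if __name__ == "__main__":
    print(spread(SAMPLES))
    print("noisy:", is_noisy(SAMPLES))
```

Comparing against the median rather than the mean keeps a single outlier from hiding itself by inflating the baseline it is measured against.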

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001045', '0.001042', '0.001033', '0.001033 0.001035 0.001042 0.001048 0.001065', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003071', '0.003067', '0.003058', '0.003058 0.003058 0.003067 0.003084 0.003086', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001464', '0.001456', '0.001447', '0.001447 0.001452 0.001456 0.001470 0.001495', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001468', '0.001447', '0.001446', '0.001446 0.001446 0.001447 0.001459 0.001544', '0.000043', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001217', '0.001215', '0.001200', '0.001200 0.001214 0.001215 0.001223 0.001232', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001729', '0.001717', '0.001717 0.001729 0.001729 0.001731 0.001758', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006808', '0.006863', '0.006701', '0.006701 0.006715 0.006863 0.006871 0.006888', '0.000091', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016170', '0.016170', '0.016164', '0.016164 0.016165 0.016170 0.016170 0.016182', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966747', '0.966740', '0.966649', '0.966649 0.966665 0.966740 0.966798 0.966883', '0.000097', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000026', '0.000025', '0.000025 0.000025 0.000026 0.000029 0.000059', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
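
The log prints no column header, so the meaning of the numeric fields has to be inferred. The sketch below recomputes the statistics from the five samples in the `MolecularDynamics` row and compares them with the neighbouring fields. The column positions and their interpretation (mean, median, minimum, sample standard deviation) are assumptions derived from the numbers themselves, not from a documented format.

```python
import ast
import statistics

# Row copied verbatim from the log above.
ROW = (
    "['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', "
    "'0.000033', '0.000026', '0.000025', "
    "'0.000025 0.000025 0.000026 0.000029 0.000059', '0.000015', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

# Assumed column positions (inferred, not documented).
MEAN, MEDIAN, MIN, SAMPLES, STDDEV = 11, 12, 13, 14, 15


def check_row(text, tol=1.5e-6):
    """Return, per statistic, whether the recomputed value matches the reported one.

    The tolerance allows for the six-decimal rounding of the printed values.
    """
    fields = ast.literal_eval(text)  # the row is printed as a Python list literal
    samples = [float(s) for s in fields[SAMPLES].split()]
    recomputed = {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "stddev": statistics.stdev(samples),  # sample (n - 1) standard deviation
    }
    reported = {
        "mean": float(fields[MEAN]),
        "median": float(fields[MEDIAN]),
        "min": float(fields[MIN]),
        "stddev": float(fields[STDDEV]),
    }
    return {name: abs(recomputed[name] - reported[name]) <= tol for name in recomputed}


if __name__ == "__main__":
    print(check_row(ROW))
```

If any entry comes back `False` for another row, the assumed column layout does not hold for that row.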

github-actions[bot] avatar Oct 03 '24 12:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11214454134

github-actions[bot] avatar Oct 07 '24 11:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11214454134 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 07 '24 11:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216034016

github-actions[bot] avatar Oct 07 '24 12:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216034016 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 07 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216232257

github-actions[bot] avatar Oct 07 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216232257 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 07 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216678935

github-actions[bot] avatar Oct 07 '24 13:10 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216678935 Job status: failure. Test status: failure.

github-actions[bot] avatar Oct 07 '24 13:10 github-actions[bot]