Benchmark updates for faster run and more reliable results
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890 Job status: failure. Test status: skipped.
Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11145360258
Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11145360258 Job status: failure. Test status: skipped.
Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11146412593
Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11146412593 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11146871014
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11146871014 Job status: cancelled. Test status: cancelled.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11147212802
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11147212802 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11158482367
Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11158482367 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11160336350
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11160336350 Job status: success. Test status: success.
Summary
No diffs to calculate performance change
(result is better)
Performance change in benchmark groups
Relative perf in group Runtime (8): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Runtime_BlockedTransform_iter_512_blocksize_2048 | 0.072000 ms | |||
| Runtime_BlockedTransform_iter_256_blocksize_2048 | 0.071000 ms | |||
| Runtime_BlockedTransform_iter_256_blocksize_1024 | 0.081000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_2048 | 0.166000 ms | |||
| Runtime_BlockedTransform_iter_512_blocksize_1024 | 0.174000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_1024 | 0.174000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_1024 | 0.079000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_2048 | 0.168000 ms |
Details
Benchmark details - environment, command, output...
Runtime_BlockedTransform_iter_512_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000109', '0.000072', '0.000065', '0.000065 0.000066 0.000072 0.000169 0.000171', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_256_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000107', '0.000071', '0.000060', '0.000060 0.000060 0.000071 0.000165 0.000180', '0.000060', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_256_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000124', '0.000081', '0.000066', '0.000066 0.000069 0.000081 0.000174 0.000231', '0.000075', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_128_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000138', '0.000166', '0.000058', '0.000058 0.000059 0.000166 0.000171 0.000236', '0.000078', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_512_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000141', '0.000174', '0.000082', '0.000082 0.000086 0.000174 0.000175 0.000188', '0.000052', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_128_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000143', '0.000174', '0.000069', '0.000069 0.000091 0.000174 0.000183 0.000195', '0.000058', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_64_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000437', '0.000079', '0.000064', '0.000064 0.000076 0.000079 0.000201 0.001762', '0.000743', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_64_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000145', '0.000168', '0.000086', '0.000086 0.000109 0.000168 0.000168 0.000194', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
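The "This PR" values in the Runtime table above match the median of the per-run samples in each output row, converted to milliseconds. A minimal sketch of that check follows. It assumes the space-separated field holds one time per measured run, in seconds; this is inferred from the numbers rather than taken from sycl-bench documentation, and `summarize` is a hypothetical helper name.

```python
import statistics

# Per-run samples copied from the
# Runtime_BlockedTransform_iter_512_blocksize_2048 output row.
# Assumption: one value per measured run (--num-runs=5), in seconds.
RAW_SAMPLES = "0.000065 0.000066 0.000072 0.000169 0.000171"


def summarize(raw: str) -> dict:
    """Return mean, median, min and sample stdev in milliseconds."""
    samples = [float(token) for token in raw.split()]
    return {
        "mean_ms": statistics.mean(samples) * 1000.0,
        "median_ms": statistics.median(samples) * 1000.0,
        "min_ms": min(samples) * 1000.0,
        "stdev_ms": statistics.stdev(samples) * 1000.0,
    }


summary = summarize(RAW_SAMPLES)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this row two of the five samples are roughly 2.5 times slower than the other three, so with only five runs the median can land on either cluster from one run to the next.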
Compute Benchmarks level_zero run (with params: --save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11160575425
Compute Benchmarks level_zero run (--save baseline): https://github.com/oneapi-src/unified-runtime/actions/runs/11160575425 Job status: success. Test status: success.
Summary
No diffs to calculate performance change
(result is better)
Performance change in benchmark groups
Relative perf in group api (6): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 25.430000 μs | |||
| api_overhead_benchmark_sycl SubmitKernel in order | 25.333000 μs | |||
| api_overhead_benchmark_ur SubmitKernel out of order | 17.647000 μs | |||
| api_overhead_benchmark_ur SubmitKernel in order | 13.226000 μs | |||
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.157000 μs | |||
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.663000 μs |
Relative perf in group memory (4): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 226.426000 μs | |||
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 113.628000 μs | |||
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.745000 μs | |||
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.233000 GB/s |
Relative perf in group miscellaneous (1): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | 858.160000 GB/s |
Relative perf in group Velocity-Bench (5): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Velocity-Bench Hashtable | 361.439400 M keys/sec | |||
| Velocity-Bench Bitcracker | 35.562800 s | |||
| Velocity-Bench CudaSift | 218.822000 ms | |||
| Velocity-Bench QuickSilver | 118.210000 MMS/CTT | |||
| Velocity-Bench Sobel Filter | 551.852000 ms |
Relative perf in group Runtime (16): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Runtime_BlockedTransform_iter_256_blocksize_1024 | 0.076000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_2048 | 0.072000 ms | |||
| Runtime_BlockedTransform_iter_512_blocksize_1024 | 0.173000 ms | |||
| Runtime_BlockedTransform_iter_256_blocksize_2048 | 0.061000 ms | |||
| Runtime_BlockedTransform_iter_512_blocksize_2048 | 0.169000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_1024 | 0.088000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_1024 | 0.241000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_2048 | 0.062000 ms | |||
| Runtime_IndependentDAGTaskThroughput_SingleTask | 271.280000 ms | |||
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | 273.134000 ms | |||
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | 275.897000 ms | |||
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | 276.025000 ms | |||
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 1794.983000 ms | |||
| Runtime_DAGTaskThroughput_BasicParallelFor | 1703.181000 ms | |||
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 1696.627000 ms | |||
| Runtime_DAGTaskThroughput_SingleTask | 1649.814000 ms |
Relative perf in group MicroBench (16): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 4.693000 ms | |||
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | 4.874000 ms | |||
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | 5.001000 ms | |||
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | 4.917000 ms | |||
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | 617.648000 ms | |||
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | 5.126000 ms | |||
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 4.903000 ms | |||
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | 617.440000 ms | |||
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | 5.022000 ms | |||
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | 4.906000 ms | |||
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | 618.361000 ms | |||
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | 618.374000 ms | |||
| MicroBench_LocalMem_fp32_4096 | 30.433000 ms | |||
| MicroBench_LocalMem_int32_4096 | 30.379000 ms | |||
| MicroBench_Arith_fp32_512 | 0.019000 ms | |||
| MicroBench_Arith_int32_512 | 0.037000 ms |
Relative perf in group Pattern (10): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Pattern_Reduction_NDRange_int32 | 16.333000 ms | |||
| Pattern_Reduction_Hierarchical_int32 | 16.204000 ms | |||
| Pattern_SegmentedReduction_Hierarchical_int16 | 12.216000 ms | |||
| Pattern_SegmentedReduction_NDRange_fp32 | 5.715000 ms | |||
| Pattern_SegmentedReduction_NDRange_int64 | 6.193000 ms | |||
| Pattern_SegmentedReduction_Hierarchical_int64 | 12.256000 ms | |||
| Pattern_SegmentedReduction_Hierarchical_fp32 | 12.051000 ms | |||
| Pattern_SegmentedReduction_NDRange_int32 | 5.720000 ms | |||
| Pattern_SegmentedReduction_NDRange_int16 | 6.078000 ms | |||
| Pattern_SegmentedReduction_Hierarchical_int32 | 12.056000 ms |
Relative perf in group ScalarProduct (6): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| ScalarProduct_NDRange_fp32 | 6.352000 ms | |||
| ScalarProduct_NDRange_int64 | 8.233000 ms | |||
| ScalarProduct_Hierarchical_int64 | 11.557000 ms | |||
| ScalarProduct_NDRange_int32 | 6.330000 ms | |||
| ScalarProduct_Hierarchical_fp32 | 10.263000 ms | |||
| ScalarProduct_Hierarchical_int32 | 10.595000 ms |
Relative perf in group USM (7): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| USM_Allocation_latency_fp32_shared | 0.137000 ms | |||
| USM_Allocation_latency_fp32_host | 37.346000 ms | |||
| USM_Allocation_latency_fp32_device | 0.145000 ms | |||
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.801000 ms | |||
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 1.649000 ms | |||
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.192000 ms | |||
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.035000 ms |
Relative perf in group VectorAddition (3): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| VectorAddition_int32 | 1.447000 ms | |||
| VectorAddition_fp32 | 1.449000 ms | |||
| VectorAddition_int64 | 3.075000 ms |
Relative perf in group Polybench (4): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Polybench_2DConvolution | 0.194000 ms | |||
| Polybench_2mm | 1.223000 ms | |||
| Polybench_3mm | 1.725000 ms | |||
| Polybench_Atax | 6.736000 ms |
Relative perf in group ReductionAtomic (4): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| ReductionAtomic_fp64 | 0.020000 ms | |||
| ReductionAtomic_int32 | 0.012000 ms | |||
| ReductionAtomic_int64 | 0.010000 ms | |||
| ReductionAtomic_fp32 | 0.020000 ms |
Relative perf in group Kmeans (1): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| Kmeans_fp32 | 16.170000 ms |
Relative perf in group LinearRegressionCoeff (1): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | 966.801000 ms |
Relative perf in group LinearRegression (1): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| LinearRegression_fp32 | 0.427000 ms |
Relative perf in group MolecularDynamics (1): cannot calculate
| Benchmark | This PR | Relative perf | Change | - |
|---|---|---|---|---|
| MolecularDynamics | 0.027000 ms |
Details
Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.430,25.726,5.87%,21.891,357.199,[CPU],[us]
api_overhead_benchmark_sycl SubmitKernel in order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.333,25.319,3.41%,23.969,268.200,[CPU],[us]
api_overhead_benchmark_ur SubmitKernel out of order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),17.647,17.625,4.63%,16.433,250.023,[CPU],[us]
api_overhead_benchmark_ur SubmitKernel in order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.226,13.215,1.82%,12.533,48.001,[CPU],[us]
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),226.426,226.335,1.07%,220.607,435.450,[CPU],[us]
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),113.628,113.585,0.85%,111.127,171.830,[CPU],[us]
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.745,5.565,11.05%,5.185,34.508,[CPU],[us]
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.233,3.254,3.57%,0.516,3.464,[CPU],[GB/s]
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.157,2.152,6.26%,1.958,33.815,[CPU],[us]
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.663,1.657,5.71%,1.568,23.445,[CPU],[us]
miscellaneous_benchmark_sycl VectorSum
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.160,858.609,0.39%,807.892,867.787,[GPU],bw [GB/s]
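The compute-benchmarks outputs above all share one CSV shape: a seven-column header, then a data row that carries one extra trailing field with the unit. The sketch below shows one way such a row could be read. It is only an illustration: `parse_result` and the `Unit` key are hypothetical names introduced here, and this is not necessarily how the benchmark scripts in the PR parse the output.

```python
import csv
import io

# Header and data row copied from the first SubmitKernel output above.
HEADER = "TestCase,Mean,Median,StdDev,Min,Max,Type"
ROW = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),"
    "25.430,25.726,5.87%,21.891,357.199,[CPU],[us]"
)


def parse_result(header: str, row: str) -> dict:
    """Map one CSV data row onto the header names.

    The data row has one more field than the header; it is stored under
    the hypothetical key "Unit".
    """
    names = next(csv.reader(io.StringIO(header)))
    values = next(csv.reader(io.StringIO(row)))
    result = dict(zip(names, values))
    result["Unit"] = values[len(names)] if len(values) > len(names) else ""
    for key in ("Mean", "Median", "Min", "Max"):
        result[key] = float(result[key])
    # StdDev is reported as a percentage string such as "5.87%".
    result["StdDev"] = float(result["StdDev"].rstrip("%"))
    return result


parsed = parse_result(HEADER, ROW)
print(parsed["TestCase"])
print(parsed["Mean"], parsed["Unit"])
```

Reading the unit from the row rather than assuming microseconds matters for StreamMemory and VectorSum, whose rows report `[GB/s]`.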
Velocity-Bench Hashtable
Environment Variables:
Command:
/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output:
hashtable - total time for whole calculation: 0.371342 s 361.439400 million keys/second
Velocity-Bench Bitcracker
Environment Variables:
Command:
/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output:
---------> BitCracker: BitLocker password cracking tool <---------
================================== Retrieving Info
Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"
Attack
================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes
Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0
================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!
time to subtract from total: 0.00421934 s
bitcracker - total time for whole calculation: 35.5628 s
Velocity-Bench CudaSift
Environment Variables:
Command:
/home/test-user/bench_workdir/cudaSift/cudaSift
Output:
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1272 33.5868% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1208 1264 32.7993% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1257 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1239 1275 33.6411% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1266 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1257 33.2338% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1265 33.4781% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1264 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1262 33.3424% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1254 33.0709% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1223 1266 33.2066% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1089 1255 29.5683% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1265 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1262 33.2881% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1214 1261 32.9623% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1274 33.2338% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1266 33.5324% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1268 33.5868% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1266 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1072 1257 29.1067% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1123 1259 30.4914% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1240 1274 33.6682% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1249 33.0437% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1267 33.4781% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1169 1266 31.7404% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1258 33.2609% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1266 33.3424% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1028 1265 27.912% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1137 1262 30.8716% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1266 33.4781% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1270 33.5324% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1252 33.0437% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1252 33.0166% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1212 1265 32.908% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1241 1275 33.6954% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1100 1262 29.867% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1262 33.3152% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1269 33.5053% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1261 33.3424% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1263 33.4238% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1095 1272 29.7312% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1271 33.5053% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1105 1260 30.0027% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1099 1259 29.8398% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1213 1252 32.9351% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1234 1269 33.5053% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1271 33.6139% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1272 33.6139% 1 2
Performing data verification Data verification is SUCCESSFUL.
Avg workload time = 218.822 ms
Velocity-Bench QuickSilver
Environment Variables:
QS_DEVICE=GPU
Command:
/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
Output:
Copyright (c) 2016 Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version     :
Quicksilver Git Hash    :
MPI Version             : 3.0
Number of MPI ranks     : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs   : 1
Loading params
Finished loading params
Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:
Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0
Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1
CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4
setting GPU setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle  start    source   rr  split    absorb  scatter   fission  produce  collisn   escape   census   num_seg   scalar_flux   cycleInit     cycleTracking  cycleFinalize
0      0        1000000  0   9000000  0       18533189  0        0        18533189  1151780  8848220  55527935  1.854923e+09  4.303970e-01  6.148200e-01   0.000000e+00
1      8848220  1000000  0   151478   0       34281997  0        0        34281997  1664159  8335539  94633679  5.047651e+09  3.638310e-01  7.487510e-01   0.000000e+00
2      8335539  1000000  0   663717   0       34354432  0        0        34354432  1366771  8632485  95010375  7.705930e+09  3.329380e-01  7.640800e-01   0.000000e+00
3      8632485  1000000  0   367978   0       34302727  0        0        34302727  1242216  8758247  94953591  9.992076e+09  3.678010e-01  8.284720e-01   0.000000e+00
4      8758247  1000000  0   242076   0       34141236  0        0        34141236  1168452  8831871  94599337  1.199834e+10  3.590150e-01  7.972970e-01   0.000000e+00
5      8831871  1000000  0   168070   0       33948724  0        0        33948724  1121156  8878785  94148236  1.377636e+10  3.596570e-01  7.672700e-01   0.000000e+00
6      8878785  1000000  0   120572   0       33760567  0        0        33760567  1089103  8910254  93689264  1.535668e+10  3.289830e-01  7.648450e-01   0.000000e+00
7      8910254  1000000  0   89810    0       33552179  0        0        33552179  1065203  8934861  93216931  1.676993e+10  3.299110e-01  7.897290e-01   0.000000e+00
8      8934861  1000000  0   65491    0       33384605  0        0        33384605  1047720  8952632  92768273  1.804559e+10  3.296550e-01  7.852080e-01   0.000000e+00
9      8952632  1000000  0   47165    0       33198494  0        0        33198494  1033968  8965829  92324678  1.920208e+10  3.305270e-01  7.601730e-01   0.000000e+00
Timer                    Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                     number      microSecs   microSecs   microSecs   microSecs   Efficiency
                         of calls    min         avg         max         stddev      Rating
main                     1           1.115e+07   1.115e+07   1.115e+07   0.000e+00   100.00
cycleInit                10          3.533e+06   3.533e+06   3.533e+06   0.000e+00   100.00
cycleTracking            10          7.621e+06   7.621e+06   7.621e+06   0.000e+00   100.00
cycleTracking_Kernel     104         4.935e+06   4.935e+06   4.935e+06   0.000e+00   100.00
cycleTracking_MPI        117         2.123e+05   2.123e+05   2.123e+05   0.000e+00   100.00
cycleTracking_Test_Done  0           0.000e+00   0.000e+00   0.000e+00   0.000e+00   0.00
cycleFinalize            20          4.800e+02   4.800e+02   4.800e+02   0.000e+00   100.00
Figure Of Merit 118.21 [Num Mega Segments / Cycle Tracking Time]
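The Figure Of Merit is labelled as [Num Mega Segments / Cycle Tracking Time], and it can be reproduced from the numbers already in this output. The following is a minimal sketch, assuming that "Num Mega Segments" is the sum of the per-cycle num_seg column divided by 1e6 and that "Cycle Tracking Time" is the cumulative cycleTracking timer converted from microseconds to seconds. The variable names are illustrative only and are not taken from QuickSilver.

```python
# Per-cycle num_seg values copied from the cycle table in the QuickSilver output.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer table, in microseconds.
cycle_tracking_us = 7.621e6

# Assumption: "Mega Segments" means segments / 1e6, and the time is in seconds.
mega_segments = sum(num_seg) / 1e6
cycle_tracking_s = cycle_tracking_us / 1e6

figure_of_merit = mega_segments / cycle_tracking_s
print(f"{figure_of_merit:.2f}")  # prints 118.21
```

Under these assumptions the result agrees with the reported 118.21, which suggests the metric depends only on the tracking phase and not on cycleInit or cycleFinalize.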
Velocity-Bench Sobel Filter
Environment Variables:
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
Command:
/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
Output:
SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.47102 s
sobelfilter - total time for whole calculation: 0.551852 s
Runtime_BlockedTransform_iter_256_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000092', '0.000076', '0.000068', '0.000068 0.000069 0.000076 0.000078 0.000167', '0.000042', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
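Each sycl-bench result is printed as a Python list literal, so it can be checked mechanically. The sketch below parses the row above and recomputes the summary statistics from the five raw run times. The field positions (11 = mean, 12 = median, 13 = minimum, 14 = raw samples, 15 = standard deviation) are an assumption inferred from the numbers in this log, not taken from sycl-bench documentation.

```python
import ast
import statistics

# Result row copied verbatim from the log output above.
row_text = (
    "['Runtime_BlockedTransform_iter_256_blocksize_1024', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'1024', '2049', '0.000092', '0.000076', '0.000068', "
    "'0.000068 0.000069 0.000076 0.000078 0.000167', '0.000042', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

row = ast.literal_eval(row_text)  # safe parsing of a list of string literals

# Assumed layout: index 14 holds the space-separated run times in seconds.
samples = [float(x) for x in row[14].split()]

mean = statistics.mean(samples)
median = statistics.median(samples)
minimum = min(samples)
stddev = statistics.stdev(samples)  # sample (n - 1) standard deviation

print(f"mean={mean:.6f} median={median:.6f} min={minimum:.6f} stddev={stddev:.6f}")
```

For this row the recomputed values match the reported fields to six decimal places. The mean (0.000092) sits well above the median (0.000076) because one of the five runs took 0.000167 s, so the median or minimum is probably the steadier figure to compare between runs.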
Runtime_BlockedTransform_iter_64_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_64_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000117', '0.000072', '0.000059', '0.000059 0.000072 0.000072 0.000118 0.000266', '0.000086', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_512_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_512_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000159', '0.000173', '0.000092', '0.000092 0.000172 0.000173 0.000176 0.000181', '0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_256_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_256_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000083', '0.000061', '0.000058', '0.000058 0.000058 0.000061 0.000071 0.000166', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_512_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_512_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000168', '0.000169', '0.000162', '0.000162 0.000166 0.000169 0.000169 0.000172', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_128_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_128_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000116', '0.000088', '0.000067', '0.000067 0.000069 0.000088 0.000173 0.000183', '0.000057', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_64_blocksize_1024
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_64_blocksize_1024', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000491', '0.000241', '0.000065', '0.000065 0.000071 0.000241 0.000245 0.001830', '0.000754', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_BlockedTransform_iter_128_blocksize_2048
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/blocked_transform --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/BlockedTransform_multi.csv --size=2049 --local=1024
Output:
['Runtime_BlockedTransform_iter_128_blocksize_2048', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '1024', '2049', '0.000088', '0.000062', '0.000059', '0.000059 0.000062 0.000062 0.000069 0.000187', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_SingleTask
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.274622', '0.271280', '0.262243', '0.262243 0.270004 0.271280 0.274935 0.294648', '0.012113', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272927', '0.273134', '0.271921', '0.271921 0.272357 0.273134 0.273243 0.273982', '0.000805', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.280174', '0.275897', '0.273980', '0.273980 0.275569 0.275897 0.276162 0.299265', '0.010706', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_BasicParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.278067', '0.276025', '0.269861', '0.269861 0.271613 0.276025 0.279537 0.293297', '0.009317', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.793923', '1.794983', '1.790988', '1.790988 1.791860 1.794983 1.795719 1.796067', '0.002335', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_BasicParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.704656', '1.703181', '1.701674', '1.701674 1.701734 1.703181 1.706069 1.710621', '0.003781', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.732877', '1.696627', '1.692526', '1.692526 1.692941 1.696627 1.768344 1.813945', '0.055603', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_SingleTask
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.654001', '1.649814', '1.648181', '1.648181 1.649490 1.649814 1.655479 1.667043', '0.007811', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
MicroBench_HostDeviceBandwidth_1D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004731', '0.004693', '0.004669', '0.004669 0.004684 0.004693 0.004799 0.004808', '0.000067', '26.772001', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004956', '0.004874', '0.004778', '0.004778 0.004831 0.004874 0.005104 0.005191', '0.000181', '26.160128', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004993', '0.005001', '0.004937', '0.004937 0.004963 0.005001 0.005021 0.005042', '0.000043', '25.320107', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004854', '0.004917', '0.004562', '0.004562 0.004853 0.004917 0.004962 0.004976', '0.000170', '27.401428', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617645', '0.617648', '0.617522', '0.617522 0.617612 0.617648 0.617711 0.617731', '0.000084', '0.202422', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
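The HostDeviceBandwidth rows carry two extra numeric fields: one after the standard deviation (for example 26.772001) and one at the end of the row (0.125000). Judging only from the numbers in this log, the first appears to equal the second divided by the minimum run time. The sketch below checks that reading for two rows. It is an inference, not documented sycl-bench behaviour, and the log does not state the units.

```python
# (name, minimum run time in s, reported throughput field, trailing size field),
# all copied from the result rows in this log.
rows = [
    ("MicroBench_HostDeviceBandwidth_1D_H2D_Strided", 0.004669, 26.772001, 0.125),
    ("MicroBench_HostDeviceBandwidth_3D_D2H_Strided", 0.617522, 0.202422, 0.125),
]

# Assumption: throughput = trailing size field / minimum run time.
recomputed = {name: size / t_min for name, t_min, _, size in rows}

for name, t_min, reported, size in rows:
    print(f"{name}: reported={reported:.6f} recomputed={recomputed[name]:.6f}")

# Ratio between the two rows under the same assumption.
slowdown = rows[1][1] / rows[0][1]
print(f"3D D2H strided is about {slowdown:.0f}x slower than 1D H2D strided")
```

The recomputed values agree with the reported ones to about three decimal places; the small difference is consistent with the minimum time being rounded to six digits in the log. The same arithmetic shows the 3D strided device-to-host copy taking roughly 132 times longer than the 1D strided host-to-device copy for the same size field.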
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005112', '0.005126', '0.005030', '0.005030 0.005078 0.005126 0.005136 0.005191', '0.000061', '24.850786', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005249', '0.004903', '0.004793', '0.004793 0.004836 0.004903 0.004955 0.006757', '0.000846', '26.077627', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617479', '0.617440', '0.617293', '0.617293 0.617417 0.617440 0.617519 0.617725', '0.000160', '0.202497', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005027', '0.005022', '0.004937', '0.004937 0.004962 0.005022 0.005092 0.005123', '0.000080', '25.318332', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004842', '0.004906', '0.004729', '0.004729 0.004731 0.004906 0.004913 0.004932', '0.000103', '26.434858', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618351', '0.618361', '0.618271', '0.618271 0.618352 0.618361 0.618370 0.618400', '0.000048', '0.202177', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618328', '0.618374', '0.618116', '0.618116 0.618316 0.618374 0.618415 0.618421', '0.000126', '0.202227', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_LocalMem_fp32_4096
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000
Output:
['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030430', '0.030433', '0.030367', '0.030367 0.030421 0.030433 0.030451 0.030480', '0.000042', '10274.406342', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']
MicroBench_LocalMem_int32_4096
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000
Output:
['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030385', '0.030379', '0.030320', '0.030320 0.030373 0.030379 0.030393 0.030462', '0.000051', '10290.324690', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']
Pattern_Reduction_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000
Output:
['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016354', '0.016333', '0.016189', '0.016189 0.016260 0.016333 0.016423 0.016565', '0.000146', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_Reduction_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000
Output:
['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016367', '0.016204', '0.016132', '0.016132 0.016174 0.016204 0.016507 0.016818', '0.000292', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006377', '0.006352', '0.006335', '0.006335 0.006349 0.006352 0.006374 0.006474', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008235', '0.008233', '0.008216', '0.008216 0.008227 0.008233 0.008246 0.008255', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011566', '0.011557', '0.011543', '0.011543 0.011550 0.011557 0.011590 0.011591', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006349', '0.006330', '0.006310', '0.006310 0.006330 0.006330 0.006336 0.006438', '0.000051', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010249', '0.010263', '0.010203', '0.010203 0.010242 0.010263 0.010263 0.010273', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010595', '0.010595', '0.010531', '0.010531 0.010569 0.010595 0.010630 0.010649', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int16
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012220', '0.012216', '0.012213', '0.012213 0.012215 0.012216 0.012222 0.012235', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005715', '0.005712', '0.005712 0.005714 0.005715 0.005718 0.005722', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006193', '0.006193', '0.006182', '0.006182 0.006192 0.006193 0.006197 0.006202', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012259', '0.012256', '0.012235', '0.012235 0.012247 0.012256 0.012259 0.012297', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012053', '0.012051', '0.012034', '0.012034 0.012046 0.012051 0.012052 0.012083', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005720', '0.005720', '0.005713', '0.005713 0.005714 0.005720 0.005725 0.005730', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int16
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006082', '0.006078', '0.006076', '0.006076 0.006077 0.006078 0.006083 0.006097', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012057', '0.012056', '0.012049', '0.012049 0.012050 0.012056 0.012057 0.012070', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_shared
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000138', '0.000137', '0.000136', '0.000136 0.000137 0.000137 0.000137 0.000142', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_host
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037358', '0.037346', '0.037234', '0.037234 0.037322 0.037346 0.037386 0.037501', '0.000097', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_device
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000183', '0.000145', '0.000047', '0.000047 0.000138 0.000145 0.000169 0.000415', '0.000138', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001827', '0.001801', '0.001797', '0.001797 0.001799 0.001801 0.001858 0.001882', '0.000040', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001974', '0.001649', '0.001638', '0.001638 0.001642 0.001649 0.001651 0.003289', '0.000736', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001194', '0.001192', '0.001189', '0.001189 0.001192 0.001192 0.001193 0.001202', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001039', '0.001035', '0.001033', '0.001033 0.001033 0.001035 0.001039 0.001057', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001472', '0.001447', '0.001429', '0.001429 0.001443 0.001447 0.001507 0.001534', '0.000046', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001461', '0.001449', '0.001432', '0.001432 0.001448 0.001449 0.001450 0.001525', '0.000037', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003069', '0.003075', '0.003049', '0.003049 0.003063 0.003075 0.003079 0.003080', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_2DConvolution
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/2DConvolution --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2DConvolution.csv
Output:
['Polybench_2DConvolution', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000194', '0.000194', '0.000183', '0.000183 0.000183 0.000194 0.000199 0.000209', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_2mm
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512
Output:
['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001221', '0.001223', '0.001208', '0.001208 0.001212 0.001223 0.001228 0.001232', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_3mm
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512
Output:
['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001728', '0.001725', '0.001722', '0.001722 0.001723 0.001725 0.001727 0.001741', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
MicroBench_Arith_fp32_512
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384
Output:
['MicroBench_Arith_fp32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000021', '0.000019', '0.000019', '0.000019 0.000019 0.000019 0.000020 0.000030', '0.000005', '1658.528819', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']
MicroBench_Arith_int32_512
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/arith --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Arith_int32_512.csv --size=16384
Output:
['MicroBench_Arith_int32_512', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '16384', '0.000053', '0.000037', '0.000037', '0.000037 0.000037 0.000037 0.000042 0.000113', '0.000033', '852.939571', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.031250']
Polybench_Atax
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192
Output:
['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006771', '0.006736', '0.006711', '0.006711 0.006732 0.006736 0.006798 0.006877', '0.000068', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ReductionAtomic_fp64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv
Output:
['ReductionAtomic_fp64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000050', '0.000020', '0.000019', '0.000019 0.000019 0.000020 0.000022 0.000171', '0.000067', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ReductionAtomic_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv
Output:
['ReductionAtomic_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000019', '0.000012', '0.000010', '0.000010 0.000010 0.000012 0.000013 0.000049', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ReductionAtomic_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv
Output:
['ReductionAtomic_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000046', '0.000010', '0.000009', '0.000009 0.000010 0.000010 0.000012 0.000190', '0.000081', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ReductionAtomic_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atomic_reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ReductionAtomic_fp64.csv
Output:
['ReductionAtomic_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '3072', '0.000021', '0.000020', '0.000020', '0.000020 0.000020 0.000020 0.000020 0.000025', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Kmeans_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000
Output:
['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016169', '0.016170', '0.016159', '0.016159 0.016160 0.016170 0.016173 0.016182', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
LinearRegressionCoeff_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000
Output:
['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966762', '0.966801', '0.966601', '0.966601 0.966659 0.966801 0.966835 0.966914', '0.000129', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
LinearRegression_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/lin_reg_error --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegression.csv --size=4096
Output:
['LinearRegression_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '4096', '0.000432', '0.000427', '0.000420', '0.000420 0.000425 0.000427 0.000430 0.000459', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
MolecularDynamics
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196
Output:
['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000027', '0.000025', '0.000025 0.000026 0.000027 0.000029 0.000059', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
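The sycl-bench `Output:` rows above are printed as Python-style lists. As a minimal sketch, such a row can be parsed and its statistics cross-checked against the raw samples. The column positions used here (11 = mean, 12 = median, 13 = min, 14 = samples, 15 = standard deviation, all in seconds) are inferred from the values in this log, not taken from sycl-bench documentation, and `parse_row` is a hypothetical helper name.

```python
import ast
import statistics

# Row copied from the log above, truncated to its first 16 fields for brevity.
row_text = (
    "['USM_Allocation_latency_fp32_device', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'256', '1024000000', '0.000183', '0.000145', '0.000047', "
    "'0.000047 0.000138 0.000145 0.000169 0.000415', '0.000138']"
)

# Assumed column positions, inferred from the data in this log.
NAME, STATUS, MEAN, MEDIAN, MIN, SAMPLES, STDDEV = 0, 1, 11, 12, 13, 14, 15


def parse_row(text):
    """Turn one printed result row into a dict of floats (times assumed to be seconds)."""
    fields = ast.literal_eval(text)
    return {
        "name": fields[NAME],
        "passed": fields[STATUS] == "PASS",
        "mean_s": float(fields[MEAN]),
        "median_s": float(fields[MEDIAN]),
        "min_s": float(fields[MIN]),
        "stddev_s": float(fields[STDDEV]),
        "samples_s": [float(s) for s in fields[SAMPLES].split()],
    }


result = parse_row(row_text)
recomputed_mean = statistics.mean(result["samples_s"])
recomputed_median = statistics.median(result["samples_s"])
# Sample standard deviation (n - 1 in the denominator).
recomputed_stddev = statistics.stdev(result["samples_s"])

print(result["name"], recomputed_mean, recomputed_median, recomputed_stddev)
```

For this row the five samples contain one outlier (0.000415), which pulls the mean well above the median. That is one reason a comparison based on the median is more stable for short runs like these.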
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11161331244
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11161331244 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11161403971
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11161403971 Job status: success. Test status: success.
Summary
70 benchmarks included in the mean. Geomean 100.399%. Improved 12, regressed 24 (threshold 0.50%)
(result is better)
Performance change in benchmark groups
Relative perf in group api (6): 100.313%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_ur SubmitKernel out of order | 14.440000 μs | 17.647 μs | 122.21% | 22.21% | ++ |
| api_overhead_benchmark_sycl SubmitKernel in order | 24.939000 μs | 25.333 μs | 101.58% | 1.58% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.139000 μs | 2.157 μs | 100.84% | 0.84% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.673 μs | 1.663000 μs | 99.40% | -0.60% | . |
| api_overhead_benchmark_sycl SubmitKernel out of order | 26.076 μs | 25.430000 μs | 97.52% | -2.48% | . |
| api_overhead_benchmark_ur SubmitKernel in order | 15.752 μs | 13.226000 μs | 83.96% | -16.04% | -- |
Relative perf in group memory (4): 96.421%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.195000 μs | 3.233 μs | 101.19% | 1.19% | . |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.790 μs | 5.745000 μs | 99.22% | -0.78% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 119.515 μs | 113.628000 μs | 95.07% | -4.93% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 250.058 μs | 226.426000 μs | 90.55% | -9.45% | - |
Relative perf in group miscellaneous (1): 99.497%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | 862.496 μs | 858.160000 μs | 99.50% | -0.50% | . |
Relative perf in group Velocity-Bench (5): 99.878%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Bitcracker | 35.424300 s | 35.563 s | 100.39% | 0.39% | . |
| Velocity-Bench Hashtable | 361.727553 M keys/sec | 361.439 M keys/sec | 100.08% | 0.08% | . |
| Velocity-Bench QuickSilver | 118.170 MMS/CTT | 118.210000 MMS/CTT | 99.97% | -0.03% | . |
| Velocity-Bench CudaSift | 219.701 ms | 218.822000 ms | 99.60% | -0.40% | . |
| Velocity-Bench Sobel Filter | 555.433 ms | 551.852000 ms | 99.36% | -0.64% | . |
Relative perf in group Runtime (16): 97.040%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Runtime_DAGTaskThroughput_NDRangeParallelFor | 1778.195000 ms | 1794.983 ms | 100.94% | 0.94% | . |
| Runtime_IndependentDAGTaskThroughput_SingleTask | 270.339000 ms | 271.280 ms | 100.35% | 0.35% | . |
| Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor | 277.160 ms | 273.134000 ms | 98.55% | -1.45% | . |
| Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor | 280.121 ms | 275.897000 ms | 98.49% | -1.51% | . |
| Runtime_IndependentDAGTaskThroughput_BasicParallelFor | 285.659 ms | 276.025000 ms | 96.63% | -3.37% | . |
| Runtime_DAGTaskThroughput_SingleTask | 1749.395 ms | 1649.814000 ms | 94.31% | -5.69% | - |
| Runtime_DAGTaskThroughput_HierarchicalParallelFor | 1802.311 ms | 1696.627000 ms | 94.14% | -5.86% | - |
| Runtime_DAGTaskThroughput_BasicParallelFor | 1826.810 ms | 1703.181000 ms | 93.23% | -6.77% | - |
| Runtime_BlockedTransform_iter_256_blocksize_1024 | - | 0.076000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_2048 | - | 0.072000 ms | |||
| Runtime_BlockedTransform_iter_512_blocksize_1024 | - | 0.173000 ms | |||
| Runtime_BlockedTransform_iter_256_blocksize_2048 | - | 0.061000 ms | |||
| Runtime_BlockedTransform_iter_512_blocksize_2048 | - | 0.169000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_1024 | - | 0.088000 ms | |||
| Runtime_BlockedTransform_iter_64_blocksize_1024 | - | 0.241000 ms | |||
| Runtime_BlockedTransform_iter_128_blocksize_2048 | - | 0.062000 ms | |||
Relative perf in group MicroBench (16): 99.592%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous | 5.006000 ms | 5.126 ms | 102.40% | 2.40% | . |
| MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous | 4.932000 ms | 5.001 ms | 101.40% | 1.40% | . |
| MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous | 4.996000 ms | 5.022 ms | 100.52% | 0.52% | . |
| MicroBench_LocalMem_int32_4096 | 30.346000 ms | 30.379 ms | 100.11% | 0.11% | . |
| MicroBench_LocalMem_fp32_4096 | 30.416000 ms | 30.433 ms | 100.06% | 0.06% | . |
| MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous | 618.337000 ms | 618.361 ms | 100.00% | 0.00% | . |
| MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous | 618.388 ms | 618.374000 ms | 100.00% | -0.00% | . |
| MicroBench_HostDeviceBandwidth_3D_D2H_Strided | 617.690 ms | 617.648000 ms | 99.99% | -0.01% | . |
| MicroBench_HostDeviceBandwidth_2D_D2H_Strided | 617.668 ms | 617.440000 ms | 99.96% | -0.04% | . |
| MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous | 4.917 ms | 4.903000 ms | 99.72% | -0.28% | . |
| MicroBench_HostDeviceBandwidth_2D_H2D_Strided | 4.960 ms | 4.906000 ms | 98.91% | -1.09% | . |
| MicroBench_HostDeviceBandwidth_1D_H2D_Strided | 4.778 ms | 4.693000 ms | 98.22% | -1.78% | . |
| MicroBench_HostDeviceBandwidth_3D_H2D_Strided | 5.011 ms | 4.917000 ms | 98.12% | -1.88% | . |
| MicroBench_HostDeviceBandwidth_1D_D2H_Strided | 5.127 ms | 4.874000 ms | 95.07% | -4.93% | . |
| MicroBench_Arith_fp32_512 | - | 0.019000 ms | |||
| MicroBench_Arith_int32_512 | - | 0.037000 ms | |||
Relative perf in group Pattern (10): 99.742%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Pattern_SegmentedReduction_NDRange_int32 | 5.714000 ms | 5.720 ms | 100.11% | 0.11% | . |
| Pattern_SegmentedReduction_NDRange_int64 | 6.188000 ms | 6.193 ms | 100.08% | 0.08% | . |
| Pattern_SegmentedReduction_Hierarchical_int16 | 12.214000 ms | 12.216 ms | 100.02% | 0.02% | . |
| Pattern_SegmentedReduction_Hierarchical_int32 | 12.055000 ms | 12.056 ms | 100.01% | 0.01% | . |
| Pattern_SegmentedReduction_Hierarchical_int64 | 12.256000 ms | 12.256 ms | 100.00% | 0.00% | . |
| Pattern_SegmentedReduction_Hierarchical_fp32 | 12.054 ms | 12.051000 ms | 99.98% | -0.02% | . |
| Pattern_SegmentedReduction_NDRange_int16 | 6.083 ms | 6.078000 ms | 99.92% | -0.08% | . |
| Pattern_SegmentedReduction_NDRange_fp32 | 5.720 ms | 5.715000 ms | 99.91% | -0.09% | . |
| Pattern_Reduction_NDRange_int32 | 16.443 ms | 16.333000 ms | 99.33% | -0.67% | . |
| Pattern_Reduction_Hierarchical_int32 | 16.519 ms | 16.204000 ms | 98.09% | -1.91% | . |
Relative perf in group ScalarProduct (6): 100.022%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ScalarProduct_NDRange_fp32 | 6.322000 ms | 6.352 ms | 100.47% | 0.47% | . |
| ScalarProduct_Hierarchical_fp32 | 10.239000 ms | 10.263 ms | 100.23% | 0.23% | . |
| ScalarProduct_NDRange_int64 | 8.222000 ms | 8.233 ms | 100.13% | 0.13% | . |
| ScalarProduct_Hierarchical_int32 | 10.605 ms | 10.595000 ms | 99.91% | -0.09% | . |
| ScalarProduct_NDRange_int32 | 6.340 ms | 6.330000 ms | 99.84% | -0.16% | . |
| ScalarProduct_Hierarchical_int64 | 11.610 ms | 11.557000 ms | 99.54% | -0.46% | . |
Relative perf in group USM (7): 110.796%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Allocation_latency_fp32_device | 0.071000 ms | 0.145 ms | 204.23% | 104.23% | ++++++++++ |
| USM_Allocation_latency_fp32_shared | 0.134000 ms | 0.137 ms | 102.24% | 2.24% | . |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 1.650 ms | 1.649000 ms | 99.94% | -0.06% | . |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.193 ms | 1.192000 ms | 99.92% | -0.08% | . |
| USM_Allocation_latency_fp32_host | 37.444 ms | 37.346000 ms | 99.74% | -0.26% | . |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.042 ms | 1.035000 ms | 99.33% | -0.67% | . |
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.815 ms | 1.801000 ms | 99.23% | -0.77% | . |
Relative perf in group VectorAddition (3): 99.926%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int64 | 3.067000 ms | 3.075 ms | 100.26% | 0.26% | . |
| VectorAddition_fp32 | 1.447000 ms | 1.449 ms | 100.14% | 0.14% | . |
| VectorAddition_int32 | 1.456 ms | 1.447000 ms | 99.38% | -0.62% | . |
Relative perf in group Polybench (4): 99.520%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_2mm | 1.215000 ms | 1.223 ms | 100.66% | 0.66% | . |
| Polybench_3mm | 1.729 ms | 1.725000 ms | 99.77% | -0.23% | . |
| Polybench_Atax | 6.863 ms | 6.736000 ms | 98.15% | -1.85% | . |
| Polybench_2DConvolution | - | 0.194000 ms | |||
Relative perf in group Kmeans (1): 100.000%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Kmeans_fp32 | 16.170000 ms | 16.170 ms | 100.00% | 0.00% | . |
Relative perf in group LinearRegressionCoeff (1): 100.006%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegressionCoeff_fp32 | 966.740000 ms | 966.801 ms | 100.01% | 0.01% | . |
Relative perf in group MolecularDynamics (1): 103.846%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| MolecularDynamics | 0.026000 ms | 0.027 ms | 103.85% | 3.85% | . |
Relative perf in group ReductionAtomic (4): cannot calculate
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| ReductionAtomic_fp64 | - | 0.020000 ms | |||
| ReductionAtomic_int32 | - | 0.012000 ms | |||
| ReductionAtomic_int64 | - | 0.010000 ms | |||
| ReductionAtomic_fp32 | - | 0.020000 ms | |||
Relative perf in group LinearRegression (1): cannot calculate
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| LinearRegression_fp32 | - | 0.427000 ms | |||
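The figures in the tables above are consistent with a simple rule: for lower-is-better metrics, relative perf is baseline divided by this PR's value, and the group figure is the geometric mean of the per-benchmark ratios. The sketch below reproduces the `api` group numbers under that assumption. The function names and the exact handling of the 0.50% threshold are hypothetical and are not taken from the benchmark scripts.

```python
import math


def relative_perf(this_pr, baseline, lower_is_better=True):
    """Ratio > 1.0 means this PR is faster than the baseline."""
    return baseline / this_pr if lower_is_better else this_pr / baseline


def geomean(values):
    """Geometric mean, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def classify(rel, threshold=0.005):
    """Assumed rule: a change beyond +/- threshold counts as improved or regressed."""
    change = rel - 1.0
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "regressed"
    return "unchanged"


# (this PR, baseline) pairs in microseconds, copied from the api group table.
api_group = [
    (14.440, 17.647),
    (24.939, 25.333),
    (2.139, 2.157),
    (1.673, 1.663),
    (26.076, 25.430),
    (15.752, 13.226),
]

api_ratios = [relative_perf(pr, base) for pr, base in api_group]
api_geomean_pct = geomean(api_ratios) * 100

# Hashtable reports throughput (M keys/sec), so higher is better.
hashtable_rel = relative_perf(361.727553, 361.439, lower_is_better=False)

print(f"api group geomean: {api_geomean_pct:.3f}%")
print(f"first api row: {api_ratios[0] * 100:.2f}% ({classify(api_ratios[0])})")
print(f"Hashtable: {hashtable_rel * 100:.2f}%")
```

Rows with `-` in the "This PR" column have no ratio, so they cannot enter a geometric mean. This matches the "cannot calculate" result for groups where every row is missing.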
Details
Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.076,26.058,3.34%,24.392,276.210,[CPU],[us]
api_overhead_benchmark_sycl SubmitKernel in order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.939,25.113,4.41%,21.793,274.068,[CPU],[us]
api_overhead_benchmark_ur SubmitKernel out of order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.440,14.347,3.36%,13.726,32.785,[CPU],[us]
api_overhead_benchmark_ur SubmitKernel in order
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.752,15.817,7.30%,12.526,243.328,[CPU],[us]
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),250.058,252.295,3.03%,220.259,508.610,[CPU],[us]
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),119.515,112.294,25.82%,109.653,302.517,[CPU],[us]
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.790,5.483,12.24%,5.061,39.117,[CPU],[us]
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.195,3.203,2.88%,0.374,3.427,[CPU],[GB/s]
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.139,2.136,4.07%,1.942,9.274,[CPU],[us]
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.673,1.667,5.87%,1.584,26.452,[CPU],[us]
miscellaneous_benchmark_sycl VectorSum
Environment Variables:
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),862.496,863.026,0.48%,814.955,873.207,[GPU],bw [GB/s]
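The compute-benchmarks rows above can be read mechanically. The following is a minimal sketch, assuming each data row carries eight comma-separated fields (the seven named in the header plus a trailing unit field) and that test names contain no commas, as is the case for the rows in this log. The function name `parse_row` and the dictionary keys are illustrative, not part of the benchmark tooling.

```python
import csv
from io import StringIO


def parse_row(line):
    """Split one compute-benchmarks CSV data row into typed fields.

    Assumes eight fields: name, mean, median, stddev (percent), min, max,
    measurement type (e.g. "[CPU]") and unit (e.g. "[us]").
    """
    fields = next(csv.reader(StringIO(line)))
    name, mean, median, stddev, lowest, highest, kind, unit = fields
    return {
        "test": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),  # "12.24%" -> 12.24
        "min": float(lowest),
        "max": float(highest),
        "type": kind.strip("[]"),  # "[CPU]" -> "CPU"
        "unit": unit,
    }


# Data row copied from the QueueMemcpy output in this log.
row = parse_row(
    "QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device "
    "size=1KB),5.790,5.483,12.24%,5.061,39.117,[CPU],[us]"
)
print(row["test"], row["median"], row["unit"])
```

Comparing the median rather than the mean may be the more robust choice for rows like this one, where the maximum (39.117) sits far above the other statistics.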
Velocity-Bench Hashtable
Environment Variables:
Command:
/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output:
hashtable - total time for whole calculation: 0.371046 s
361.727553 million keys/second
Velocity-Bench Bitcracker
Environment Variables:
Command:
/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output:
---------> BitCracker: BitLocker password cracking tool <---------
================================== Retrieving Info
Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"
Attack
================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes
Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0
================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!
time to subtract from total: 0.00432567 s
bitcracker - total time for whole calculation: 35.4243 s
Velocity-Bench CudaSift
Environment Variables:
Command:
/home/test-user/bench_workdir/cudaSift/cudaSift
Output:
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1260 33.2609% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1263 33.3695% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1250 33.0166% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1244 1279 33.7768% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1261 33.3152% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1239 1274 33.6411% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1257 33.2881% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1242 1275 33.7225% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1095 1252 29.7312% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1264 33.3424% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1103 1266 29.9484% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1090 1255 29.5954% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1220 1262 33.1252% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1175 1271 31.9033% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1260 33.2338% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1208 1261 32.7993% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1120 1258 30.41% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1267 33.5596% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1170 1265 31.7676% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1088 1268 29.5411% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1216 1252 33.0166% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1097 1241 29.7855% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1267 33.4238% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1214 1247 32.9623% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1205 1271 32.7179% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1111 1268 30.1656% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1261 33.2609% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1273 33.2881% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1263 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1262 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1109 1267 30.1113% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1264 33.3695% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1122 1258 30.4643% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1176 1258 31.9305% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1235 1272 33.5324% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1224 1263 33.2338% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1085 1269 29.4597% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1260 33.0709% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1270 33.3695% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1068 1259 28.9981% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1268 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1218 1263 33.0709% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1187 1250 32.2292% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1105 1263 30.0027% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1268 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1110 1265 30.1385% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1087 1252 29.514% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1197 1272 32.5007% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1225 1261 33.2609% 1 2
Performing data verification Data verification is SUCCESSFUL.
Avg workload time = 219.701 ms
Velocity-Bench QuickSilver
Environment Variables:
QS_DEVICE=GPU
Command:
/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
Output:
Copyright (c) 2016 Lawrence Livermore National Security, LLC All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1
Loading params
Finished loading params
Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:
Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0
Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1
CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4
setting GPU setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.327580e-01 6.169830e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.678780e-01 7.578090e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.640840e-01 7.712350e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.956240e-01 8.197900e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.341700e-01 7.922870e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.332440e-01 7.666780e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.321390e-01 7.659400e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.326710e-01 7.865830e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.321060e-01 7.858210e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.324400e-01 7.605160e-01 0.000000e+00
Timer                     Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                      number      microSecs   microSecs   microSecs   microSecs   Efficiency
                          of calls    min         avg         max         stddev      Rating
main                      1           1.118e+07   1.118e+07   1.118e+07   0.000e+00   100.00
cycleInit                 10          3.557e+06   3.557e+06   3.557e+06   0.000e+00   100.00
cycleTracking             10          7.624e+06   7.624e+06   7.624e+06   0.000e+00   100.00
cycleTracking_Kernel      104         4.940e+06   4.940e+06   4.940e+06   0.000e+00   100.00
cycleTracking_MPI         117         2.192e+05   2.192e+05   2.192e+05   0.000e+00   100.00
cycleTracking_Test_Done   0           0.000e+00   0.000e+00   0.000e+00   0.000e+00   0.00
cycleFinalize             20          4.080e+02   4.080e+02   4.080e+02   0.000e+00   100.00
Figure Of Merit 118.17 [Num Mega Segments / Cycle Tracking Time]
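The "Figure Of Merit" reported above is labelled as Num Mega Segments / Cycle Tracking Time. As a worked check, the sketch below recomputes it from the per-cycle `num_seg` and `cycleTracking` columns of the Quicksilver output. It assumes the per-cycle `cycleTracking` values are in seconds, which is inferred from the fact that they sum to roughly the 7.624e+06 microseconds shown in the timer table; the variable names are illustrative.

```python
# Per-cycle values copied from the Quicksilver cycle table in this log.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Assumption: the per-cycle cycleTracking column is reported in seconds.
cycle_tracking_s = [
    0.616983, 0.757809, 0.771235, 0.819790, 0.792287,
    0.766678, 0.765940, 0.786583, 0.785821, 0.760516,
]

mega_segments = sum(num_seg) / 1e6        # total segments, in millions
tracking_time_s = sum(cycle_tracking_s)   # total cycle tracking time
figure_of_merit = mega_segments / tracking_time_s

print(f"{figure_of_merit:.2f}")  # prints 118.17
```

Under that assumption the recomputed value agrees with the reported 118.17 to two decimal places.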
Velocity-Bench Sobel Filter
Environment Variables:
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
Command:
/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
Output:
SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.4896 s
sobelfilter - total time for whole calculation: 0.555433 s
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.279397', '0.277160', '0.275892', '0.275892 0.276733 0.277160 0.280282 0.286919', '0.004521', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
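The sycl-bench rows are printed as Python list literals, so they can be read back with `ast.literal_eval`. The sketch below parses the row above and recomputes its summary statistics from the five run-time samples. The field positions (11 = mean, 12 = median, 13 = minimum, 14 = samples, 15 = sample standard deviation) are an inference from the numbers in this log, not taken from sycl-bench documentation.

```python
import ast
import statistics

# Row copied from the Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor output.
row_text = (
    "['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'256', '32768', '0.279397', '0.277160', '0.275892', "
    "'0.275892 0.276733 0.277160 0.280282 0.286919', '0.004521', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

fields = ast.literal_eval(row_text)  # safe parse of the list literal

# Assumed layout: index 14 holds the space-separated run-time samples.
samples = [float(s) for s in fields[14].split()]

mean = statistics.mean(samples)
median = statistics.median(samples)
stdev = statistics.stdev(samples)  # sample (n-1) standard deviation

print(fields[0], fields[1])
print(f"mean={mean:.6f} median={median:.6f} min={min(samples):.6f} stdev={stdev:.6f}")
```

Recomputing from the raw samples makes it possible to apply a different estimator (for example a trimmed mean) when only five runs are available.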
Runtime_IndependentDAGTaskThroughput_SingleTask
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272788', '0.270339', '0.262477', '0.262477 0.268819 0.270339 0.275869 0.286433', '0.008996', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_BasicParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.293907', '0.285659', '0.280444', '0.280444 0.281217 0.285659 0.305152 0.317063', '0.016378', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768
Output:
['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.280812', '0.280121', '0.276607', '0.276607 0.279503 0.280121 0.280673 0.287157', '0.003878', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_BasicParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.827242', '1.826810', '1.826309', '1.826309 1.826335 1.826810 1.827326 1.829432', '0.001293', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_SingleTask
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.745794', '1.749395', '1.731069', '1.731069 1.739506 1.749395 1.753183 1.755815', '0.010300', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.803061', '1.802311', '1.800942', '1.800942 1.801165 1.802311 1.802992 1.807894', '0.002829', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Runtime_DAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680
Output:
['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.778789', '1.778195', '1.777463', '1.777463 1.777877 1.778195 1.779784 1.780624', '0.001351', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
MicroBench_HostDeviceBandwidth_3D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617635', '0.617690', '0.617454', '0.617454 0.617636 0.617690 0.617691 0.617705', '0.000104', '0.202444', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617669', '0.617668', '0.617628', '0.617628 0.617667 0.617668 0.617677 0.617703', '0.000027', '0.202387', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618356', '0.618388', '0.618162', '0.618162 0.618369 0.618388 0.618395 0.618465', '0.000114', '0.202212', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005022', '0.005011', '0.004993', '0.004993 0.005006 0.005011 0.005035 0.005063', '0.000028', '25.037185', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005027', '0.005006', '0.004972', '0.004972 0.004974 0.005006 0.005061 0.005122', '0.000064', '25.140733', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618361', '0.618337', '0.618297', '0.618297 0.618333 0.618337 0.618413 0.618424', '0.000055', '0.202168', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004976', '0.004960', '0.004918', '0.004918 0.004951 0.004960 0.004984 0.005067', '0.000056', '25.416025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005037', '0.004996', '0.004926', '0.004926 0.004951 0.004996 0.005101 0.005212', '0.000118', '25.375316', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_D2H_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005131', '0.005127', '0.005077', '0.005077 0.005119 0.005127 0.005137 0.005194', '0.000042', '24.621639', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005287', '0.004917', '0.004864', '0.004864 0.004883 0.004917 0.004943 0.006828', '0.000862', '25.696525', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004918', '0.004932', '0.004837', '0.004837 0.004924 0.004932 0.004946 0.004951', '0.000047', '25.841455', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_HostDeviceBandwidth_1D_H2D_Strided
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512
Output:
['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004800', '0.004778', '0.004734', '0.004734 0.004773 0.004778 0.004793 0.004924', '0.000073', '26.404330', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']
MicroBench_LocalMem_fp32_4096
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000
Output:
['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030403', '0.030416', '0.030339', '0.030339 0.030404 0.030416 0.030425 0.030433', '0.000038', '10283.962283', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']
MicroBench_LocalMem_int32_4096
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000
Output:
['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030375', '0.030346', '0.030327', '0.030327 0.030340 0.030346 0.030400 0.030461', '0.000056', '10288.023099', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']
Pattern_Reduction_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000
Output:
['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016395', '0.016443', '0.016182', '0.016182 0.016413 0.016443 0.016466 0.016470', '0.000121', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_Reduction_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000
Output:
['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016538', '0.016519', '0.016490', '0.016490 0.016495 0.016519 0.016540 0.016648', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008246', '0.008222', '0.008210', '0.008210 0.008215 0.008222 0.008237 0.008344', '0.000056', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010237', '0.010239', '0.010205', '0.010205 0.010234 0.010239 0.010243 0.010263', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006349', '0.006340', '0.006326', '0.006326 0.006332 0.006340 0.006362 0.006386', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011613', '0.011610', '0.011576', '0.011576 0.011590 0.011610 0.011618 0.011669', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010601', '0.010605', '0.010561', '0.010561 0.010592 0.010605 0.010611 0.010636', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
ScalarProduct_NDRange_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000
Output:
['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006316', '0.006322', '0.006298', '0.006298 0.006299 0.006322 0.006324 0.006334', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int16
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012218', '0.012214', '0.012211', '0.012211 0.012211 0.012214 0.012220 0.012234', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005720', '0.005699', '0.005699 0.005706 0.005720 0.005720 0.005738', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006190', '0.006188', '0.006183', '0.006183 0.006188 0.006188 0.006192 0.006201', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012260', '0.012256', '0.012234', '0.012234 0.012236 0.012256 0.012274 0.012301', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012063', '0.012054', '0.012045', '0.012045 0.012050 0.012054 0.012061 0.012106', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_Hierarchical_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012057', '0.012055', '0.012035', '0.012035 0.012049 0.012055 0.012071 0.012076', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005714', '0.005709', '0.005709 0.005712 0.005714 0.005717 0.005727', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Pattern_SegmentedReduction_NDRange_int16
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000
Output:
['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006085', '0.006083', '0.006079', '0.006079 0.006081 0.006083 0.006083 0.006098', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_shared
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000122', '0.000134', '0.000073', '0.000073 0.000125 0.000134 0.000137 0.000141', '0.000028', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_device
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000201', '0.000071', '0.000047', '0.000047 0.000062 0.000071 0.000404 0.000420', '0.000193', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Allocation_latency_fp32_host
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000
Output:
['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037429', '0.037444', '0.037221', '0.037221 0.037369 0.037444 0.037519 0.037593', '0.000143', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001195', '0.001193', '0.001188', '0.001188 0.001191 0.001193 0.001198 0.001205', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001829', '0.001815', '0.001805', '0.001805 0.001809 0.001815 0.001827 0.001889', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002297', '0.001650', '0.001647', '0.001647 0.001647 0.001650 0.001664 0.004876', '0.001442', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192
Output:
['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001045', '0.001042', '0.001033', '0.001033 0.001035 0.001042 0.001048 0.001065', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_int64
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003071', '0.003067', '0.003058', '0.003058 0.003058 0.003067 0.003084 0.003086', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_int32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001464', '0.001456', '0.001447', '0.001447 0.001452 0.001456 0.001470 0.001495', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
VectorAddition_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000
Output:
['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001468', '0.001447', '0.001446', '0.001446 0.001446 0.001447 0.001459 0.001544', '0.000043', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_2mm
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512
Output:
['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001217', '0.001215', '0.001200', '0.001200 0.001214 0.001215 0.001223 0.001232', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_3mm
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512
Output:
['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001729', '0.001717', '0.001717 0.001729 0.001729 0.001731 0.001758', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Polybench_Atax
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192
Output:
['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006808', '0.006863', '0.006701', '0.006701 0.006715 0.006863 0.006871 0.006888', '0.000091', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
Kmeans_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000
Output:
['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016170', '0.016170', '0.016164', '0.016164 0.016165 0.016170 0.016170 0.016182', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
LinearRegressionCoeff_fp32
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000
Output:
['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966747', '0.966740', '0.966649', '0.966649 0.966665 0.966740 0.966798 0.966883', '0.000097', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
MolecularDynamics
Environment Variables:
Command:
/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196
Output:
['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000033', '0.000026', '0.000025', '0.000025 0.000025 0.000026 0.000029 0.000059', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
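The result rows above are Python-style lists, so they can be checked mechanically. The sketch below parses one row copied from this log and recomputes its statistics from the raw samples. The column meanings are an assumption inferred from the numbers: index 11 looks like the mean, 12 the median, 13 the minimum, 14 the space-separated samples, and 15 the sample standard deviation. The log itself does not label them, and `summarize` is a hypothetical helper, not part of the benchmark scripts.

```python
import ast
import statistics

# One result row copied verbatim from the log above.
ROW = "['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002297', '0.001650', '0.001647', '0.001647 0.001647 0.001650 0.001664 0.004876', '0.001442', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"


def summarize(row_text):
    """Parse one logged row and recompute its run-time statistics.

    Column positions are assumed (inferred from the values), not documented.
    """
    fields = ast.literal_eval(row_text)  # safe parse of the list literal
    samples = [float(s) for s in fields[14].split()]
    return {
        "name": fields[0],
        "status": fields[1],
        "logged_mean": float(fields[11]),
        "logged_median": float(fields[12]),
        "logged_min": float(fields[13]),
        "logged_stddev": float(fields[15]),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "stdev": statistics.stdev(samples),  # sample (n-1) standard deviation
        "max": max(samples),
    }


summary = summarize(ROW)
print(summary["name"], summary["status"])
print("mean   logged/recomputed:", summary["logged_mean"], round(summary["mean"], 6))
print("median logged/recomputed:", summary["logged_median"], round(summary["median"], 6))
print("stddev logged/recomputed:", summary["logged_stddev"], round(summary["stdev"], 6))
```

For this row the mean sits well above the median because one of the five samples (0.004876) is roughly three times the others, which is why the median may be the more stable figure to compare between runs.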
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11214454134
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11214454134 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216034016
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216034016 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216232257
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216232257 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/11216678935
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/11216678935 Job status: failure. Test status: failure.