DPC++ sample iso3dfd runs on the incorrect device on Tiger Lake
Summary
Provide a short summary of the issue. Sections below provide guidance on what factors are considered important to reproduce an issue.
https://github.com/oneapi-src/oneAPI-samples/blob/4bed52e76ceb17243a0bc4ce24e9aed52aaa6e49/DirectProgramming/DPC%2B%2B/StructuredGrids/iso3dfd_dpcpp/src/iso3dfd.cpp#L282-L288
- The device-selection pattern at these lines does not work on Tiger Lake.
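For comparison, the intended mapping from a requested device to a device kind can be written explicitly. The sketch below is plain standard C++ and is not the code at the linked lines. It assumes the device is requested by name ("cpu" or "gpu"), as the `run`/`run_cpu` target names suggest. `DeviceKind` and `ParseDeviceKind` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical enum: the two device kinds the make targets are meant to pick.
enum class DeviceKind { kCpu, kGpu };

// Hypothetical helper: maps a requested device name ("cpu"/"gpu", any case)
// to a DeviceKind. Returns std::nullopt for anything else, so an unexpected
// value can be reported instead of silently falling back to another device.
std::optional<DeviceKind> ParseDeviceKind(std::string name) {
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (name == "cpu") return DeviceKind::kCpu;
  if (name == "gpu") return DeviceKind::kGpu;
  return std::nullopt;
}
```

In SYCL code, the parsed kind would then choose between `sycl::cpu_selector` and `sycl::gpu_selector`, so each target asks for exactly one device type rather than relying on a default.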
Version
Report oneAPI Toolkit version and oneAPI Sample version or hash.
- oneAPI Toolkit 2021.4
Environment
Provide OS information and hardware information if applicable.
- Ubuntu 20.04
- Tiger Lake: 11th Gen Intel(R) Core(TM) i7-1185G7E with Intel(R) Iris(R) Xe Graphics
Steps to reproduce
Please check that the issue is reproducible with the latest revision on master. Include all the steps to reproduce the issue.
- Build the sample and run `make run` and `make run_cpu` on a Tiger Lake system.
Observed behavior
Document behavior you observe. For performance defects, like performance regressions or a function being slow, provide a log if possible.
- `make run` should run on the GPU, but it runs on the CPU.
- `make run_cpu` should run on the CPU, but it runs on the GPU.
$ make run
Grid Sizes: 256 256 256
Memory Usage: 230 MB
***** Running C++ Serial variant *****
Initializing ...
--------------------------------------
time : 1.91385 secs
throughput : 87.6621 Mpts/s
flops : 5.34739 GFlops
bytes : 1.05195 GBytes/s
--------------------------------------
--------------------------------------
***** Running SYCL variant *****
Initializing ...
Running on 11th Gen Intel(R) Core(TM) i7-1185G7E @ 2.80GHz
The Device Max Work Group Size is : 8192
The Device Max EUCount is : 8
The blockSize x is : 32
The blockSize y is : 8
Using Global Memory Kernel
--------------------------------------
time : 0.657646 secs
throughput : 255.11 Mpts/s
flops : 15.5617 GFlops
bytes : 3.06132 GBytes/s
--------------------------------------
--------------------------------------
Final wavefields from SYCL device and CPU are equivalent: Success
$ make run_cpu
Scanning dependencies of target run_cpu
Grid Sizes: 256 256 256
Memory Usage: 230 MB
***** Running C++ Serial variant *****
Initializing ...
--------------------------------------
time : 1.7656 secs
throughput : 95.0226 Mpts/s
flops : 5.79638 GFlops
bytes : 1.14027 GBytes/s
--------------------------------------
--------------------------------------
***** Running SYCL variant *****
Initializing ...
Running on Intel(R) Iris(R) Xe Graphics [0x9a49]
The Device Max Work Group Size is : 512
The Device Max EUCount is : 96
The blockSize x is : 256
The blockSize y is : 1
Using Global Memory Kernel
--------------------------------------
time : 0.505061 secs
throughput : 332.182 Mpts/s
flops : 20.2631 GFlops
bytes : 3.98618 GBytes/s
--------------------------------------
--------------------------------------
Final wavefields from SYCL device and CPU are equivalent: Success
Expected behavior
Document behavior you expect.
- Each target runs on the correct device: `make run` on the GPU and `make run_cpu` on the CPU.
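The expected behavior could also be enforced with a startup check that fails fast on a mismatch instead of silently running on the other device. This is a minimal standard-C++ sketch, assuming the program knows which kind it requested. `DeviceKind` and `DeviceMatchesRequest` are hypothetical names. In real SYCL code, `device_is_gpu` would come from the queue's device, for example `sycl::device::is_gpu()`.

```cpp
#include <iostream>
#include <string>

// Hypothetical enum mirroring the two make targets.
enum class DeviceKind { kCpu, kGpu };

// Hypothetical guard: `device_is_gpu` is what the runtime reports for the
// device the queue actually landed on. Returns true when it matches the
// request; otherwise prints a diagnostic to stderr and returns false.
bool DeviceMatchesRequest(DeviceKind requested, bool device_is_gpu,
                          const std::string& device_name) {
  const bool wanted_gpu = (requested == DeviceKind::kGpu);
  if (wanted_gpu == device_is_gpu) return true;
  std::cerr << "Requested " << (wanted_gpu ? "GPU" : "CPU")
            << " but running on " << device_name << '\n';
  return false;
}
```

With the `make run` log above, a GPU request paired with the i7-1185G7E CPU device would make this check return false and print the mismatch before any timing starts.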