dpcpp sample iso3dfd runs on the incorrect device on Tiger Lake

Open junxnone opened this issue 4 years ago • 0 comments

Summary

Provide a short summary of the issue. Sections below provide guidance on what factors are considered important to reproduce an issue.

https://github.com/oneapi-src/oneAPI-samples/blob/4bed52e76ceb17243a0bc4ce24e9aed52aaa6e49/DirectProgramming/DPC%2B%2B/StructuredGrids/iso3dfd_dpcpp/src/iso3dfd.cpp#L282-L288

  • The device-selection pattern in the lines linked above does not work on Tiger Lake

Version

Report oneAPI Toolkit version and oneAPI Sample version or hash.

  • 2021.4

Environment

Provide OS information and hardware information if applicable.

  • Ubuntu 20.04

Steps to reproduce

Please check that the issue is reproducible with the latest revision on master. Include all the steps to reproduce the issue.

  • Run make run and make run_cpu on a Tiger Lake machine

Observed behavior

Document behavior you observe. For performance defects, like performance regressions or a function being slow, provide a log if possible.

  • make run should run on the GPU, but it runs on the CPU
  • make run_cpu should run on the CPU, but it runs on the GPU
$ make run 
Grid Sizes: 256 256 256
Memory Usage: 230 MB
 ***** Running C++ Serial variant *****
Initializing ...
--------------------------------------
time         : 1.91385 secs
throughput   : 87.6621 Mpts/s
flops        : 5.34739 GFlops
bytes        : 1.05195 GBytes/s

--------------------------------------

--------------------------------------
 ***** Running SYCL variant *****
Initializing ...
 Running on 11th Gen Intel(R) Core(TM) i7-1185G7E @ 2.80GHz
 The Device Max Work Group Size is : 8192
 The Device Max EUCount is : 8
 The blockSize x is : 32
 The blockSize y is : 8
 Using Global Memory Kernel
--------------------------------------
time         : 0.657646 secs
throughput   : 255.11 Mpts/s
flops        : 15.5617 GFlops
bytes        : 3.06132 GBytes/s

--------------------------------------

--------------------------------------
Final wavefields from SYCL device and CPU are equivalent: Success

$ make run_cpu

Scanning dependencies of target run_cpu
Grid Sizes: 256 256 256
Memory Usage: 230 MB
 ***** Running C++ Serial variant *****
Initializing ...
--------------------------------------
time         : 1.7656 secs
throughput   : 95.0226 Mpts/s
flops        : 5.79638 GFlops
bytes        : 1.14027 GBytes/s

--------------------------------------

--------------------------------------
 ***** Running SYCL variant *****
Initializing ...
 Running on Intel(R) Iris(R) Xe Graphics [0x9a49]
 The Device Max Work Group Size is : 512
 The Device Max EUCount is : 96
 The blockSize x is : 256
 The blockSize y is : 1
 Using Global Memory Kernel
--------------------------------------
time         : 0.505061 secs
throughput   : 332.182 Mpts/s
flops        : 20.2631 GFlops
bytes        : 3.98618 GBytes/s

--------------------------------------

--------------------------------------
Final wavefields from SYCL device and CPU are equivalent: Success
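
A possible explanation, sketched under assumptions: if the sample picks the device by searching the device name for a substring, the two names printed in the logs above would produce exactly this swap. The patterns "Gen" and "CPU" and the helpers Score and PickByPattern below are hypothetical and for illustration only; the linked lines are not reproduced here, and the real selector may differ. Only the two device names come from the logs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Device names copied verbatim from the two logs above.
const std::string kCpuName = "11th Gen Intel(R) Core(TM) i7-1185G7E @ 2.80GHz";
const std::string kGpuName = "Intel(R) Iris(R) Xe Graphics [0x9a49]";

// Hypothetical scoring rule (an assumption, not the sample's confirmed code):
// 1 if the device name contains the pattern, 0 otherwise.
int Score(const std::string& device_name, const std::string& pattern) {
  return device_name.find(pattern) != std::string::npos ? 1 : 0;
}

// Hypothetical helper: index of the first name that matches the pattern,
// or -1 when no name matches. With no match, the pattern no longer decides
// which device is used.
int PickByPattern(const std::vector<std::string>& names,
                  const std::string& pattern) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (Score(names[i], pattern) > 0) return static_cast<int>(i);
  }
  return -1;
}

// Checks the behaviour described in the text for the two logged names.
void SelfCheck() {
  const std::vector<std::string> names = {kGpuName, kCpuName};
  // An assumed GPU pattern "Gen" matches the CPU name ("11th Gen ...").
  assert(PickByPattern(names, "Gen") == 1);
  // An assumed CPU pattern "CPU" matches neither name.
  assert(PickByPattern(names, "CPU") == -1);
}
```

Calling SelfCheck() from a main function passes with these two names. If that is the cause, selecting by device type instead of by name would not depend on how a given platform spells its device names.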

Expected behavior

Document behavior you expect.

  • make run runs on the GPU and make run_cpu runs on the CPU

junxnone, Nov 02 '21 02:11