OpenMP offload performance is worse than sequential CPU performance

Open saratpoluri opened this issue 4 years ago • 0 comments

Summary

In the OpenMP reduction sample, which calculates pi, sequential CPU execution finishes faster than both the OpenMP CPU version and the OpenMP GPU offload version.

Version

Samples commit id: 4bed52e76ceb17243a0bc4ce24e9aed52aaa6e49

Environment

Ubuntu 20.04 on an 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz

Steps to reproduce

Build and run the OpenMP reduction sample.

Observed behavior

Compare the time reported for each version: OpenMP offload performs the worst.

Expected behavior

OpenMP offload should perform better than the CPU versions.

Verified Fix

Increasing the number of steps by a factor of 100 brings the timings in line with expectations: OpenMP offload is faster than OpenMP CPU, which is faster than sequential CPU. We do not want a developer's first experience of offload to be a poorly performing sample. The fix could be a larger num_steps, a note in the README, or preferably both: increase num_steps and add some detail to the README. It would also be good to expose num_steps as a tunable parameter with a default value so developers can experiment with it.
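
To make the tunable-parameter suggestion concrete, here is a minimal sketch of what it could look like. It is not the sample's actual code: the function names, the `parse_num_steps` helper, the midpoint-rule integration of 4/(1+x^2), and the default value are all assumptions chosen for illustration.

```cpp
#include <cstdlib>

// Hypothetical default, not taken from the sample; pick whatever value makes
// offload worthwhile on the target hardware.
constexpr long kDefaultNumSteps = 100000000;

// Returns the step count parsed from a command-line argument, or `fallback`
// when the argument is missing, malformed, or not positive.
long parse_num_steps(const char* arg, long fallback = kDefaultNumSteps) {
  if (arg == nullptr) return fallback;
  char* end = nullptr;
  const long n = std::strtol(arg, &end, 10);
  return (end != arg && *end == '\0' && n > 0) ? n : fallback;
}

// Sequential CPU: midpoint-rule integration of 4 / (1 + x^2) over [0, 1].
double pi_sequential(long num_steps) {
  const double step = 1.0 / static_cast<double>(num_steps);
  double sum = 0.0;
  for (long i = 0; i < num_steps; ++i) {
    const double x = (static_cast<double>(i) + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
  }
  return step * sum;
}

// OpenMP CPU: same loop, with the partial sums combined by a reduction.
double pi_omp_cpu(long num_steps) {
  const double step = 1.0 / static_cast<double>(num_steps);
  double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < num_steps; ++i) {
    const double x = (static_cast<double>(i) + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
  }
  return step * sum;
}

// OpenMP offload: same loop on the default device. Without an offload-capable
// build, the region falls back to the host or the pragma is ignored.
double pi_omp_offload(long num_steps) {
  const double step = 1.0 / static_cast<double>(num_steps);
  double sum = 0.0;
#pragma omp target teams distribute parallel for reduction(+ : sum) map(tofrom : sum)
  for (long i = 0; i < num_steps; ++i) {
    const double x = (static_cast<double>(i) + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
  }
  return step * sum;
}
```

A `main` could then call `parse_num_steps(argc > 1 ? argv[1] : nullptr)` and time each of the three functions with the same step count, so developers can see how the ranking changes as the problem size grows.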

saratpoluri, Nov 18 '21 17:11