Tom Deakin
- It also seems to work with just `-fopenmp` and the right `craype-accel` module loaded. Plain LLVM also works with `-fopenmp-targets=nvptx64`, but CCE won't work with the longer `-fopenmp-targets=nvptx64-nvidia-cuda`.
Related to #122
In that PR, the number of blocks was set to `4 * prop.multiProcessorCount;` in a similar way to the other models that need to guess this number (i.e. OpenCL).
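For illustration, the heuristic amounts to something like the sketch below. This is not the code from that PR: `DeviceProps` and `guess_num_blocks` are hypothetical names, and `DeviceProps` only stands in for the `multiProcessorCount` field of CUDA's `cudaDeviceProp` so that the snippet compiles without the CUDA toolkit. The factor of 4 is the value quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the single field of cudaDeviceProp used here.
// In real CUDA code, cudaGetDeviceProperties(&prop, device) fills it in.
struct DeviceProps {
  int multiProcessorCount;
};

// Guess a grid size as a small multiple of the number of SMs.
// blocks_per_sm = 4 mirrors the `4 * prop.multiProcessorCount` above;
// the function name and the parameter are assumptions for this sketch.
int guess_num_blocks(const DeviceProps& prop, int blocks_per_sm = 4) {
  assert(prop.multiProcessorCount > 0);
  assert(blocks_per_sm > 0);
  return blocks_per_sm * prop.multiProcessorCount;
}
```

For a device reporting 80 SMs, `guess_num_blocks(DeviceProps{80})` gives 320 blocks. The multiplier is a guess rather than a tuned value, which is why it is kept as a parameter here.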
> I am really winning at filing duplicate issues today, aren't I? 😄 Just shows us that we need to do some housekeeping ASAP... Good to bring this to the...
Thanks Jeff - we started #127 to support larger inputs than an `int` can hold. There is definitely some weird type behaviour going on. Thanks for sharing the fix for...
The metadirective could be used as below. This would negate the need to have our own compile time switch.
```
#pragma omp metadirective \
    when(device={arch(gpu)}: target teams distribute parallel for...
```
`order(concurrent)` clause on worksharing loops to identify a *concurrent* loop
`loop` directive
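As a rough sketch of how these two OpenMP 5.0 features look on a triad-style kernel (the function names are made up for this example, and it assumes a compiler with OpenMP 5.0 support for `order(concurrent)` and `loop`; without OpenMP enabled the pragmas are ignored and the loops run serially):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Worksharing loop marked as concurrent: iterations may run in any order.
// Hypothetical function name; a signed loop index is used deliberately.
void triad_order_concurrent(std::vector<double>& a,
                            const std::vector<double>& b,
                            const std::vector<double>& c,
                            double scalar) {
  assert(a.size() == b.size() && a.size() == c.size());
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
  #pragma omp parallel for order(concurrent)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    a[i] = b[i] + scalar * c[i];
  }
}

// The same kernel using the `loop` directive (combined with `parallel`),
// which leaves the choice of how to run the iterations to the implementation.
void triad_loop(std::vector<double>& a,
                const std::vector<double>& b,
                const std::vector<double>& c,
                double scalar) {
  assert(a.size() == b.size() && a.size() == c.size());
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
  #pragma omp parallel loop
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    a[i] = b[i] + scalar * c[i];
  }
}
```

Both versions should produce the same result; the difference is only in how much freedom the implementation is given to schedule the iterations.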
We definitely don't want to offer unsigned types here. We need signed types to help with vectorisation, and don't want to suggest that using unsigned is best practice. Given that,...
Should be resolved by 6945cbcec71be6ad09b2ea, pending https://github.com/LLNL/RAJA/issues/1296 and https://github.com/LLNL/RAJA/pull/1302