Tom Deakin comments

Results: 70 comments of Tom Deakin

For info: TARGET flags for clang-based CCE for OpenMP target

- It also seems to work with just `-fopenmp` and the right `craype-accel` module loaded.
- Plain LLVM also works with `-fopenmp-targets=nvptx64`. CCE won't work with the longer `-fopenmp-targets=nvptx64-nvidia-cuda`.

CUDA dot tuning

Related to #122

CUDA dot tuning

In that PR, the number of blocks was set to `4 * prop.multiProcessorCount;`, similar to the other models that need to guess this number (e.g. OpenCL).
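To make the heuristic concrete, here is a minimal host-side C++ sketch. It is not the code from that PR: `dot_num_blocks` and `dot_emulated` are hypothetical names, `multiprocessor_count` stands in for `prop.multiProcessorCount`, and the two-stage reduction (per-block partial sums combined at the end) is an assumed structure for the dot kernel, emulated serially here rather than run on a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the block-count guess quoted above.
// `multiprocessor_count` stands in for prop.multiProcessorCount.
int dot_num_blocks(int multiprocessor_count) {
  return 4 * multiprocessor_count;
}

// Serial emulation of an assumed two-stage dot product:
// each "block" accumulates a strided partial sum, then the
// partial sums are combined in a final pass.
double dot_emulated(const std::vector<double>& a,
                    const std::vector<double>& b,
                    int num_blocks) {
  std::vector<double> partial(static_cast<std::size_t>(num_blocks), 0.0);
  const auto n = static_cast<std::ptrdiff_t>(a.size());

  for (int blk = 0; blk < num_blocks; ++blk) {
    // Block `blk` handles elements blk, blk + num_blocks, ...
    for (std::ptrdiff_t i = blk; i < n; i += num_blocks) {
      partial[blk] += a[i] * b[i];
    }
  }

  // Final combine step over the per-block partial sums.
  double sum = 0.0;
  for (double p : partial) sum += p;
  return sum;
}
```

The block count only changes how the work is partitioned, so choosing it from the multiprocessor count is a tuning decision rather than a correctness one.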

CUDA dot tuning

> I am really winning at filing duplicate issues today, aren't I? 😄 Just shows us that we need to do some housekeeping ASAP... Good to bring this to the...

overflow in CUDA

Thanks Jeff - we started #127 to support larger inputs than an `int` can hold. There is definitely some weird type behaviour going on. Thanks for sharing the fix for...
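As a rough illustration of the limit being discussed, the sketch below checks whether an element count fits in an `int` and computes a byte size in 64-bit arithmetic. The helper names `fits_in_int` and `array_bytes` are hypothetical and are not taken from #127; the example also assumes a platform where `int` is 32 bits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when an element count can be indexed by `int`.
constexpr bool fits_in_int(std::int64_t n) {
  return n >= 0 && n <= std::numeric_limits<int>::max();
}

// Hypothetical helper: byte size of `n` doubles, computed in 64-bit
// arithmetic so the multiplication is not done in `int`.
constexpr std::int64_t array_bytes(std::int64_t n) {
  return n * static_cast<std::int64_t>(sizeof(double));
}
```

With a 32-bit `int`, an array of 2^31 elements already fails the first check, which is the kind of input the wider index type is meant to support.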

Update for latest OpenMP version (5.2 and beyond)

The metadirective could be used as below. This would remove the need for our own compile-time switch.

```
#pragma omp metadirective \
    when(device={arch(gpu)}: target teams distribute parallel for...
```

Update for latest OpenMP version (5.2 and beyond)

`order(concurrent)` clause on worksharing loops to identify a *concurrent* loop
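A minimal sketch of what this could look like on a triad-style kernel follows. The function name `triad` and its signature are assumptions made for illustration, not the project's actual kernel. Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially.

```cpp
#include <vector>

// Hypothetical triad kernel: a[i] = b[i] + scalar * c[i].
void triad(std::vector<double>& a,
           const std::vector<double>& b,
           const std::vector<double>& c,
           double scalar) {
  const int n = static_cast<int>(a.size());
  double* pa = a.data();
  const double* pb = b.data();
  const double* pc = c.data();

  // order(concurrent) asserts that the iterations may execute in any
  // order, including concurrently with each other.
  #pragma omp parallel for order(concurrent)
  for (int i = 0; i < n; ++i) {
    pa[i] = pb[i] + scalar * pc[i];
  }
}
```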

Update for latest OpenMP version (5.2 and beyond)

`loop` directive
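For comparison, a copy-style kernel written with the `loop` construct might look like the sketch below. The name `copy_kernel` is hypothetical, and `parallel loop` is only one of the combined forms available; which form suits the benchmark is left open here.

```cpp
#include <vector>

// Hypothetical copy kernel: c[i] = a[i].
void copy_kernel(const std::vector<double>& a, std::vector<double>& c) {
  const int n = static_cast<int>(a.size());
  const double* pa = a.data();
  double* pc = c.data();

  // The loop construct states that the iterations can run concurrently
  // and leaves the mapping onto threads to the implementation.
  #pragma omp parallel loop
  for (int i = 0; i < n; ++i) {
    pc[i] = pa[i];
  }
}
```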

WIP: support massive input sizes

We definitely don't want to offer unsigned types here. We need signed types to help with vectorisation, and don't want to suggest that using unsigned is best practice. Given that,...
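One way to read this is a kernel indexed with a signed 64-bit type, sketched below. The name `add_kernel` and the choice of `std::int64_t` are assumptions for illustration; the PR may settle on a different signed type. Signed overflow is undefined behaviour in C++, so the compiler may assume the index never wraps, which is part of why signed indices tend to be easier to vectorise.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical add kernel using a signed 64-bit index: a[i] = b[i] + c[i].
void add_kernel(std::vector<double>& a,
                const std::vector<double>& b,
                const std::vector<double>& c) {
  const auto n = static_cast<std::int64_t>(a.size());
  for (std::int64_t i = 0; i < n; ++i) {
    a[i] = b[i] + c[i];
  }
}
```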

Upgrade RAJA to 0.14.x

Should be resolved by 6945cbcec71be6ad09b2ea, pending https://github.com/LLNL/RAJA/issues/1296 and https://github.com/LLNL/RAJA/pull/1302