
Warp/sub-group barriers

Open kris-rowe opened this issue 4 years ago • 1 comments

A related issue: if the inner size is <= warpSize, a warp-wide barrier should be added. Currently no @barrier is added at all. That's tricky, at least for Nvidia's Volta and later architectures, where you can no longer assume that the threads in a warp run in lock-step.

Originally posted by @stgeke in https://github.com/libocca/occa/issues/484#issuecomment-919249600
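
To illustrate the hazard described above, here is a minimal CUDA sketch of the kind of code a warp-wide barrier would protect. It is not OCCA's generated output. The kernel name `warpSum`, the constant `kWarpSize`, and the single-warp launch are assumptions made for the example. Each block holds exactly one warp, so a block-wide barrier is not needed, but lanes still exchange data through shared memory and must be synchronized with `__syncwarp()` on Volta and later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // assumption: NVIDIA warp size

// Hypothetical kernel: one warp per block sums 32 values via shared memory.
__global__ void warpSum(const float* in, float* out) {
  __shared__ float buf[kWarpSize];
  const int lane = threadIdx.x;

  buf[lane] = in[blockIdx.x * kWarpSize + lane];
  __syncwarp();  // make every lane's store visible to the rest of the warp

  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    float v = 0.0f;
    if (lane < offset) v = buf[lane] + buf[lane + offset];
    __syncwarp();  // all reads of this step finish before any write
    if (lane < offset) buf[lane] = v;
    __syncwarp();  // writes are visible before the next step's reads
  }

  if (lane == 0) out[blockIdx.x] = buf[0];
}

int main() {
  float* in = nullptr;
  float* out = nullptr;
  cudaMallocManaged(&in, kWarpSize * sizeof(float));
  cudaMallocManaged(&out, sizeof(float));
  for (int i = 0; i < kWarpSize; ++i) in[i] = 1.0f;

  warpSum<<<1, kWarpSize>>>(in, out);  // inner size == warp size
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  std::printf("sum = %f\n", out[0]);
  cudaFree(in);
  cudaFree(out);
  return 0;
}
```

Without the `__syncwarp()` calls, this code relies on implicit lock-step execution within the warp, which is exactly the assumption that no longer holds under independent thread scheduling.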

kris-rowe, Sep 14 '21 16:09

This is also relevant for OpenCL and SYCL/DPC++, since the innermost @inner loop will be mapped to a sub-group. Newer versions of both standards support sub-group barriers (sub_group_barrier in OpenCL C, and sycl::group_barrier applied to a sub_group in SYCL 2020).

kris-rowe, Sep 14 '21 16:09