Joe Todd
This PR adds SYCL blocksize deduction (similar to that implemented for the CUDA backend). For now it only supports `KOKKOS_ARCH_AMPERE80` and _per streaming multiprocessor (SM)_ info is hardcoded. Depends on...
During a recent delivery of this training material, we felt that Catch2 may be overkill for most of the exercises and makes for a slightly burdensome dependency. We're removing...
This is the 5th PR for the pure SYCL onesweep radix_sort implementation. This PR builds on top of (and depends on) https://github.com/oneapi-src/oneDPL/pull/1245, adding the actual histogram, scan and onesweep kernels,...
This is the 4th PR for the pure SYCL onesweep radix_sort implementation. This PR builds on top of (and depends on) https://github.com/oneapi-src/oneDPL/pull/1244, adding submitter definitions and kernel declarations.
This PR adds a default constructor for `syclcompat::dim3`, and makes the members non-const. This means patterns like this are now possible:
```cpp
syclcompat::dim3 myDim3;
myDim3.x = 32;
```
@Alcpz this...
This PR enables (a subset of) the SYCLcompat memory APIs on devices which lack USM support. Defining `COMPAT_USM_LEVEL_NONE` enables this mode, in which `syclcompat` memory APIs (`malloc`, `memcpy`, `memset`, `free`,...
**Is your feature request related to a problem? Please describe** In CUDA, a kernel functor can be copied to a CUDA symbol, which can be marked \_\_const\_\_. This enables the...