Joe Todd issues

Results: 7 issues of Joe Todd

[WIP] SYCL blocksize deduction (for AMPERE80)

This PR adds SYCL blocksize deduction (similar to that implemented for the CUDA backend). For now it only supports `KOKKOS_ARCH_AMPERE80` and _per streaming multiprocessor (SM)_ info is hardcoded. Depends on...
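As a rough illustration of what deducing a block size from hardcoded per-SM information can look like, here is a minimal sketch. It is not the code from this PR or from the Kokkos CUDA backend: `SmLimits`, `blocks_per_sm` and `deduce_block_size` are hypothetical names, the limits are supplied by the caller, and the heuristic (maximize resident threads per SM, prefer the larger block on ties) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical container for per-SM hardware limits (names are assumptions).
struct SmLimits {
  int max_threads_per_sm;    // resident threads an SM can hold
  int max_blocks_per_sm;     // resident blocks an SM can hold
  int regs_per_sm;           // 32-bit registers available per SM
  std::size_t shmem_per_sm;  // bytes of shared/local memory per SM
};

// Number of blocks of `block_size` threads that can be resident on one SM,
// limited by threads, registers, shared memory and the block cap.
inline int blocks_per_sm(const SmLimits& sm, int block_size,
                         int regs_per_thread, std::size_t shmem_per_block) {
  const int by_threads = sm.max_threads_per_sm / block_size;
  const int by_regs = regs_per_thread > 0
                          ? sm.regs_per_sm / (regs_per_thread * block_size)
                          : sm.max_blocks_per_sm;
  const int by_shmem = shmem_per_block > 0
                           ? static_cast<int>(sm.shmem_per_sm / shmem_per_block)
                           : sm.max_blocks_per_sm;
  return std::min({by_threads, by_regs, by_shmem, sm.max_blocks_per_sm});
}

// Try every multiple of 32 up to `max_block_size` and keep the size that
// yields the most resident threads; ties go to the larger block size.
// Returns 0 if no candidate fits on the SM at all.
inline int deduce_block_size(const SmLimits& sm, int regs_per_thread,
                             std::size_t shmem_per_block,
                             int max_block_size = 1024) {
  int best_size = 0;
  int best_threads = 0;
  for (int bs = 32; bs <= max_block_size; bs += 32) {
    const int threads =
        bs * blocks_per_sm(sm, bs, regs_per_thread, shmem_per_block);
    if (threads > 0 && threads >= best_threads) {
      best_threads = threads;
      best_size = bs;
    }
  }
  return best_size;
}
```

A real implementation would also need the kernel's actual register and local-memory usage, which this sketch simply takes as parameters.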

Remove Catch2?

During recent delivery of this training material, we had the sense that possibly Catch2 is overkill for most of the exercises, and makes for a slightly burdensome dependency. We're removing...

Add kernels & tests for onesweep radix_sort

This is the 5th PR for the pure SYCL onesweep radix_sort implementation. This PR builds on top of (and depends on) https://github.com/oneapi-src/oneDPL/pull/1245, adding the actual histogram, scan and onesweep kernels,...
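The histogram, scan and scatter steps that a radix sort is built from can be sketched sequentially as below. This is only a host-side illustration of the three phases, assuming 32-bit unsigned keys and 8 bits per pass; it is not the oneDPL kernels from this PR, and it does not model the parallel, chained-scan structure that distinguishes onesweep. `radix_pass` and `radix_sort` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kRadixBits = 8;                     // bits sorted per pass
constexpr std::size_t kBuckets = 1u << kRadixBits;     // 256 buckets

// One stable pass over the digit selected by `shift`.
inline void radix_pass(const std::vector<std::uint32_t>& in,
                       std::vector<std::uint32_t>& out, unsigned shift) {
  std::array<std::size_t, kBuckets> offsets{};

  // 1. Histogram: count how many keys fall into each bucket.
  for (std::uint32_t key : in) {
    ++offsets[(key >> shift) & (kBuckets - 1)];
  }

  // 2. Exclusive scan: turn counts into the start offset of each bucket.
  std::size_t running = 0;
  for (std::size_t& slot : offsets) {
    const std::size_t count = slot;
    slot = running;
    running += count;
  }

  // 3. Scatter: place each key at its bucket's next free position.
  out.resize(in.size());
  for (std::uint32_t key : in) {
    out[offsets[(key >> shift) & (kBuckets - 1)]++] = key;
  }
}

// Least-significant-digit radix sort: four 8-bit passes over 32-bit keys.
inline std::vector<std::uint32_t> radix_sort(std::vector<std::uint32_t> keys) {
  std::vector<std::uint32_t> tmp(keys.size());
  for (unsigned shift = 0; shift < 32; shift += kRadixBits) {
    radix_pass(keys, tmp, shift);
    keys.swap(tmp);
  }
  return keys;
}
```

Because each pass is stable, sorting by successive digits from least to most significant leaves the keys fully ordered after the last pass.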

follow through

Submitter defs & kernel decls for kt::gpu::radix_sort

This is the 4th PR for the pure SYCL onesweep radix_sort implementation. This PR builds on top of (and depends on) https://github.com/oneapi-src/oneDPL/pull/1244, adding submitter definitions and kernel declarations.

follow through

[SYCL][COMPAT] Add default ctor to dim3 and update tests/docs

This PR adds a default constructor for `syclcompat::dim3`, and makes the members non-const. This means patterns like this are now possible:

```cpp
syclcompat::dim3 myDim3;
myDim3.x = 32;
```

@Alcpz this...
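To show why a default constructor and non-const members matter for this pattern, here is a small stand-in type that compiles without SYCL. `Dim3` and `total_size` are hypothetical and are not the `syclcompat::dim3` definition; the default of 1 for each dimension is an assumption borrowed from CUDA's `dim3`.

```cpp
// Hypothetical stand-in for a dim3-like type (not the syclcompat class).
struct Dim3 {
  // Non-const members: they can be assigned after construction.
  unsigned int x = 1;
  unsigned int y = 1;
  unsigned int z = 1;

  // Default constructor: every dimension starts at 1 (an assumption).
  constexpr Dim3() = default;

  // Convenience constructor for the usual "Dim3(32, 4)" style.
  constexpr Dim3(unsigned int x_, unsigned int y_ = 1, unsigned int z_ = 1)
      : x(x_), y(y_), z(z_) {}
};

// Total number of work-items described by the three dimensions.
constexpr unsigned int total_size(const Dim3& d) { return d.x * d.y * d.z; }
```

With `const` members, `Dim3 d; d.x = 32;` would fail to compile, since the assignment needs a mutable member and the declaration needs a default constructor.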

[SYCL][COMPAT] Re-add buffer (USM_LEVEL_NONE) support

This PR enables (a subset of) the SYCLcompat memory APIs on devices which lack USM support. Defining `COMPAT_USM_LEVEL_NONE` enables this mode, in which `syclcompat` memory APIs (`malloc`, `memcpy`, `memset`, `free`,...

Constant memory optimization for CUDA backend

**Is your feature request related to a problem? Please describe**

In CUDA, a kernel functor can be copied to a CUDA symbol, which can be marked `__constant__`. This enables the...

enhancement

performance

cuda