James Price comments

Results: 76 comments of James Price

cmake can't find Vulkan

My approach was just to clone the `Vulkan-Headers` repository and point CMake at it via `-DVulkan_INCLUDE_DIRS`. I'm not currently using the Vulkan Loader, so I just pointed `-DVulkan_LIBRARIES` at the...

Correct values for device queries that are not available in Vulkan

> Looking at what MACE is doing, I'd say the ideas are sound but OpenCL doesn't give enough guarantees and information for this to be portable. It may work well...

Respect VkPhysicalDeviceLimits::maxMemoryAllocationCount

I'm hitting this with the MACE benchmarks. I'm happy to wait for push constants, but just wanted to flag this as a potential issue when not using them, if we...

clvk not compatible with applications that use global destructors

I've raised this with Halide, and the [suggested fix](https://github.com/halide/Halide/issues/4827) is to remove `__attribute__((destructor))` from this routine in Halide, and just rely on the OpenCL implementation to clean everything up properly...

Segmentation fault in LLVM 10 and 11 when trying a simple SYCL reduction on an NVIDIA GPU

ComputeCpp likely generates SPIR that conforms to LLVM 3.2 (as per the standard). It's entirely possible that the passes we run in the CUDA backend make assumptions that weren't true...

Segmentation fault in LLVM 10 and 11 when trying a simple SYCL reduction on an NVIDIA GPU

I had a quick look here and managed to reproduce it. AFAICT it's not getting as far as the CUDA-specific passes in `pocl-ptx-gen.cc`, so it's actually one of the...

Segmentation fault in LLVM 10 and 11 when trying a simple SYCL reduction on an NVIDIA GPU

I did some more digging just before this was closed, and it seems that the issue is somewhere in `copyKernelFromBitcode()`. I can dump the program that comes into that function...

Segmentation fault in LLVM 10 and 11 when trying a simple SYCL reduction on an NVIDIA GPU

> One could argue that the segmentation fault itself is a failure in LLVM

If pocl is corrupting the AST somehow, then it's not LLVM's fault if it crashes. As...

Segmentation fault in LLVM 10 and 11 when trying a simple SYCL reduction on an NVIDIA GPU

> Is there a way to disable / bypass this selection, and launch the 2nd stage on the original module(s)?

You could implement `copyKernelFromBitcode` using `llvm::CloneModule` to just make a...

Core affinity for pthreads?

FWIW, I just tried this with STREAM (specifically [BabelStream](https://github.com/UoB-HPC/BabelStream)), but I was unable to get any significant speedup on a dual-socket Skylake system. I think part of the problem...
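
For readers unfamiliar with the mechanism under discussion, pinning a pthread to a core on Linux generally looks something like the sketch below. This is not the change proposed in the issue (the excerpt does not show it). The helper names `first_allowed_cpu` and `pin_current_thread` are hypothetical, and the code assumes glibc's non-portable `pthread_getaffinity_np` and `pthread_setaffinity_np` extensions.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: returns the lowest-numbered CPU the calling thread
// is currently allowed to run on, or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}

// Hypothetical helper: restricts the calling thread to a single CPU.
// Returns true on success, false if the index is out of range or the
// kernel rejects the mask (for example, the CPU is outside the cpuset).
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

A worker thread would call `pin_current_thread(...)` once at startup with whatever core index the placement policy picks. On a dual-socket machine, whether this helps a bandwidth-bound benchmark typically also depends on which NUMA node the memory was first touched on, so pinning alone may not change the result.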