James Price
My approach was just to clone the `Vulkan-Headers` repository and point CMake at it via `-DVulkan_INCLUDE_DIRS`. I'm not currently using the Vulkan Loader, so I just pointed `-DVulkan_LIBRARIES` at the...
> Looking at what MACE is doing, I'd say the ideas are sound but OpenCL doesn't give enough guarantees and information for this to be portable. It may work well...
I'm hitting this with the MACE benchmarks. I'm happy to wait for push constants, but just wanted to flag this as a potential issue when not using them, if we...
I've raised this with Halide, and the [suggested fix](https://github.com/halide/Halide/issues/4827) is to remove `__attribute__((destructor))` from this routine in Halide, and just rely on the OpenCL implementation to clean everything up properly...
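For anyone unfamiliar with the attribute, here is a minimal sketch of what `__attribute__((destructor))` does in GCC and Clang. This is not Halide's code: `release_resources`, `release_at_exit`, and the two counters are hypothetical names made up for illustration. It only shows that the marked function runs automatically at process exit, whereas the same routine without the attribute runs only when called explicitly.

```c
#include <stdio.h>

/* Hypothetical stand-ins for a runtime's global state (not Halide's code). */
static int resources_live = 1;
static int cleanup_calls = 0;

/* Plain function: runs only when something calls it. Safe to call twice. */
void release_resources(void) {
    if (resources_live) {
        resources_live = 0;
        cleanup_calls++;
    }
}

/* With the attribute, GCC and Clang arrange for this to be called
 * automatically once main() has returned or exit() has been called.
 * Deleting the attribute line turns it back into an ordinary function
 * that never runs unless it is called. */
__attribute__((destructor))
static void release_at_exit(void) {
    release_resources();
    fprintf(stderr, "destructor ran, cleanup_calls=%d\n", cleanup_calls);
}
```

Removing the attribute, as suggested in the linked issue, means nothing in this routine runs during process teardown, and whatever is still allocated is left to the OpenCL implementation.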
ComputeCpp likely generates SPIR that conforms to LLVM 3.2 (as per the standard). It's entirely possible that the passes we run in the CUDA backend make assumptions that weren't true...
I had a quick look here and managed to reproduce it. AFAICT it's not getting as far as the CUDA-specific passes in `pocl-ptx-gen.cc`, so it's actually one of the...
I did some more digging just before this was closed, and it seems that the issue is somewhere in `copyKernelFromBitcode()`. I can dump the program that comes into that function...
> One could argue that the segmentation fault itself is a failure in LLVM

If pocl is corrupting the AST somehow, then it's not LLVM's fault if it crashes. As...
> Is there a way to disable / bypass this selection, and launch the 2nd stage on the original module(s)?

You could implement `copyKernelFromBitcode` using `llvm::CloneModule` to just make a...
FWIW, I just tried this with STREAM (specifically [BabelStream](https://github.com/UoB-HPC/BabelStream)), but I was unable to get any significant speedup on a dual-socket Skylake system. I think part of the problem...