Add pyopencl-like GenericScanKernel
I've tried to keep the kernel source code similar to the scan kernel in pyopencl using pycuda._cluda. I am listing out the differences between the two:
- Added a `RESTRICT` macro to `_cluda` (will make a corresponding PR to `pyopencl`)
- Removed the `is_gpu` variable and the extra code that came along with it for the CPU
- This line had to be changed slightly
Apart from this, I've added some helper functions in pycuda.tools, added tests for the int64 dtype and a test for segmented scans that match the ones in pyopencl, and updated the documentation.
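For anyone unfamiliar with segmented scans, here is a minimal pure-Python sketch of what the segmented test is checking. It is only a reference for the semantics, not the kernel or the test code itself. The function name and the `segment_starts` flag list are made up for illustration.

```python
def segmented_inclusive_scan(values, segment_starts, op=lambda a, b: a + b):
    """Inclusive scan that restarts wherever segment_starts[i] is truthy.

    Hypothetical reference helper: `segment_starts` is an illustrative
    per-element flag list, not an argument of the real kernel class.
    """
    if len(values) != len(segment_starts):
        raise ValueError("values and segment_starts must have the same length")

    result = []
    running = None
    for i, (value, starts_segment) in enumerate(zip(values, segment_starts)):
        if i == 0 or starts_segment:
            # A new segment begins: drop everything accumulated so far.
            running = value
        else:
            running = op(running, value)
        result.append(running)
    return result


# Segments are [1, 2] and [3, 4, 5]; the running sum restarts at index 2.
print(segmented_inclusive_scan([1, 2, 3, 4, 5], [1, 0, 1, 0, 0]))
```

A GPU result can be compared element by element against a sequential reference like this one.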
Transplanted here for CI: https://gitlab.tiker.net/inducer/pycuda/merge_requests/11
- You (I assume inadvertently) changed the `bpl-subset` submodule back to an old version. (see the CI failure)
- I may not have been fully clear on what I was looking for in #187, but I am not interested in maintaining two versions of `GenericScanKernel`. I would like for both versions to be letter-for-letter the same, so that I can copy them back and forth between pycuda and pyopencl after each change.
- Yes I did, accidentally. Sorry about that.
- Ah! I think I misunderstood earlier. But just to be clear, you want not only the `SCAN_INTERVALS_SOURCE` and the `UPDATE_SOURCE` to be identical but the entire `GenericScanKernel` class, right?
Ideally, I'd like for the whole file to be identical, so that for each change I can just drop a pull request into each repo with the same file, wait for CI to pass, and move on. I don't know if that's feasible though. If not, I could be OK with the identical bits (maybe the CL/CUDA source and a base class) living in a separate file.
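If the shared file approach works out, the "letter-for-letter the same" requirement could be checked mechanically. Below is a rough sketch that assumes both checkouts are available side by side. The function name and the example paths are hypothetical, not something either repo has today.

```python
import filecmp


def scan_sources_identical(pycuda_path, pyopencl_path):
    """Return True only if the two files are byte-for-byte identical.

    shallow=False forces a content comparison instead of trusting
    size and modification time.
    """
    return filecmp.cmp(pycuda_path, pyopencl_path, shallow=False)


# Hypothetical usage; the paths below are placeholders:
# scan_sources_identical("pycuda/pycuda/scan.py", "pyopencl/pyopencl/scan.py")
```

A check like this could run in CI on either side, so that a change to one copy fails until the same file has been dropped into the other repo.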