Add pyopencl-like GenericScanKernel

Open adityapb opened this issue 7 years ago • 4 comments

Using pycuda._cluda, I've tried to keep the kernel source code close to the scan kernel in pyopencl. The differences between the two are:

  1. Added a RESTRICT macro to _cluda (will make a corresponding PR to pyopencl)
  2. Removed the is_gpu variable and the extra code that came along with it for the CPU
  3. This line had to be changed slightly

Apart from this, I've added some helper functions in pycuda.tools, added tests for the int64 dtype and a test for segmented scans that match those in pyopencl, and updated the documentation.
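For readers unfamiliar with segmented scans, here is a minimal host-side sketch of the result such a test would compare against. It is plain Python, not the pycuda or pyopencl test code; the function names and the convention that a truthy flag marks the first element of a segment are assumptions made for illustration.

```python
def inclusive_scan(values, combine=lambda a, b: a + b):
    """Reference inclusive scan: out[i] = values[0] (+) ... (+) values[i]."""
    out = []
    for i, v in enumerate(values):
        out.append(v if i == 0 else combine(out[-1], v))
    return out


def segmented_inclusive_scan(values, segment_starts, combine=lambda a, b: a + b):
    """Reference segmented inclusive scan.

    segment_starts[i] being truthy means element i begins a new segment
    (an assumed convention), so the running value restarts there.
    """
    out = []
    for i, v in enumerate(values):
        if i == 0 or segment_starts[i]:
            out.append(v)  # restart the running value at a segment boundary
        else:
            out.append(combine(out[-1], v))
    return out
```

A GPU result copied back to the host can be compared element by element against the list returned by `segmented_inclusive_scan`; with inputs whose running sum exceeds the 32-bit range, the same comparison exercises the int64 case.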

adityapb avatar Sep 13 '18 20:09 adityapb

Transplanted here for CI: https://gitlab.tiker.net/inducer/pycuda/merge_requests/11

inducer avatar Oct 07 '18 20:10 inducer

  • You (I assume inadvertently) changed the bpl-subset submodule back to an old version. (see the CI failure)
  • I may not have been fully clear on what I was looking for in #187, but I am not interested in maintaining two versions of GenericScanKernel. I would like for both versions to be letter-for-letter the same, so that I can copy them back and forth between pycuda and pyopencl after each change.

inducer avatar Oct 07 '18 20:10 inducer

  • Yes I did, accidentally. Sorry about that.
  • Ah! I think I misunderstood earlier. But just to be clear, you want not only the SCAN_INTERVALS_SOURCE and the UPDATE_SOURCE to be identical but the entire GenericScanKernel class, right?

adityapb avatar Oct 09 '18 06:10 adityapb

> Ah! I think I misunderstood earlier. But just to be clear, you want not only the SCAN_INTERVALS_SOURCE and the UPDATE_SOURCE to be identical but the entire GenericScanKernel class, right?

Ideally, I'd like for the whole file to be identical, so that for each change I can just drop a pull request into each repo with the same file, wait for CI to pass, and move on. I don't know if that's feasible though. If not, I could be OK with the identical bits (maybe the CL/CUDA source and a base class) living in a separate file.
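As a rough sketch of how the "letter-for-letter the same" requirement could be checked mechanically, the snippet below compares two files by content hash. The helper names are hypothetical, and nothing here is part of either repository's CI.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def files_identical(path_a, path_b):
    """True if both files have byte-for-byte identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling `files_identical` on the two checked-out copies of the shared file (wherever they live in each repository) would flag any drift before a change is copied from one project to the other.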

inducer avatar Oct 09 '18 15:10 inducer