pycuda Add pyopencl-like GenericScanKernel

I've tried to keep the kernel source code similar to the scan kernel in pyopencl using pycuda._cluda. I am listing out the differences between the two:

Added a RESTRICT macro to _cluda (will make a corresponding PR to pyopencl)
Removed the is_gpu variable and the extra code that came along with it for the CPU
This line had to be changed slightly

Apart from this, I've added some helper functions in pycuda.tools, added tests for the int64 dtype and for segmented scans that match those in pyopencl, and updated the documentation.
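
For readers unfamiliar with segmented scans, here is a minimal CPU-side reference of what such a test would compare the GPU result against. It is only a sketch in pure Python, not the test code from this PR; the function names `inclusive_scan` and `segmented_inclusive_scan` and the `segment_starts` parameter are hypothetical and chosen for illustration.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, op=operator.add):
    """Reference inclusive scan: out[i] = values[0] op ... op values[i]."""
    return list(accumulate(values, op))


def segmented_inclusive_scan(values, segment_starts, op=operator.add):
    """Reference segmented scan (hypothetical helper, for illustration only).

    segment_starts holds the indices at which a new segment begins;
    index 0 always starts a segment. The running value is reset at
    every segment start instead of being carried across it.
    """
    starts = set(segment_starts)
    out = []
    running = None
    for i, v in enumerate(values):
        if i == 0 or i in starts:
            running = v           # new segment: restart the accumulation
        else:
            running = op(running, v)
        out.append(running)
    return out


values = [1, 2, 3, 4, 5, 6]
print(inclusive_scan(values))                 # [1, 3, 6, 10, 15, 21]
print(segmented_inclusive_scan(values, [3]))  # [1, 3, 6, 4, 9, 15]
```

A GPU test would typically compute the same quantities on the device and assert equality with a host-side reference like this one.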

Sep 13 '18 20:09 adityapb

Transplanted here for CI: https://gitlab.tiker.net/inducer/pycuda/merge_requests/11

Oct 07 '18 20:10 inducer

You (I assume inadvertently) changed the bpl-subset submodule back to an old version. (see the CI failure)
I may not have been fully clear on what I was looking for in #187, but I am not interested in maintaining two versions of GenericScanKernel. I would like for both versions to be letter-for-letter the same, so that I can copy them back and forth between pycuda and pyopencl after each change.

Oct 07 '18 20:10 inducer

Yes I did, accidentally. Sorry about that.
Ah! I think I misunderstood earlier. But just to be clear, you want not only the SCAN_INTERVALS_SOURCE and the UPDATE_SOURCE to be identical but the entire GenericScanKernel class, right?

Oct 09 '18 06:10 adityapb

> Ah! I think I misunderstood earlier. But just to be clear, you want not only the SCAN_INTERVALS_SOURCE and the UPDATE_SOURCE to be identical but the entire GenericScanKernel class, right?

Ideally, I'd like for the whole file to be identical, so that for each change I can just drop a pull request into each repo with the same file, wait for CI to pass, and move on. I don't know if that's feasible though. If not, I could be OK with the identical bits (maybe the CL/CUDA source and a base class) living in a separate file.
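
A rough sketch of what that fallback split could look like, purely as an illustration: the shared source and a base class sit in one backend-neutral part, and each package supplies only its own preamble. All class names, the placeholder kernel text, and the macro definitions below are assumptions made for this example; they are not the actual contents of pycuda._cluda or pyopencl.

```python
class ScanKernelBase:
    """Backend-neutral part (hypothetical): would be identical in both repos."""

    # Placeholder for the shared kernel source, written against
    # CLUDA-style macros so that the text is the same for CUDA and OpenCL.
    SOURCE_TEMPLATE = (
        "KERNEL void scan(GLOBAL_MEM int *RESTRICT data) { /* ... */ }"
    )

    def preamble(self):
        # Each backend maps the macros onto its own keywords.
        raise NotImplementedError

    def build_source(self):
        # Only the preamble differs; the template is shared verbatim.
        return self.preamble() + "\n" + self.SOURCE_TEMPLATE


class CudaScanKernel(ScanKernelBase):
    """Backend-specific part that would live in pycuda (illustrative macros)."""

    def preamble(self):
        return "\n".join([
            '#define KERNEL extern "C" __global__',
            "#define GLOBAL_MEM /* empty */",
            "#define RESTRICT __restrict__",
        ])


class OpenCLScanKernel(ScanKernelBase):
    """Backend-specific part that would live in pyopencl (illustrative macros)."""

    def preamble(self):
        return "\n".join([
            "#define KERNEL __kernel",
            "#define GLOBAL_MEM __global",
            "#define RESTRICT restrict",
        ])


for backend in (CudaScanKernel(), OpenCLScanKernel()):
    print(backend.build_source())
    print()
```

With a layout like this, only the file holding `ScanKernelBase` would need to be copied between the two repositories after a change.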

Oct 09 '18 15:10 inducer