Andreas Klöckner
I'd be happy to take a patch.
> It inits the driver, then spawns a child process

According to Nvidia, you may not `fork()` after initializing CUDA.
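A common workaround (a sketch, not something from this thread: the `worker` function and the init comment are illustrative) is to have the child start via the `spawn` method, so it gets a fresh interpreter and initializes CUDA itself instead of inheriting a forked driver state:

```python
# Sketch: avoid fork() after CUDA init by using the "spawn" start
# method, so the child process starts with a fresh interpreter and
# no inherited CUDA state. Names here (worker) are illustrative.
import multiprocessing as mp

def worker(q):
    # Initialize CUDA here, in the child, not before in the parent,
    # e.g. (hypothetically) via `import pycuda.autoinit`.
    q.put("child initialized its own CUDA context")

def main():
    ctx = mp.get_context("spawn")  # fresh process, no fork of CUDA state
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()

if __name__ == "__main__":
    main()
```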
> Depend on an instruction only if it writes to the variables that the compute instruction reads.

Is this correct? The write could occur anywhere in the transitive closure...
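To illustrate the concern (a sketch with made-up instruction names, not loopy's actual dependency machinery): if the instruction that writes a variable read by the compute instruction is only reachable through intermediate instructions, a check of direct dependencies misses it; the check has to walk the transitive closure of the dependency relation:

```python
# Sketch: the instruction writing a variable that "compute" reads may
# sit anywhere in the transitive closure of compute's dependencies.
# The instruction names and dep graph are made up for illustration.
deps = {                     # insn -> instructions it directly depends on
    "compute": {"b_upd"},
    "b_upd": {"a_write"},    # a_write writes the variable compute reads
    "a_write": set(),
}

def transitive_deps(insn, deps):
    """All instructions reachable from insn via the dependency relation."""
    seen = set()
    stack = [insn]
    while stack:
        for d in deps[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# a_write is not a direct dependency of compute, but it is reachable:
print(sorted(transitive_deps("compute", deps)))  # ['a_write', 'b_upd']
```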
With

- https://github.com/pocl/pocl/pull/1069
- https://github.com/inducer/pyopencl/pull/452

the following passes for me:

```
LOOPY_NO_CACHE=1 pycl test_target.py 'test_passing_bajillions_of_svm_args(cl._csc)'
```

Let me know if you can reproduce that.
I obviously can't guarantee that that's what's at issue here, but I suspect you'll need https://github.com/pocl/pocl/pull/1069 (or another fix for the same issue) in order to allow this to work....
With `CU_MEM_ATTACH_GLOBAL`, I don't think you have a guarantee that the memory will be accessible from the host. Also, since you seem to attribute the crash in the sample code...
Btw, I agree that this discussion does not have much to do with Loopy. Maybe let's continue the discussion here: https://github.com/inducer/pyopencl/pull/452.
> I found another fix (workaround?) in [pocl/pocl@03ffc71](https://github.com/pocl/pocl/commit/03ffc7146f425bee6e6345dfe4208d095ddd7e7b) which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also...
@matthiasdiener Please don't force-push to branches on which more than one person is working. Not only is there a risk of clobbering one another's work, it's also very hard to...
In either case, you'll need a global barrier. After that, you might as well run a (short!) sequential reduction loop, which is going to be faster (and matches best practices...
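As a rough illustration of that shape (plain NumPy standing in for device code; the group size and function names are made up): stage one produces per-workgroup partial sums, the global barrier corresponds to the boundary between the two stages, and stage two is a short sequential loop over the few partials:

```python
import numpy as np

# Sketch of a two-stage reduction. In real OpenCL/CUDA code the
# "global barrier" is the boundary between two kernel launches; here
# each stage is just a NumPy step. Sizes and names are made up.
GROUP_SIZE = 256

def two_stage_sum(x):
    n = len(x)
    pad = (-n) % GROUP_SIZE
    xp = np.concatenate([x, np.zeros(pad, dtype=x.dtype)])
    # Stage 1: each "workgroup" reduces its chunk to one partial sum.
    partials = xp.reshape(-1, GROUP_SIZE).sum(axis=1)
    # (Global barrier here: all partials must be written before stage 2.)
    # Stage 2: short sequential loop over the handful of partials.
    total = 0.0
    for p in partials:
        total += p
    return total

x = np.arange(1000, dtype=np.float64)
assert two_stage_sum(x) == x.sum()
```

The point of the short sequential second stage is that, once only a few partials remain, a single thread looping over them is cheaper than launching another parallel pass.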