Andreas Klöckner comments

Results: 957 comments of Andreas Klöckner

`async` is a keyword in Python 3.7

Re PyOpenCL: https://github.com/inducer/pyopencl/commit/0e5a2fb12142f236c3219baeab76bc52d0aba1c1
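For readers unfamiliar with the breakage, here is a minimal sketch of why an argument named `async` stops parsing on Python 3.7 and later. The function name `enqueue` and the replacement name `async_` are hypothetical, chosen for illustration; they are not taken from the linked commit.

```python
import keyword

# On Python 3.7+, `async` is a reserved word rather than a soft keyword.
is_reserved = keyword.iskeyword("async")


def parses(source):
    """Return True if `source` compiles as a Python module."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical signatures: the first uses the reserved word as a
# parameter name, the second uses a trailing underscore instead.
old_style = parses("def enqueue(async=False): pass")
new_style = parses("def enqueue(async_=False): pass")
```

On Python 3.7 or newer, `old_style` is `False` and `new_style` is `True`, so one common workaround is to rename the parameter.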

`async` is a keyword in Python 3.7

But you're right: PyOpenCL needs a release. Done: https://pypi.org/project/pyopencl/

`async` is a keyword in Python 3.7

Tried with Cython 0.27.3: https://gitlab.tiker.net/inducer/pycuda/-/jobs/39551 Is there anything I need to do to make NumPy realize it should rebuild its Cython-generated C?

[Nvidia Jetson TX2] LogicError: cuMemHostRegister failed: operation not supported

This is a CUDA limitation. `register_host_memory` won't work on ARM; the rest of PyCUDA should work fine. I'd welcome a patch detecting this and `xfail`ing the test. https://forums.developer.nvidia.com/t/cudaerrornotsupported-when-calling-cv-cudahostregister-on-nvidia-tx2/60236 As suggested,...
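One possible shape for such a patch, as a rough sketch only: a helper that guesses from the machine string whether the host is ARM, which a test could then use as an `xfail` condition. The helper name `looks_like_arm` is hypothetical, and treating every ARM host as unsupported is an assumption; catching the actual error from the driver might be the more reliable test.

```python
import platform


def looks_like_arm(machine=None):
    """Guess whether the host CPU is ARM from its machine string.

    Hypothetical helper: `machine` defaults to platform.machine(),
    which reports e.g. "aarch64" or "armv7l" on ARM Linux systems.
    """
    if machine is None:
        machine = platform.machine()
    return machine.lower().startswith(("arm", "aarch64"))


# In a pytest-based suite, the condition could be attached like this
# (assumes pytest is installed; shown as a comment to keep the sketch
# free of third-party imports):
#
#   @pytest.mark.xfail(looks_like_arm(),
#                      reason="cuMemHostRegister unsupported on this host")
#   def test_register_host_memory(): ...

on_arm = looks_like_arm()
```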

[PyOpenCL execution]: Scalar argument always returned as pyopencl arrays

* What would you like to happen?
* If you pass `out_host=True`, you get a (transferred) numpy array
* Using `np.float32(3)` as an input type gives me the same result....

[PyOpenCL execution]: Scalar argument always returned as pyopencl arrays

OIC. I'm not opposed. While I generally have grown to dislike the automatic-transfer-back-to-host functionality, I don't dislike it enough to want to rip it out. In your case, I think...

Portability when the host and device have different endianness

Thanks for your report. You're right--PyOpenCL doesn't currently try to do anything about different endian-ness between host and device. I'm not sure I even understand all the concerns.

* Matching...
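To make the underlying issue concrete, here is a small standard-library sketch of normalizing a buffer to a fixed byte order on the host before it is handed off. It does not use PyOpenCL, the helper name `to_little_endian` is hypothetical, and it assumes the device expects little-endian 4-byte floats.

```python
import struct
import sys
from array import array


def to_little_endian(values):
    """Return the bytes of a float32 buffer in little-endian order."""
    buf = array("f", values)  # native byte order, 4-byte floats
    if sys.byteorder == "big":
        buf.byteswap()  # swap in place on big-endian hosts
    return buf.tobytes()


payload = to_little_endian([1.0, 2.0])

# Decoding with an explicit little-endian format recovers the values
# regardless of the host's native byte order.
roundtrip = struct.unpack("<2f", payload)
```

A real fix would also need to know the device's byte order, which is outside what this sketch covers.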

`lp.duplicate_inames` should take in a soundness check flag

I wish I could thumbs-up this more than once. I'm not sure it's doable without detailed dependencies though.

optimization: support for 2-D/3-D arrays with strides >= for better memory bandwidth utilization

PyOpenCL's arrays are, for now, restricted to being contiguous for most operations. To my mind, the correct way to implement operations on multi-dimensional arrays would use [loopy](/inducer/loopy).

Transfer of arrays with different strides

Good catch! Thanks for reporting that. I'd be happy to take a PR that adds that check.