Andreas Klöckner
Re PyOpenCL: https://github.com/inducer/pyopencl/commit/0e5a2fb12142f236c3219baeab76bc52d0aba1c1
But you're right: PyOpenCL needs a release. Done: https://pypi.org/project/pyopencl/
Tried with Cython 0.27.3: https://gitlab.tiker.net/inducer/pycuda/-/jobs/39551 Is there anything I need to do to make NumPy realize it should rebuild its Cython-generated C?
This is a CUDA limitation. `register_host_memory` won't work on ARM; the rest of PyCUDA should work fine. I'd welcome a patch detecting this and `xfail`ing the test. https://forums.developer.nvidia.com/t/cudaerrornotsupported-when-calling-cv-cudahostregister-on-nvidia-tx2/60236 As suggested,...
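Here is a minimal sketch of the detection half of such a patch. It assumes that checking `platform.machine()` for an ARM prefix is a good enough proxy for "host memory registration unsupported". The helper name `is_arm_host` and the test name in the comment are hypothetical, not existing PyCUDA code:

```python
import platform

# Machine-string prefixes for ARM hosts (64-bit ARM Linux typically reports
# "aarch64", 32-bit ARM "armv7l", macOS "arm64"). Assumed, not exhaustive.
_ARM_PREFIXES = ("aarch64", "arm")


def is_arm_host(machine=None):
    """Return True if the given (or current) machine string looks like ARM."""
    if machine is None:
        machine = platform.machine()
    return machine.lower().startswith(_ARM_PREFIXES)


# Hypothetical wiring in the test suite (requires pytest):
#
#   @pytest.mark.xfail(is_arm_host(),
#                      reason="cudaHostRegister is not supported on ARM")
#   def test_register_host_memory():
#       ...
```

Using `xfail` instead of `skip` keeps the test running on ARM. That way, if a future CUDA release adds support, the unexpected pass shows up in the report.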
* What would you like to happen?
* If you pass `out_host=True`, you get a (transferred) numpy array.
* Using `np.float32(3)` as an input type gives me the same result....
OIC. I'm not opposed. While I generally have grown to dislike the automatic-transfer-back-to-host functionality, I don't dislike it enough to want to rip it out. In your case, I think...
Thanks for your report. You're right--PyOpenCL doesn't currently try to do anything about different endianness between host and device. I'm not sure I even understand all the concerns. * Matching...
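One piece of that puzzle can be sketched under assumptions. If the device's byte order is known (OpenCL reports it through the `CL_DEVICE_ENDIAN_LITTLE` device query), a host array can be converted to that byte order with NumPy before transfer. The function name `to_device_byte_order` is hypothetical, and the sketch ignores the kernel side entirely:

```python
import numpy as np


def to_device_byte_order(host_array, device_little_endian):
    """Return host_array laid out in the device's byte order (values unchanged)."""
    target = "<" if device_little_endian else ">"
    # dtype.newbyteorder only relabels the byte order; astype then converts
    # the values, byte-swapping the buffer if the layout actually differs.
    # copy=False returns the input unchanged when no conversion is needed.
    return host_array.astype(host_array.dtype.newbyteorder(target), copy=False)
```

When host and device already agree, which is the common case, no copy is made.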
I wish I could thumbs-up this more than once. I'm not sure it's doable without detailed dependencies though.
PyOpenCL's arrays are, for now, restricted to being contiguous for most operations. To my mind, the correct way to implement operations on multi-dimensional arrays would use [loopy](https://github.com/inducer/loopy).
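The following is a host-side sketch only, not the loopy approach described above. Until non-contiguous arrays are supported, a strided NumPy view can be copied into a contiguous buffer before it is handed to something like `pyopencl.array.to_device`. The helper `ensure_c_contiguous` is hypothetical:

```python
import numpy as np


def ensure_c_contiguous(arr):
    """Return (array, copied): a C-contiguous version of arr and whether a copy was made."""
    if arr.flags.c_contiguous:
        # Already contiguous: hand back the same object, no extra memory.
        return arr, False
    # Strided view (e.g. a[:, ::2]): materialize a contiguous copy.
    return np.ascontiguousarray(arr), True
```

Returning the `copied` flag makes it easy to warn about, or count, the hidden copies that non-contiguous inputs cause.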
Good catch! Thanks for reporting that. I'd be happy to take a PR that adds that check.