Andreas Klöckner
> * `elwise_kernel_runner` could probably add to `write_events` for the first arg and to `read_events` for the rest of them, so we don't clutter all the rest of the code....
> * > > * It seems to me that something like a `C = AXPBY` would need to wait for `read_events` on `C` and `write_events` on `A` and `B`. Does...
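The dependency rule in the two quoted points can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `FakeEvent`, `TrackedArray`, and `run_elementwise` are hypothetical names invented for illustration and are not part of pyopencl or pycuda, and no device work is launched. The first argument is treated as the output and the remaining ones as inputs, mirroring the suggestion for `elwise_kernel_runner`. Waiting on earlier writes to the output is an extra assumption added here to keep write-after-write ordering; the quoted text only mentions `read_events` on `C` and `write_events` on `A` and `B`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEvent:
    """Stand-in for a device event (hypothetical, not a pyopencl class)."""
    name: str


@dataclass
class TrackedArray:
    """Toy array that only tracks the events touching it (hypothetical)."""
    name: str
    write_events: list = field(default_factory=list)
    read_events: list = field(default_factory=list)


def run_elementwise(kernel_name, out, *inputs):
    """Pretend to enqueue a kernel writing `out` and reading `inputs`.

    Returns the new event and the list of events it would wait for.
    """
    # Writing `out` must wait for earlier reads of `out` (write-after-read).
    # Assumption: it also waits for earlier writes (write-after-write).
    wait_for = list(out.read_events) + list(out.write_events)

    # Reading each input must wait for earlier writes to it (read-after-write).
    for arr in inputs:
        wait_for.extend(arr.write_events)

    evt = FakeEvent(kernel_name)

    # Bookkeeping done in one place: the first argument records a write,
    # all remaining arguments record a read.
    out.write_events.append(evt)
    for arr in inputs:
        arr.read_events.append(evt)

    return evt, wait_for


a, b, c = TrackedArray("a"), TrackedArray("b"), TrackedArray("c")

fill_evt, fill_wait = run_elementwise("fill_a", a)          # writes a
axpby_evt, axpby_wait = run_elementwise("axpby", c, a, b)   # c = x*a + y*b
over_evt, over_wait = run_elementwise("overwrite_a", a)     # writes a again
```

In this toy run, the `axpby` launch waits only on the event that filled `a`, and the later overwrite of `a` waits on both the `axpby` read and the original fill. Keeping the bookkeeping inside the single runner function is the design point of the first quoted bullet: callers never touch the event lists directly.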
I'm supportive of this suggestion, including the warnings you describe. I would be happy to consider a PR along these lines.
I don't have the bandwidth to put those together, but if you (or anyone) would like to submit a PR that builds those wheels (say, using cibuildwheel), I might be...
If you can do 2 without introducing per-argument processing in Python (such as by modifying the custom struct packing), that could be viable.
I'm aware of it, having evaluated it for pyopencl (https://github.com/inducer/pyopencl/pull/546). Migrating to pybind is a good stepping stone at any rate. For now, I appreciate that pybind is a more...
What format comes out of `descr.format = drv.dtype_to_array_format(a.dtype)`? Is that the correct one in your view?
That looks OK. Could you reference the Nvidia docs on how this is intended to work? It's been years since I've directly worked with textures, I don't remember.
Thanks for timing this! A few thoughts: - At least in a way, this is good news, because it means our choice of implementation language isn't slowing us down much....
> I'll further assume that it's time genuinely spent in isl and not in the wrapper. (Do you have any indication otherwise?) ~~Yes: https://github.com/inducer/islpy/issues/28~~