Andreas Klöckner

957 comments by Andreas Klöckner

> @inducer, what are your thoughts on memoizing `get_var_dict` and `get_var_names` in islpy?

If it helps, go for it. You'll probably need to add a pytools dep.
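For readers unfamiliar with the idea, here is a minimal sketch of per-object memoization in plain Python. It is not islpy's code and does not use pytools: `VarTable`, its `lookups` counter, and the name-to-index mapping are hypothetical stand-ins chosen for illustration, and the sketch assumes the object is immutable after construction, so a cached result never goes stale.

```python
class VarTable:
    """Hypothetical stand-in for an immutable object with named variables."""

    def __init__(self, names):
        self._names = tuple(names)
        self.lookups = 0  # counts how often the expensive path actually runs

    def get_var_dict(self):
        # Return the cached result if an earlier call already computed it.
        try:
            return self._var_dict_cache
        except AttributeError:
            pass

        # First call: do the (pretend-expensive) work and remember the result.
        self.lookups += 1
        result = {name: idx for idx, name in enumerate(self._names)}
        self._var_dict_cache = result
        return result


table = VarTable(["i", "j", "k"])
first = table.get_var_dict()
second = table.get_var_dict()  # served from the cache
```

One caveat of this approach: every caller receives the same dictionary object, so a caller that mutates it would corrupt the cache for everyone else.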

Concurrent access to a single buffer is ill-formed in OpenCL 1, but allowed with fine- and coarse-grain SVM in OpenCL 2. So this is only relevant after #220.

Nice catch! That makes sense. A good question is what we should do about that. We could apply that offset as a transformation before codegen, to preserve the literal iname...

I still get an email (at least) once a week about how Loopy doesn't pass its GPU CI.

What version of the CUDA toolkit do you have? It's `lib64` on mine. (11.2.2 installed from Debian.)

I suspect that test could be improved by checking whether `lib64` exists and, if it does not, trying `lib`.
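A rough sketch of that fallback, assuming the check only needs to find which library directory exists under a given toolkit root. The function name `find_cuda_lib_dir` and the example root path are hypothetical and are not taken from the actual build scripts.

```python
from pathlib import Path


def find_cuda_lib_dir(cuda_root):
    """Return the first existing library directory under cuda_root, or None.

    Prefers `lib64` and falls back to `lib`.
    """
    root = Path(cuda_root)
    for name in ("lib64", "lib"):
        candidate = root / name
        if candidate.is_dir():
            return candidate
    return None


# Example call; "/usr/local/cuda" is only an assumed install location.
lib_dir = find_cuda_lib_dir("/usr/local/cuda")
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing directory is an error or just a reason to try another toolkit root.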

FWIW, if you're using conda, you might as well use the pycuda package from conda-forge.

PyTorch may have run on an alternate backend. Could you try running one of the CUDA SDK examples to verify? Also, could you copy-paste the output of `nvidia-smi`?

> Could it be a problem?

Deprecated doesn't mean unsupported. Does the GL context sharing example work for you?