Andreas Klöckner
> Tesla V100 on a cluster

Is their X server using the NVIDIA driver? Do they even have an X server running? I would ask them to run one of...
Out of curiosity, could you share what ultimately solved the problem?
You could try the (relatively recent) [`retain_primary_context`](https://documen.tician.de/pycuda/driver.html#pycuda.driver.Device.retain_primary_context) to create a context.
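As a minimal sketch of what that might look like (device index 0 is an assumption; this needs a CUDA-capable GPU to actually run):

```python
import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)  # assumes the first CUDA device is the one you want

# Attach to the device's primary context (the one the CUDA runtime API
# also uses) instead of creating a fresh one with dev.make_context().
ctx = dev.retain_primary_context()
ctx.push()

# ... allocate memory, compile modules, launch kernels here ...

ctx.pop()
```

Using the primary context can help when other CUDA-runtime-based code (or another library) in the same process already holds a context on the device.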
* Do the CUDA SDK samples work on that machine? Particularly the one for the driver SDK?
* Can you get a backtrace (via gdb) at the moment the program...
That's because the first time the function is called, a few kernels are compiled behind the scenes to do the work. The basic assumption is that your program will run...
If that works for your use case, then yes, that should avoid compilation/module load delays on subsequent runs of the kernel.
PyOpenCL is not involved in the execution of the kernels. The actual on-device kernel execution time should be exactly the same between a C program using OpenCL and a Python...
Are you saying your kernel times are different? Switch your command queue to enable profiling and get kernel execution times in both settings. They should match pretty closely.
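A rough sketch of event-based profiling in PyOpenCL (the kernel and buffer sizes here are made up for illustration; this requires an OpenCL device):

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
# Enable profiling on the queue so enqueued events carry timestamps.
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

prg = cl.Program(ctx, """
__kernel void twice(__global float *a)
{ a[get_global_id(0)] *= 2; }
""").build()

a = np.random.rand(1 << 20).astype(np.float32)
a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                  hostbuf=a)

evt = prg.twice(queue, a.shape, None, a_buf)
evt.wait()

# Profiling timestamps are in nanoseconds.
print("kernel time: %g ms" % ((evt.profile.end - evt.profile.start) * 1e-6))
```

The `evt.profile.end - evt.profile.start` difference is the on-device execution time, which is the number to compare between your C and Python runs.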
The Python bits (setting and preparing arguments) are slower, but this time can (and should) be hidden by kernel execution, which occurs asynchronously on the device.