
GPU acceleration support

Open RicardoDominguez opened this issue 7 years ago • 2 comments

Make optional GPU acceleration available.

RicardoDominguez avatar Aug 24 '18 08:08 RicardoDominguez

#7 is a good first step, and should be very straightforward to extend to GPUs, but testing is required.

RicardoDominguez avatar Aug 27 '18 08:08 RicardoDominguez

For Theano: (source)

Better on GPU: matrix multiplication, convolution, and large element-wise operations can be accelerated a lot (5-50x) when arguments are large enough to keep 30 processors busy.

Equal on GPU/CPU: indexing, dimension-shuffling and constant-time reshaping.

Better on CPU: summation over rows/columns of tensors.

Copying of large quantities of data to and from a device is relatively slow, and often cancels most of the advantage of one or two accelerated functions on that data. Getting GPU performance largely hinges on making data transfer to the device pay off. By default all inputs will get transferred to GPU. You can prevent an input from getting transferred by setting its tag.target attribute to ‘cpu’.

Tips for Improving Performance on GPU

  • Consider adding floatX=float32 (or the type you are using) to your .theanorc file if you plan to do a lot of GPU work. Prefer constructors like matrix, vector and scalar (which follow the type set in floatX) to dmatrix, dvector and dscalar.
  • Minimize transfers to the GPU device by using shared variables to store frequently-accessed data (see shared()). When using the GPU, tensor shared variables are stored on the GPU by default to eliminate transfer time for GPU ops using those variables.
  • If you aren’t happy with the performance you see, try running your script with the profile=True flag. This should print some timing information at program termination. Is time being used sensibly? If an op or Apply is taking more time than its share, and you know something about GPU programming, have a look at how it’s implemented in theano.gpuarray. Check the line similar to Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op. This can tell you whether not enough of your graph is on the GPU or there is too much memory transfer.
  • To check whether all the Ops in the computational graph are running on the GPU, provide a value to the assert_no_cpu_op flag: warn to emit a warning, raise to raise an error, or pdb to drop into a breakpoint whenever a CPU Op remains in the computational graph.
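Taken together, the tips above amount to a few lines of configuration. A sketch of a .theanorc (the specific values here are illustrative assumptions, not requirements):

```ini
[global]
device = cuda            ; use the GPU backend (libgpuarray); 'cpu' to disable
floatX = float32         ; so matrix/vector/scalar constructors produce float32
profile = True           ; print timing information at program termination
assert_no_cpu_op = warn  ; or 'raise' / 'pdb' when an Op stays on the CPU

[gpuarray]
preallocate = 0.8        ; optionally preallocate a fraction of GPU memory
```

The same flags can be passed per-run via the THEANO_FLAGS environment variable instead of the config file.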

RicardoDominguez avatar Sep 28 '18 11:09 RicardoDominguez