Support for QR-decomposition, SVD, Cholesky for NON-SQUARE gpuMatrix/vclMatrix objects

Open mjmg opened this issue 8 years ago • 4 comments

I would like to follow up on the status of support for QR-decomposition, SVD, and Cholesky solvers for NON-SQUARE matrices, as the corresponding issue in ViennaCL is still open at this time: https://github.com/viennacl/viennacl-dev/issues/191

Although an OpenCL backend for these solvers in gpuR would be preferable for its portability, I may have to stick with gmatrix, which requires a CUDA backend, if these limitations cannot be overcome in the foreseeable future.

mjmg avatar Jul 11 '17 10:07 mjmg

@mjmg I agree, I really want to have this limitation removed. I am hesitant to build in an additional custom OpenCL implementation for this purpose because it is supposed to be fixed in ViennaCL. I will need to check back with the author there before I consider doing anything more here.

Until then, you are correct that using gmatrix is probably the best course if you have an NVIDIA GPU.

cdeterman avatar Jul 12 '17 17:07 cdeterman

What is your experience with SVD on the GPU? I tried it with the latest gpuR on square matrices, but it is more than an order of magnitude slower than on the CPU (for a 1000*1000 matrix: 0.4 sec on the CPU vs 8.4 sec in single precision on an AMD Radeon Pro 560 on my Mac). Is this normal?

dselivanov avatar Nov 14 '17 05:11 dselivanov

I don't have much experience with it, so I couldn't say for sure. But I believe the ViennaCL code on which it is based is 'experimental'. You can see the line in the code denoting this here. I was hoping to find more efficient implementations, either by writing some myself directly in OpenCL or by leveraging the clMagma library, which I intend to wrap in the gpuRclmagma package.

cdeterman avatar Nov 14 '17 14:11 cdeterman

For my use case, I can consider this issue partially resolved. My usual datasets are matrices with m<<n, and I hope to use GPU-accelerated SVD for computing PCA.

As pointed out here: https://stackoverflow.com/questions/26797226/svd-speed-in-cpu-and-gpu?noredirect=1&lq=1, SVD may be difficult to parallelize on the GPU.

For the case where the dataset is non-square (m!=n), a workaround is to pass the crossproduct or covariance matrix (M' x M or M x M', where M' is the transpose of M), which is always a SQUARE matrix, to the SVD algorithm instead of M itself.
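
To spell out why this workaround still yields the SVD of M, a short derivation may help. The symbols U, Σ, V, r, and λ_i below are notation assumed for this sketch; they do not come from gpuR or ViennaCL.

```latex
% Thin SVD of an m x n matrix M of rank r (notation assumed for this sketch)
M = U \Sigma V^{\top}, \qquad
U \in \mathbb{R}^{m \times r},\;
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
V \in \mathbb{R}^{n \times r}

% Both cross-products are square and symmetric, whatever the shape of M
M M^{\top} = U \Sigma^{2} U^{\top} \quad (m \times m), \qquad
M^{\top} M = V \Sigma^{2} V^{\top} \quad (n \times n)

% With \lambda_i the nonzero eigenvalues of M M^{\top}, recover the factors of M
\sigma_i = \sqrt{\lambda_i}, \qquad
V = M^{\top} U \Sigma^{-1}
```

When m<<n, the m x m product M x M' is the smaller of the two, so it is likely the cheaper input. One caveat: forming the product squares the condition number of M, so the smallest singular values recovered this way are generally less accurate than those from a direct SVD of M.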

Other users may still need the solvers to accept non-square matrices in the general case, so I'm not marking this issue as closed.

mjmg avatar Nov 25 '17 13:11 mjmg