rocSOLVER

Next-generation LAPACK implementation for the ROCm platform

Results: 49 rocSOLVER issues

- Modifies STEDC and STEBZ to accommodate divide & conquer for partial decompositions
- Adds initial optimizations to merge phase of divide & conquer
- Adds new family of APIs...

noOptimizations

Implement recursive formulation of Cholesky factorization for n by n symmetric positive definite matrix A. Let the following be a block partitioning of matrix A. Here submatrix L22 is n/2...

noOptimizations
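
The recursive formulation described in the entry above can be sketched in a few lines. This is a minimal illustration in plain Python, not rocSOLVER's implementation: the function name `recursive_cholesky`, the split point `n // 2`, and the list-of-lists matrix layout are all assumptions made for the example. It factors the leading block, solves a triangular system for the off-diagonal block, forms the Schur complement, and recurses on it.

```python
import math


def recursive_cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a symmetric positive definite matrix given as a list of rows.
    Hypothetical helper for illustration only.
    """
    n = len(a)
    if n == 1:
        return [[math.sqrt(a[0][0])]]

    k = n // 2          # assumed split point: leading block is k by k
    m = n - k           # trailing block is m by m
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    # Step 1: factor the leading block, A11 = L11 * L11^T.
    l11 = recursive_cholesky(a11)

    # Step 2: solve L21 * L11^T = A21 for L21, one row at a time.
    l21 = []
    for row in a21:
        x = [0.0] * k
        for j in range(k):
            s = row[j] - sum(x[p] * l11[j][p] for p in range(j))
            x[j] = s / l11[j][j]
        l21.append(x)

    # Step 3: Schur complement S = A22 - L21 * L21^T.
    s22 = [
        [a22[i][j] - sum(l21[i][p] * l21[j][p] for p in range(k)) for j in range(m)]
        for i in range(m)
    ]

    # Step 4: recurse on the Schur complement, S = L22 * L22^T.
    l22 = recursive_cholesky(s22)

    # Assemble L = [[L11, 0], [L21, L22]].
    top = [l11[i] + [0.0] * m for i in range(k)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom
```

On a GPU, steps 2 and 3 would map to triangular-solve and matrix-multiply calls on large blocks, which is usually the motivation for a recursive split; the scalar loops here only show the data flow.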

As previously discussed, I have been experimenting with optimizing BDSQR by using multiple kernel launches, with device synchronizations to determine the iterative loop's stopping condition. Broadly speaking, I have made...

noOptimizations

Optimize getrf_npvt (LU factorization without pivoting) by using a block algorithm that is similar to the block algorithm used in Cholesky factorization. The diagonal block is factored using a specialized...
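
As a rough sketch of what a block LU factorization without pivoting looks like, the plain-Python example below follows the usual right-looking scheme: factor the diagonal block, solve for the row and column panels, then update the trailing submatrix. The function name `lu_nopivot_blocked`, the block size `nb`, and the simple unblocked loop used for the diagonal block are assumptions for illustration; the entry above does not say which specialized kernel the PR uses.

```python
def lu_nopivot_blocked(a, nb=2):
    """Overwrite square matrix `a` (list of rows) with its LU factors.

    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U. No pivoting is performed, so every
    diagonal entry encountered must be nonzero. Hypothetical helper.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # 1. Unblocked LU of the diagonal block A11 (assumed stand-in
        #    for a specialized small-matrix kernel).
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                a[i][j] /= a[j][j]
                for c in range(j + 1, k1):
                    a[i][c] -= a[i][j] * a[j][c]

        # 2. Row panel: solve L11 * U12 = A12 (unit lower triangular).
        for c in range(k1, n):
            for i in range(k0, k1):
                for p in range(k0, i):
                    a[i][c] -= a[i][p] * a[p][c]

        # 3. Column panel: solve L21 * U11 = A21.
        for r in range(k1, n):
            for j in range(k0, k1):
                for p in range(k0, j):
                    a[r][j] -= a[r][p] * a[p][j]
                a[r][j] /= a[j][j]

        # 4. Trailing update: A22 -= L21 * U12.
        for r in range(k1, n):
            for c in range(k1, n):
                for p in range(k0, k1):
                    a[r][c] -= a[r][p] * a[p][c]
    return a
```

The structure mirrors block Cholesky: most of the arithmetic lands in step 4, a matrix-matrix update, while the diagonal block stays small enough for a dedicated kernel.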

This PR fixes a bug in `roclapack_stein` where eigenvectors would fail to converge because the initial values used in stein would be orthogonal to the real eigenvectors (to numerical precision)....

noOptimizations

This is a draft because the work is incomplete. However, the library builds and the main thing left to do is to actually load the rocsparse symbols with dlopen.

noOptimizations
noExtendedCI

This is essentially a rewrite of the `host_batch_vector` and `host_strided_batch_vector` classes. I've deleted a number of unused options, enabled error checking on failed allocations by default, and added a number...

noOptimizations

Hi, I've been trying to improve the performance of the SYEVD function lately. The sterf kernel is the most time-consuming part of the code. I tried to use two ways to improve...

The Cholesky decomposition doesn't perform well on an MI50. With matrices of size 10240, the double-precision performance is just 150 GFlop/s, and increasing the matrix size to 20480 the performance even...

This PR introduces a potential optimization to the LARFT routine. The modification aims to reduce the size of the gemv computations and instead offloads the block part of the computation...

noOptimizations