rocSOLVER
Next generation LAPACK implementation for ROCm platform
- Modifies STEDC and STEBZ to accommodate divide & conquer for partial decompositions
- Adds initial optimizations to merge phase of divide & conquer
- Adds new family of APIs...
Implement recursive formulation of Cholesky factorization for n by n symmetric positive definite matrix A. Let the following be a block partitioning of matrix A. Here submatrix L22 is n/2...
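For orientation, a recursive Cholesky factorization can be sketched as follows. The summary above elides the actual block partitioning, so this sketch assumes the common 2x2 split with a leading block of size n/2: factor A11 = L11 L11^T, solve L21 L11^T = A21, form the Schur complement A22 - L21 L21^T, and recurse on it to obtain L22. The function name `recursive_cholesky` and the list-of-lists layout are illustrative only; this is plain host-side Python, not the rocSOLVER implementation.

```python
import math

def recursive_cholesky(A):
    """Return lower-triangular L (list of lists) with A = L * L^T.

    Assumes A is symmetric positive definite. Hypothetical helper for
    illustration; the 2x2 partitioning is an assumption, not taken
    from the rocSOLVER source.
    """
    n = len(A)
    if n == 1:
        return [[math.sqrt(A[0][0])]]

    k = n // 2  # size of the leading block A11
    A11 = [row[:k] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]

    # Step 1: factor the leading block recursively.
    L11 = recursive_cholesky(A11)

    # Step 2: solve L21 * L11^T = A21, one row of L21 at a time.
    L21 = []
    for a in A21:
        x = [0.0] * k
        for j in range(k):
            s = a[j] - sum(x[p] * L11[j][p] for p in range(j))
            x[j] = s / L11[j][j]
        L21.append(x)

    # Step 3: Schur complement S = A22 - L21 * L21^T.
    m = n - k
    S = [[A22[i][j] - sum(L21[i][p] * L21[j][p] for p in range(k))
          for j in range(m)] for i in range(m)]

    # Step 4: factor the Schur complement recursively.
    L22 = recursive_cholesky(S)

    # Assemble L = [[L11, 0], [L21, L22]].
    top = [L11[i] + [0.0] * m for i in range(k)]
    bottom = [L21[i] + L22[i] for i in range(m)]
    return top + bottom
```

On a GPU, steps 2 and 3 would typically map to triangular-solve and symmetric rank-k update routines, which is where most of the floating-point work ends up.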
As previously discussed, I have been experimenting with optimizing BDSQR by using multiple kernel launches, with device synchronizations to determine the iterative loop's stopping condition. Broadly speaking, I have made...
Optimize getrf_npvt (LU factorization without pivoting) by using a block algorithm that is similar to the block algorithm used in Cholesky factorization. The diagonal block is factored using a specialized...
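As a rough illustration of the block structure, here is a minimal right-looking blocked LU factorization without pivoting. It is a sketch under assumptions: the summary above is truncated, so the "specialized" diagonal-block kernel is replaced here by a plain unblocked elimination, and the name `lu_nopivot_blocked` and the block size parameter `nb` are hypothetical. It runs on the host in plain Python and is not the rocSOLVER code.

```python
def lu_nopivot_blocked(A, nb=2):
    """Return packed LU factors of A (unit-lower L below the diagonal,
    U on and above it), computed block by block without pivoting.

    Assumes every pivot encountered is nonzero. Hypothetical helper
    for illustration only.
    """
    n = len(A)
    LU = [[float(v) for v in row] for row in A]

    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # 1. Factor the diagonal block in place (unblocked elimination).
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                LU[i][k] /= LU[k][k]
                for j in range(k + 1, k1):
                    LU[i][j] -= LU[i][k] * LU[k][j]

        # 2. Column panel: solve L21 * U11 = A21 for L21.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = LU[i][j] - sum(LU[i][p] * LU[p][j] for p in range(k0, j))
                LU[i][j] = s / LU[j][j]

        # 3. Row panel: solve L11 * U12 = A12 for U12 (L11 is unit lower).
        for j in range(k1, n):
            for i in range(k0, k1):
                LU[i][j] -= sum(LU[i][p] * LU[p][j] for p in range(k0, i))

        # 4. Trailing update: A22 <- A22 - L21 * U12.
        for i in range(k1, n):
            for j in range(k1, n):
                LU[i][j] -= sum(LU[i][p] * LU[p][j] for p in range(k0, k1))

    return LU
```

The structure mirrors blocked Cholesky: a small diagonal factorization, two triangular solves for the panels, and a matrix-multiply update of the trailing submatrix that dominates the cost for large n.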
This PR fixes a bug in `roclapack_stein` where eigenvectors would fail to converge because the initial values used in stein would be orthogonal to the real eigenvectors (to numerical precision)....
This is a draft because the work is incomplete. However, the library builds and the main thing left to do is to actually load the rocsparse symbols with dlopen.
This is essentially a rewrite of the `host_batch_vector` and `host_strided_batch_vector` classes. I've deleted a number of unused options, enabled error checking on failed allocations by default, and added a number...
Hi, I've been trying to improve performance of SYEVD function lately. The sterf kernel is the most time-consuming part of the code. I tried to use two ways to improve...
The Cholesky decomposition doesn't perform well on an MI50. With matrices of size 10240, the double-precision performance is just 150 GFlop/s, and increasing the matrix size to 20480 the performance even...
This PR introduces a potential optimization to the LARFT routine. The modification aims to reduce the size of the gemv computations and instead offloads the block part of the computation...