rocSOLVER issues

Partial eigenvalue decomposition with divide & conquer

- Modifies STEDC and STEBZ to accommodate divide & conquer for partial decompositions
- Adds initial optimizations to merge phase of divide & conquer
- Adds new family of APIs...

jzuniga-amd

noOptimizations

Recursive cholesky

1

Implement recursive formulation of Cholesky factorization for n by n symmetric positive definite matrix A. Let the following be a block partitioning of matrix A. Here submatrix L22 is n/2...

EdDAzevedo

noOptimizations
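The recursive Cholesky description above is truncated, so the following is only a minimal sketch of the general idea, not the PR's implementation. It assumes the standard 2x2 block split A = [[A11, A21^T], [A21, A22]] with the leading block of size n // 2, and works on plain Python lists rather than any rocSOLVER API. The function name `recursive_cholesky` is hypothetical.

```python
import math


def recursive_cholesky(a):
    """Return lower-triangular L (list of lists) such that L * L^T == a.

    Assumes `a` is a symmetric positive definite matrix given as a list of rows.
    The split point n // 2 is an assumption made for this sketch.
    """
    n = len(a)
    if n == 1:
        return [[math.sqrt(a[0][0])]]

    k = n // 2
    m = n - k
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    # Step 1: factor the leading block recursively, A11 = L11 * L11^T.
    l11 = recursive_cholesky(a11)

    # Step 2: solve L21 * L11^T = A21 for L21, one row at a time.
    l21 = []
    for row in a21:
        x = []
        for j in range(k):
            s = row[j] - sum(x[p] * l11[j][p] for p in range(j))
            x.append(s / l11[j][j])
        l21.append(x)

    # Step 3: form the Schur complement A22 - L21 * L21^T and factor it recursively.
    s22 = [
        [a22[i][j] - sum(l21[i][p] * l21[j][p] for p in range(k)) for j in range(m)]
        for i in range(m)
    ]
    l22 = recursive_cholesky(s22)

    # Step 4: assemble L = [[L11, 0], [L21, L22]].
    top = [l11[i] + [0.0] * m for i in range(k)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom
```

Calling `recursive_cholesky([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])` yields a lower-triangular factor whose product with its transpose reproduces the input. In a GPU library the solve and the Schur update would typically map to triangular-solve and rank-k update routines; that mapping is not shown here.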

Multikernel option for bdsqr optimization

As previously discussed, I have been experimenting with optimizing BDSQR by using multiple kernel launches, with device synchronizations to determine the iterative loop's stopping condition. Broadly speaking, I have made...

tfalders

noOptimizations

Optimize LU factorization without pivoting

Optimize getrf_npvt (LU factorization without pivoting) by using a block algorithm that is similar to the block algorithm used in Cholesky factorization. The diagonal block is factored using a specialized...

EdDAzevedo
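To illustrate the kind of block algorithm the LU entry above refers to, here is a minimal sketch of a right-looking blocked LU factorization without pivoting. It is an assumption-laden illustration in plain Python, not the PR's code: the function name `lu_nopivot_blocked`, the block size parameter `nb`, and the simple unblocked loop used for the diagonal block are all hypothetical (the source mentions a specialized diagonal-block kernel but the description is truncated).

```python
def lu_nopivot_blocked(a, nb=2):
    """Return the packed LU factors of `a` (unit-lower L below the diagonal,
    U on and above it), computed block by block without pivoting.

    Assumes every pivot encountered is nonzero; no pivoting is performed.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]

    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # 1. Factor the diagonal block with an unblocked loop (sketch only).
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                lu[i][k] /= lu[k][k]
                for j in range(k + 1, k1):
                    lu[i][j] -= lu[i][k] * lu[k][j]

        # 2. Row panel: solve L11 * U12 = A12 (L11 is unit lower triangular).
        for j in range(k1, n):
            for i in range(k0, k1):
                for p in range(k0, i):
                    lu[i][j] -= lu[i][p] * lu[p][j]

        # 3. Column panel: solve L21 * U11 = A21.
        for i in range(k1, n):
            for j in range(k0, k1):
                for p in range(k0, j):
                    lu[i][j] -= lu[i][p] * lu[p][j]
                lu[i][j] /= lu[j][j]

        # 4. Trailing update: A22 -= L21 * U12 (a matrix-matrix product).
        for i in range(k1, n):
            for j in range(k1, n):
                for p in range(k0, k1):
                    lu[i][j] -= lu[i][p] * lu[p][j]

    return lu
```

The structure mirrors blocked Cholesky: factor a small diagonal block, solve the adjacent panels with triangular solves, then push most of the work into the trailing matrix-matrix update.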

Fix stein initial eigenvectors' choices

1

This PR fixes a bug in `roclapack_stein` where eigenvectors would fail to converge because the initial values used in stein would be orthogonal to the real eigenvectors (to numerical precision)....

jmachado-amd

noOptimizations

Load rocsparse using dlopen

This is a draft because the work is incomplete. However, the library builds and the main thing left to do is to actually load the rocsparse symbols with dlopen.

cgmb

noOptimizations

noExtendedCI

Replace host_batch_vector and host_strided_batch_vector

This is essentially a rewrite of the `host_batch_vector` and `host_strided_batch_vector` classes. I've deleted a number of unused options, enabled error checking on failed allocations by default, and added a number...

cgmb

noOptimizations

Move sterf to CPU; Add experimental parallelism for sterf

6

Hi, I've been trying to improve the performance of the SYEVD function lately. The sterf kernel is the most time-consuming part of the code. I tried to use two ways to improve...

mdvizov

Low performance of xPOTRF.

2

The Cholesky decomposition doesn't perform well on an MI50. Using matrices of size 10240, the double precision performance is just 150 GFlop/s, and increasing the matrix size to 20480 the performance even...

rasolca

Use L3 BLAS in LARFT

This PR introduces a potential optimization to the LARFT routine. The modification aims to reduce the size of the gemv computations and instead offloads the block part of the computation...

AGonzales-amd

noOptimizations
