Jay Zhuang issues

Results: 34 issues of Jay Zhuang

Use sparse LU as a faster coarse solver than Pinv

The only available `CoarseSolver` is `Pinv`, which gets very slow if the coarsest-level A is large, for example in a two-grid setting instead of multigrid V-cycle. I wrote a code...

Does STRUMPACK support sparse LDLT factorization for symmetric matrix?

From the [Sparse Direct Solver](https://portal.nersc.gov/project/sparse/strumpack/v6.3.1/sparse.html) section, it seems to use the general sparse LU factorization (when no compression is used). Is there an LDLT implementation for symmetric matrices (which would...

Solving MFEM's ComplexHypreParMatrix in Strumpack's native complex arithmetic

I am attempting to use Strumpack to solve a complex-valued, indefinite Maxwell equation (motivated by https://github.com/mfem/mfem/issues/2869#issuecomment-1123588971). I found the [`mfem::STRUMPACKSolver`](https://docs.mfem.org/html/classmfem_1_1STRUMPACKSolver.html#a23836709a06d0be816e1250efc93394e) interface and its usage in [MFEM's ex11p.cpp](https://github.com/mfem/mfem/blob/v4.2/examples/ex11p.cpp#L265-L275) (thanks for that!). Then...

Parallel executor for upper triangular solve (back-substitution)

## Problem description First, thanks for releasing this code, great work! I saw various implementations of lower triangular solve (forward substitution), but not of upper triangular solve (back-substitution). In...

With left-preconditioning, there is no way to record residual on original equation

## Problem description With left preconditioner `Pl`, the recorded residual (by setting `log=true`) is computed on the preconditioned equation `norm(Pl*b - Pl*A*x) = norm(Pl*r)`, not the original equation `norm(b -...
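The difference between the two norms can be seen on a tiny dense example. The sketch below is an illustration only: the helper names (`residual`, `apply_diag`, `norm2`) are hypothetical, and a diagonal `Pl` is assumed purely to keep the code short.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Residual of the original equation: r = b - A*x (dense A for brevity).
Vec residual(const Mat &A, const Vec &x, const Vec &b) {
    Vec r(b);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            r[i] -= A[i][j] * x[j];
    return r;
}

// Applies a diagonal left preconditioner Pl = diag(d) to a vector.
// Assumption: a diagonal Pl is used here only for illustration.
Vec apply_diag(const Vec &d, const Vec &r) {
    Vec out(r.size());
    for (std::size_t i = 0; i < r.size(); ++i)
        out[i] = d[i] * r[i];
    return out;
}

// Euclidean norm.
double norm2(const Vec &v) {
    double s = 0.0;
    for (double vi : v) s += vi * vi;
    return std::sqrt(s);
}
```

With `A = diag(4, 1)`, `b = (4, 1)`, `x = 0` and `Pl = diag(1/4, 1)`, `norm2(residual(A, x, b))` is `sqrt(17)` while `norm2(apply_diag(d, residual(A, x, b)))` is `sqrt(2)`, so a tolerance checked against one quantity says little about the other.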

Allow reordering for incomplete cholesky

Reordering can reduce fill-in for both complete and incomplete factorizations. The `CholeskyPreconditioner` calls `lldl()` in `LimitedLDLFactorizations.jl`, but without supplying the reorder parameter: https://github.com/JuliaLinearAlgebra/Preconditioners.jl/blob/52b3702e9c8c5d42d6092ed2df7efdbe07c81411/src/incompletecholesky.jl#L8-L13 [In lldl's example](https://github.com/JuliaSmoothOptimizers/LimitedLDLFactorizations.jl/blob/main/examples/example.jl), AMD or METIS reordering...

Understanding why TorchInductor cannot speed-up huggingface transformer inference

## Problem `torch.compile()` shows an impressive ~2x speed-up for this code repo, but when applied to huggingface transformers there is barely any speed-up. I want to understand why, and then...

Out-of-place version of sptrsv executors?

All current [sptrsv executors](https://github.com/sympiler/sympiler/blob/97f3b839be22f3e86de87a43ec51c5e721878b89/sparse_blas/sptrsv.cpp) are in-place: ```cpp void sptrsv_csr(int n, int *Lp, int *Li, double *Lx, double *x) { int i, j; for (i = 0; i < n; i++)...

Any good way to support int64 type for sparse matrix index?

All the current [sparse BLAS routines](https://www.sympiler.com/docs/sympiler-lib/) assume `int` type for row/column pointers `Lp` and column/row indices `Li`. However, for very large matrices the `nnz` size can exceed 2.1 billion (INT32_MAX=2,147,483,647),...

Code fixes for local-storage-only environment

In certain virtualized environments there is no shared storage. Both source code and data are stored (replicated) in each worker node's local storage. The code sections below only load data...