
Default sparse linear solver is slow for large systems

Open simulkade opened this issue 1 year ago • 14 comments

In a discussion with my friend @skieninger about the speed of sparse linear solvers in Matlab versus Python for solving a large system of equations (around 1e6 unknowns), we compared Matlab's backslash solver with scipy's SuperLU and observed a huge difference (Matlab much faster). This is due to the fact that SuiteSparse and its UMFPACK solver are not installed by default with scipy. A package is available here, but it is a pain to install on Windows and even on WSL (or at least I had difficulties getting it to work with all the recent changes in numpy and scipy). When it works, though, it makes a huge difference. For me, Pypardiso did not work either, failing with system and memory errors. I also noticed that switching between the different existing scipy solvers can make a significant difference.
The problem is very well documented here.

To address this problem, I suggest the following list of tasks/objectives:

  • A notebook to introduce some of the solvers for large problems
  • Describing a workflow for iterative solvers and preconditioners (a first sketch follows this list)
  • Possibly adding more sparse solvers that are more efficient than the default scipy solver (provided as optional packages)
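
As a first sketch of such an iterative-solver workflow (standard SciPy calls only; the Poisson-like test matrix is just a stand-in for an assembled PyFVTool system, and the rtol keyword assumes a recent SciPy):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

# Stand-in system: 2D Poisson-like matrix on an n x n grid
n = 100
A = sp.diags([-1, -1, 4, -1, -1], [-n, -1, 0, 1, n],
             shape=(n*n, n*n), format='csc')
b = np.ones(n*n)

# Incomplete LU factorization used as preconditioner
ilu = spilu(A, drop_tol=1e-5, fill_factor=10)
M_prec = LinearOperator(A.shape, ilu.solve)

# Preconditioned GMRES; info == 0 signals convergence
x, info = gmres(A, b, M=M_prec, rtol=1e-8)
print("converged" if info == 0 else f"GMRES info = {info}")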

Here's the script I am using to test the solvers:

import pyfvtool as pf
import numpy as np
import time
start_time = time.time()

# Solving a 3D diffusion equation with a fixed concentration
# at the left boundary and closed (no-flux) boundaries elsewhere


# Calculation parameters
Nx = 20 # number of finite volume cells per direction (Nx**3 cells in total)
Lx = 1.0 # [m] length of the domain 
c_left = 1.0 # left boundary concentration
c_init = 0.0 # initial concentration
D_val = 1e-5 # [m2/s] diffusion coefficient (gas phase)
t_simulation = 120.0 # [s] simulation time
dt = 60.0 # [s] time step

# Define mesh
mesh = pf.Grid3D(Nx, Nx, Nx, Lx, Lx, Lx)

# Create a cell variable with initial concentration
# By default, 'no flux' boundary conditions are applied
c = pf.CellVariable(mesh, c_init)

# Switch the left boundary to Dirichlet: fixed concentration
c.BCs.left.a[:] = 0.0
c.BCs.left.b[:] = 1.0
c.BCs.left.c[:] = c_left
c.apply_BCs()

# Assign diffusivity to cells
D_cell = pf.CellVariable(mesh, D_val)
D_face = pf.geometricMean(D_cell) # geometric mean of diffusivity at the interfaces between cells

# Assemble the implicit system: transient term, diffusion term, boundary conditions
Mt, RHSt = pf.transientTerm(c, dt, 1.0)
Mbc, RHSbc = pf.boundaryConditionsTerm(c.BCs)
Md = pf.diffusionTerm(D_face)
M = Mt - Md + Mbc
RHS = RHSt + RHSbc
# Solve a single time step of the discretized system
c_new = pf.solveMatrixPDE(mesh, M, RHS)
# Print the execution time
print("--- %s seconds ---" % (time.time() - start_time))

simulkade avatar Aug 10 '24 18:08 simulkade

It would be great to have a notebook of 'recipes' for getting external sparse solvers to work with PyFVTool.

For our present calculations (2D cylindrical, 375000 cells), Pypardiso works well (Win11, miniconda, specific PyFVTool environment) and decreases total calculation time by up to a factor of 10 (laptops, Intel i7, 32 GB RAM). I think we are lucky.

Concerning MATLAB: you get what you pay for! I guess that MATLAB has long partnered with the developers of sparse solvers to make them work in a hassle-free manner and to optimize performance. This is probably true for other commercial software as well. Sparse array support in scientific Python still needs further work.

Some ideas:

  • It may be insightful to keep a (long-term) log book of calculation times for specific calculations. Clearly specify the configuration (OS version, Python distribution, Python requirements.txt), so that these may be reproduced. I am quite happy with Anaconda/miniconda/Conda-Forge and environments.
  • You may find inspiration in projects that have the same sparse solver problem: SfePy (UMFPACK, PETSc) and of course FiPy. Back in the Python 2.7 days, FiPy used pysparse on Linux (not Windows). It was really good, fast, and easy to use, but it is not available in Python 3. I remember that our calculations became much slower in Python 3, even with PETSc.
  • Starting from a certain size of problem, it stops being convenient to run calculations on a laptop (heat and fan noise...), and it is better to set up the calculations on a fixed (Linux) workstation. The workstation may even be old and slower, as you can leave it running all through the night! Store the results to HDF5 files for post-processing. We have used Jupyter Lab (in server mode) to connect to the workstation with a web browser via the local network (in addition to standard ssh and tmux tools).
  • Specifying the whole workflow for larger calculations (batch processing, data storage, post-processing) will indeed be helpful, in particular to students getting started.
  • UMFPACK via scikit-umfpack seems the way to go. Encouraging news concerning installation from conda-forge (a minimal usage sketch follows this list).
  • I have the impression that the biggest chance of getting this kind of thing working easily, is to use a dedicated Linux installation, perhaps on a separate workstation.
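
A minimal usage sketch for the scikit-umfpack route (assuming scikit-umfpack is installed, in which case SciPy can dispatch spsolve to UMFPACK):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import use_solver, spsolve

use_solver(useUmfpack=True)   # route spsolve through UMFPACK

# Small demonstration system; UMFPACK wants CSC format, double precision
A = sp.csc_matrix(sp.eye(5) * 2.0)
b = np.ones(5)
x = spsolve(A, b)
print(x)   # expect [0.5 0.5 0.5 0.5 0.5]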

mhvwerts avatar Aug 11 '24 10:08 mhvwerts

> I have the impression that the biggest chance of getting this kind of thing working easily, is to use a dedicated Linux installation, perhaps on a separate workstation.

I will give it a shot later, when my son is not playing Don't Starve Together on our Linux machine.

> UMFPACK via scikit-umfpack seems the way to go. Encouraging news concerning https://github.com/scikit-umfpack/scikit-umfpack/issues/85.

Totally agree. I wish it were installed by default in scipy. It seems that the problem is a license incompatibility.

simulkade avatar Aug 13 '24 08:08 simulkade

For now, Linux seems indeed to be the easiest OS for working with UMFPACK: https://github.com/scikit-umfpack/scikit-umfpack/issues/85#issuecomment-2291483189

mhvwerts avatar Aug 15 '24 15:08 mhvwerts

pypardiso, which provides access to a fast Intel MKL sparse solver on Windows, seems to be broken with NumPy 2.x.

Create a specific development environment to use pypardiso on Windows (from a local pyfvtool clone):

conda create --name pyfvtool_dev_vintage python=3.12 numpy=1.26.4 scipy matplotlib pypardiso spyder jupyterlab pytest=7.4 pytest_notebook tqdm
conda activate pyfvtool_dev_vintage
pip install -e .
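
A quick smoke test for the work-around environment (a minimal sketch, solving a trivial sparse system):

import numpy as np
import scipy.sparse as sp
import pypardiso

print(np.__version__)                     # should report 1.26.4 in this environment
A = sp.csr_matrix(sp.eye(4) * 2.0)
print(pypardiso.spsolve(A, np.ones(4)))   # expect [0.5 0.5 0.5 0.5]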

I haven't looked very deeply into the problem, but I am documenting it and the specific conda work-around environment here for future reference.

The pypardiso - NumPy2 problem is likely related to changes in the NumPy APIs, but I am no expert. It does not seem to be a problem with the MKL version itself.

If we get to document this problem more clearly, we might open an issue in the PyPardiso GitHub. For now, the 1.26.4 work-around, well, works around the problem.

mhvwerts avatar Apr 08 '25 12:04 mhvwerts

The PyPardiso/NumPy2 problem also affects other people: https://github.com/haasad/PyPardiso/issues/77.

For now, it is unclear where a solution could be found: PyPardiso, NumPy2 or SciPy. Meanwhile, keep using NumPy 1.26.4 with PyPardiso.
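
For pip-based environments, the corresponding pin would be along these lines (a sketch; the conda variant is given above):

pip install "numpy==1.26.4" pypardiso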

There may be an opportunity for working with UMFPACK in the near future.

mhvwerts avatar May 31 '25 11:05 mhvwerts

A fix for the PyPardiso/NumPy2 problem has been found. A pull request is available: https://github.com/haasad/PyPardiso/pull/79

It will probably be in a new release of PyPardiso soon.

mhvwerts avatar Jun 20 '25 19:06 mhvwerts

This "spsolve" issue can be expected to remain open in the foreseeing future. In a recent discussion with a colleague about porting his electromagnetic FEM code from MATLAB to Python, this question came up again. In MATLAB you can just do x = A\b with a sparse system and it will be smooth and fast and use available compute power. With spsolve in Python, the situation is more adventurous.

Over the last few days, I dug into the question again to see the current state of spsolve, i.e. the various projects that may give easy access to fast sparse linear solvers in Python. The people in these projects are doing the best they can, but progress often stalls due to lack of time (I can relate to this!).

For PyFVTool, I propose to be pragmatic: figure out what works, document it, and perhaps even include it in the code base (with optional dependencies). As indicated, I will consistently use Python 3.12 and the most recent versions of NumPy and SciPy, since we need to learn to live with NumPy 2 and the recent changes in SciPy. I anticipate that both NumPy and SciPy will stabilize a bit.

As is obvious from #50, the first efforts will focus on PyPardiso.

mhvwerts avatar Jul 21 '25 09:07 mhvwerts

Here is a short list of still-active developments that we could monitor once in a while for recent progress. I focus on direct solvers using multithreading in a shared-memory environment (i.e. a typical modern workstation or a single compute node in an HPC cluster).

And an interesting link about sparse linear solvers in general:

  • https://tbetcke.github.io/hpc_lecture_notes/sparse_solvers_introduction.html

mhvwerts avatar Jul 21 '25 10:07 mhvwerts

Discussing the reputation of oneMKL PARDISO with one of the LLMs around:

oneMKL's PARDISO is generally regarded as a high-performance, production-grade sparse direct solver — particularly strong for solving large, sparse, and potentially ill-conditioned linear systems.

✅ Strengths

  • Highly optimized for Intel CPUs: Leverages vectorization, threading, and memory locality via oneMKL.
  • Excellent performance on symmetric positive definite (SPD), symmetric indefinite, structurally symmetric, and general nonsymmetric matrices.
  • Robust pivoting and reordering strategies improve numerical stability and efficiency.
  • Multithreaded: Uses Intel TBB/OpenMP underneath to parallelize factorization and backsolve.
  • Handles very large problems efficiently — often outperforming open-source solvers like SuperLU or UMFPACK in large-scale scientific computing tasks.

⚠️ Limitations or Considerations

  • Proprietary — not ideal for fully open-source redistribution.
  • Best performance is on Intel CPUs; it may not perform as well on AMD or ARM (though it will still run).
  • Memory usage can be high — it’s a direct solver, not iterative.

✅ Software Integrations

Intel PARDISO is embedded in:

  • COMSOL Multiphysics
  • ANSYS
  • ABAQUS (legacy version)
  • OpenFOAM (with optional MKL support)
  • MATLAB (when MKL is available)

Let's see how this works out for our PyFVTool calculations...

mhvwerts avatar Jul 21 '25 15:07 mhvwerts

Note that while oneMKL is a closed-source runtime library with certain restrictions on redistribution, it is perfectly OK to use it as an (in our case optional) dependency in an open-source Python project.

mhvwerts avatar Jul 21 '25 15:07 mhvwerts

For any future developments beyond oneMKL PARDISO, it would be better to look into MUMPS first, instead of UMFPACK. MUMPS appears to scale better for large systems in parallel environments, and seems better adapted to finite-element/finite-volume methods.

mhvwerts avatar Jul 21 '25 16:07 mhvwerts

I have an spsolve under development; I would be grateful if you are willing to test it, though I doubt it will bring you much improvement.

I took a quick look at your problem. Your RHS is a single vector, so the solve is not easy to parallelize, and most of the time will be spent on the LU decomposition, for which both scipy and my package use a sequential SuperLU.
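
One classical mitigation when the system matrix stays constant between time steps (whether that holds depends on the problem) is to factorize once and reuse the factorization across steps; a sketch with SciPy's splu, where n_steps and assemble_rhs are hypothetical:

from scipy.sparse.linalg import splu

lu = splu(M.tocsc())              # LU factorization, done once
for step in range(n_steps):
    RHS = assemble_rhs(step)      # hypothetical: rebuild only the RHS
    x = lu.solve(RHS)             # cheap triangular solves per step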

UMFPACK, which MATLAB uses, may be a good choice. The problem is that it uses the LGPL license, so I cannot redistribute it; this should not be a problem for your project (you use LGPL as well).

From what I have seen, pypardiso is faster than UMFPACK for some denser cases or bigger matrix sizes, probably because there is some overhead between phase changes.

capric98 avatar Jul 21 '25 17:07 capric98

I remember I looked into MUMPS, PARDISO, and umfpack when solving a coupled system of advection-diffusion equations. I also tried various iterative solvers, which never worked. My system was very sensitive to the accurate solution of the continuity equation, and iterative solvers caused problems unless I forced them to converge to a tiny error, which killed the performance. I also looked into preconditioning (which is really a necessity, especially with boundary conditions that lead to ill-conditioned matrices) but never went too deep into it. I could never understand preconditioners well enough to come up with an algorithmic way of calling them together with the iterative solvers.
I ended up going with Matlab's umfpack because it was a pain to get PARDISO and especially MUMPS installed. I remember the COMSOL people mostly recommended PARDISO for systems similar to mine (Rayleigh–Bénard instability). I will soon recreate it in PyFVTool as an example for testing solvers.

simulkade avatar Jul 21 '25 21:07 simulkade

Thank you very much, @capric98, for sharing your code and your insights. Your solver is on the list for future tests, although I do not know when the next occasion will be.

As @simulkade suggests, we should now first come up with a small set of realistic benchmark cases, based on transport-phenomena simulations from research and teaching projects, adapted into test scripts. (BTW, such scripts would also make nice illustrations of PyFVTool.) Then we can compare systematically.

BTW, @simulkade, my initial criterion for solver packages is that they should either install 'automatically' into my PyFVTool Conda environment (using conda, or pip if necessary), or simply be a Python module that I can put in the same folder as my calculation script. As you noticed, I am currently happy with PARDISO from Intel oneMKL, since I could install it very easily (conda install mkl) and it works with the new PyFVTool solvers feature (#51).

mhvwerts avatar Jul 22 '25 08:07 mhvwerts