Add MPI speedup to pysteps blending
At RMI we are aiming to run pysteps in an operational environment. One thing that could improve compute time is adding optional MPI support. More info will be provided later in this thread by @mpvginde.
The STEPS blending routine was recently refactored for clarity, but its performance footprint is still unknown. The historical “MPI version”, which lived in a separate steps_mpi.py file, tried to gain speed via mpi4py, while the new code is cleaner but still single-process. Before we invest engineering time in reviving or rewriting a full MPI backend, we need solid evidence that parallelisation is worth the maintenance cost, and that MPI is the right flavour of parallelism.
Below is a suggestion for the next steps:
1 · Candidate approaches
| Option | One-line description |
|---|---|
| mpi4py (manual scaling) | Revive steps_mpi through a clean adapter; launch with mpirun (sketch below). |
| xarray → dask.array → Dask scheduler | Store fields as xarray objects and let Dask manage chunked execution across cores/nodes (sketch below). |
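For reference, a minimal sketch of what the mpi4py adapter could look like: ensemble members are distributed round-robin over MPI ranks and the results gathered on rank 0. Note that `blend_member` and `run_blending_mpi` are hypothetical placeholders, not the actual pysteps API.

```python
# Minimal sketch of an mpi4py adapter; hypothetical, not the pysteps API.
# Each rank computes its share of the ensemble members; rank 0 gathers them.
# Launch with e.g.: mpirun -np 4 python run_blending_mpi.py
import numpy as np
from mpi4py import MPI


def blend_member(member_idx, n_timesteps=12, shape=(200, 200)):
    """Placeholder for the per-member STEPS blending computation."""
    rng = np.random.default_rng(member_idx)
    return rng.standard_normal((n_timesteps, *shape))


def run_blending_mpi(n_members=12):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Round-robin assignment of ensemble members to ranks.
    my_results = {m: blend_member(m) for m in range(rank, n_members, size)}

    # Gather the per-rank results on rank 0 and restore member order.
    gathered = comm.gather(my_results, root=0)
    if rank == 0:
        merged = {m: field for part in gathered for m, field in part.items()}
        return [merged[m] for m in sorted(merged)]
    return None


if __name__ == "__main__":
    forecast = run_blending_mpi()
    if forecast is not None:
        print(f"Gathered {len(forecast)} ensemble members on rank 0")
```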
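And a comparable sketch of the xarray/Dask route, with one chunk per ensemble member so the scheduler can blend members in parallel on threads, processes, or a dask.distributed cluster. Here too `blend_member` is a hypothetical stand-in for the real blending kernel.

```python
# Sketch of the xarray + dask.array route; blend_member is a hypothetical
# stand-in for the real blending kernel.
import dask.array as da
import numpy as np
import xarray as xr


def blend_member(block):
    """Placeholder per-member computation, applied to one chunk at a time."""
    return np.tanh(block)  # stands in for the actual blending math


n_members, n_timesteps, ny, nx = 12, 12, 200, 200

# One chunk per ensemble member, so members can run in parallel.
precip = xr.DataArray(
    da.random.random(
        (n_members, n_timesteps, ny, nx),
        chunks=(1, n_timesteps, ny, nx),
    ),
    dims=("ens_member", "time", "y", "x"),
)

# apply_ufunc builds the task graph lazily; the Dask scheduler then
# executes the chunks across cores (or nodes, with dask.distributed).
blended = xr.apply_ufunc(
    blend_member,
    precip,
    dask="parallelized",
    output_dtypes=[precip.dtype],
)

result = blended.compute()  # triggers the parallel execution
print(result.shape)
```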
2 · Benchmark we must run
Measure wall-clock time and peak RAM for an identical 60-min, 12-member forecast on two machines (a laptop with 8 cores and an HPC node with 2 × 20 cores).
| Label | What we test |
|---|---|
| A | Old master branch (pre-refactor) |
| B | New master branch (current, single-process) |
| C | Old steps_mpi branch (pre-refactor) |
| D | Branch B plus prototype MPI adapter (same code paths, split over ranks) |
| E | Branch B plus prototype xarray/Dask code (same code paths, split over chunks/workers) |
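A possible shape for the repro script requested below: time the forecast call, record peak RSS, and append one CSV row per run. `run_forecast` is a placeholder for whichever configuration (A–E) is being exercised; the rest uses only the standard library (note that `resource` is Unix-only, which is fine for the laptop/HPC targets).

```python
# Sketch of a benchmark harness: wall-clock time and peak RAM per run,
# appended to a CSV file. run_forecast() is a placeholder for the actual
# forecast call of whichever configuration (A-E) is under test.
import csv
import resource  # Unix-only; ru_maxrss is in kB on Linux, bytes on macOS
import time
from pathlib import Path


def run_forecast():
    """Placeholder: call the blending routine of the branch under test."""
    time.sleep(0.1)


def benchmark(label, n_repeats=3, csv_path="timings.csv"):
    rows = []
    for repeat in range(n_repeats):
        t0 = time.perf_counter()
        run_forecast()
        wall = time.perf_counter() - t0
        peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        rows.append({"label": label, "repeat": repeat,
                     "wall_s": round(wall, 3), "peak_rss": peak_rss})

    write_header = not Path(csv_path).exists()
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    benchmark("B")
```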
3 · Acceptance & decision rule
Deliverables
- [ ] Repro script(s) that run A, B, C and dump timing results (CSV).
- [ ] Results table posted in this thread.
Decision
If C is ≥ 20 % faster than B on the HPC node and scales close to linearly, we proceed with a full MPI rewrite of STEPS blending.
Otherwise we keep the single-process code and explore the Dask route instead.
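To make the decision rule concrete, here is one way to compute it from the measured timings, reading “≥ 20 % faster” as a speedup of at least 1.2 and “close to linearly” as a parallel efficiency near 1.0. The 0.8 efficiency threshold is illustrative and up for discussion.

```python
# Worked example of the decision rule; the thresholds are illustrative.
def evaluate(t_single, t_parallel, n_ranks,
             min_speedup=1.2, min_efficiency=0.8):
    speedup = t_single / t_parallel   # >= 1.2 means ">= 20 % faster"
    efficiency = speedup / n_ranks    # 1.0 would be perfectly linear
    go_mpi = speedup >= min_speedup and efficiency >= min_efficiency
    return speedup, efficiency, go_mpi


# Hypothetical numbers: B takes 300 s single-process, C takes 90 s on 4 ranks.
speedup, efficiency, go_mpi = evaluate(300.0, 90.0, 4)
print(f"speedup={speedup:.2f}, efficiency={efficiency:.2f}, "
      f"proceed with MPI rewrite: {go_mpi}")
```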