
MPI_INIT fails on all ranks but rank 0 when using mpi4py

Open · EdCaunt opened this issue 1 year ago · 1 comment

Starting a Python shell with tmpi 2 python and running from mpi4py import MPI results in MPI_INIT failing on every rank except the first (for any number of ranks, afaict), with the following message:

Python 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[dyn3168-24:00000] *** An error occurred in MPI_Init_thread
[dyn3168-24:00000] *** reported by process [2283732993,1]
[dyn3168-24:00000] *** on a NULL communicator
[dyn3168-24:00000] *** Unknown error
[dyn3168-24:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dyn3168-24:00000] ***    and MPI will try to terminate your MPI job as well)

Pane is dead (status 14, Mon Feb 19 14:17:33 2024)

This is on macOS with Open MPI 5.0.2 installed via Homebrew. A script containing this import works fine when run with mpiexec -n 2 --oversubscribe python script.py.
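
For reference, a minimal script along these lines shows the behaviour; the filename script.py matches the mpiexec command above, but the body is illustrative rather than taken from my actual code:

# script.py -- illustrative minimal reproducer (contents assumed, not from the report)
from mpi4py import MPI  # importing mpi4py triggers MPI_Init under the hood

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")

Launched with mpiexec -n 2 --oversubscribe python script.py this runs cleanly, whereas the same import inside a tmpi 2 python shell produces the MPI_INIT failure shown above.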

Any idea why this might be happening? The behaviour seems to be tmpi-specific. It worked initially after installation, but then started throwing this error, and reinstalling both Open MPI and tmpi hasn't fixed the issue, afaict.

EdCaunt · Feb 19 '24 14:02

Having the same issue with a C project. It works fine with mpirun -n 4 xterm -e lldb <program>, as well as when running without attaching a debugger. Please let me know if you ever find a solution.
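
For what it's worth, a stripped-down C program exercising the same call path would look roughly like this (illustrative sketch, not my actual project):

/* Illustrative minimal MPI program; not the actual project code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* MPI_Init is where the "PML add procs failed" error above surfaces. */
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched via mpirun it runs fine; under tmpi the same MPI_Init path aborts as described in this issue.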

kristiansordal · Mar 06 '24 10:03