IndexMap's constructor hangs if there is a mismatch in the destination ranks
Minimal failing example:
#include <cstdint>
#include <set>
#include <vector>

#include <dolfinx/common/IndexMap.h>
#include <dolfinx/common/MPI.h>
#include <dolfinx/common/subsystem.h>

using namespace dolfinx;

int main(int argc, char* argv[])
{
  common::subsystem::init_logging(argc, argv);
  common::subsystem::init_petsc(argc, argv);

  MPI_Comm mpi_comm{MPI_COMM_WORLD};
  int mpi_size, mpi_rank;
  MPI_Comm_size(mpi_comm, &mpi_size);
  MPI_Comm_rank(mpi_comm, &mpi_rank);

  const int size_local = 100;

  // Create one ghost entry owned by the next process
  std::vector<std::int64_t> ghosts(1);
  ghosts[0] = (mpi_rank + 1) % mpi_size * size_local + 1;
  std::vector<int> global_ghost_owner(ghosts.size(), (mpi_rank + 1) % mpi_size);

  // Compute destination edges (ranks that ghost data owned by this rank)
  auto dest_edges = dolfinx::MPI::compute_graph_edges(
      MPI_COMM_WORLD,
      std::set<int>(global_ghost_owner.begin(), global_ghost_owner.end()));

  // Make the input inconsistent: add an extra edge (or remove one with pop_back)
  if (mpi_rank == 0)
    dest_edges.push_back(1);

  common::IndexMap idx_map(MPI_COMM_WORLD, size_local, dest_edges, ghosts,
                           global_ghost_owner);

  common::subsystem::finalize_petsc();
  return 0;
}
It hangs in compute_owned_shared on the first MPI_Neighbor_alltoall.
Using init_mpi instead of init_petsc, I get the following error message:
An error occurred in MPI_Neighbor_alltoall
*** reported by process [305790977,0]
*** on communicator MPI COMMUNICATOR 4 CREATE FROM 0
*** MPI_ERR_OTHER: known error not in list
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
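
For context, this failure mode is not specific to dolfinx: a neighbourhood collective on a distributed-graph communicator whose edge lists are not globally consistent may hang or abort, and the error above comes from MPI_Neighbor_alltoall, which needs such a communicator. The following plain-MPI sketch (illustrative only, run with 2 ranks; it does not claim to reproduce IndexMap's internals) shows the same effect: rank 1 declares an incoming edge from rank 0, but rank 0 never declares the matching outgoing edge, so rank 1 blocks in the collective.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);

  MPI_Comm comm = MPI_COMM_WORLD;
  int rank = 0, size = 0;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  // Deliberately inconsistent neighbourhood: rank 1 declares an incoming
  // edge from rank 0, but rank 0 declares no matching outgoing edge. The
  // MPI standard requires the edge lists to be globally consistent, so
  // this input is erroneous.
  std::vector<int> sources, destinations;
  if (rank == 1)
    sources.push_back(0);

  // "+ 1" only to avoid handing MPI a null pointer when a list is empty
  std::vector<int> sendbuf(destinations.size() + 1, rank);
  std::vector<int> recvbuf(sources.size() + 1, -1);

  MPI_Comm neigh_comm;
  MPI_Dist_graph_create_adjacent(comm, sources.size(), sources.data(),
                                 MPI_UNWEIGHTED, destinations.size(),
                                 destinations.data(), MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0, &neigh_comm);

  // Rank 1 waits for one int from rank 0, which is never sent: the
  // collective blocks (or aborts, depending on the MPI implementation)
  MPI_Neighbor_alltoall(sendbuf.data(), 1, MPI_INT, recvbuf.data(), 1,
                        MPI_INT, neigh_comm);

  std::printf("rank %d: finished\n", rank);
  MPI_Comm_free(&neigh_comm);
  MPI_Finalize();
  return 0;
}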
Is this caused by user error? Is there a low-cost check?
@IgorBaratta
I think this can only be caused by user error (inconsistent input). Checking the inputs for correctness requires communication, so it would be expensive.
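
To illustrate the cost, here is a rough sketch of what a debug-only consistency check could look like (a hypothetical helper, not dolfinx API; it assumes the semantics of the example above, i.e. dest_edges on each rank must be exactly the set of ranks that ghost that rank's data):

#include <mpi.h>
#include <set>
#include <vector>

// Hypothetical debug-only helper: returns true on every rank iff the
// destination ranks are globally consistent with the ghost-owner lists.
bool check_dest_edges(MPI_Comm comm, const std::vector<int>& dest_edges,
                      const std::vector<int>& ghost_owners)
{
  int size = 0;
  MPI_Comm_size(comm, &size);

  // mark[q] = 1 if rank q owns one of my ghosts, i.e. q must list me as a
  // destination on its side
  std::vector<int> mark(size, 0);
  for (int q : ghost_owners)
    mark[q] = 1;

  // Exchange the flags so each rank learns who actually ghosts its data
  std::vector<int> recv(size, 0);
  MPI_Alltoall(mark.data(), 1, MPI_INT, recv.data(), 1, MPI_INT, comm);

  std::set<int> expected;
  for (int q = 0; q < size; ++q)
    if (recv[q] == 1)
      expected.insert(q);

  std::set<int> declared(dest_edges.begin(), dest_edges.end());
  int ok = (declared == expected) ? 1 : 0;

  // Agree on the result so every rank can raise the same error
  int global_ok = 0;
  MPI_Allreduce(&ok, &global_ok, 1, MPI_INT, MPI_LAND, comm);
  return global_ok == 1;
}

Even this simple version needs an O(P) buffer plus an MPI_Alltoall and an MPI_Allreduce per call, so it would only be reasonable behind a debug flag rather than in the constructor by default.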