
Generating multiple meshes in parallel

Open · andrembcosta opened this issue 2 years ago · 1 comment

Hi all,

I recently noticed that the InternalMeshGenerator crashes in parallel runs when there is more than one meshBody.

Could someone with a better understanding of mesh generation check what could be going on? There is a test file in the inputFiles folder (which should actually be an integratedTest) that can be used to replicate the error. Just run:

srun -n 4 geosx -i inputFiles/multipleMeshBodies/testMultipleBodies.xml -x 2 -y 2 -z 1
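
For anyone scripting the reproduction, here is a minimal Python sketch that assembles the command above. The helper name `build_repro_command` is hypothetical, and the check that the `-x`, `-y`, `-z` counts multiply to the rank count is an assumption based on the command shown (2 x 2 x 1 = 4), not something confirmed in this thread.

```python
import shlex

def build_repro_command(n_ranks, x, y, z,
                        input_file="inputFiles/multipleMeshBodies/testMultipleBodies.xml",
                        executable="geosx"):
    """Return the srun argument list for the reproduction run (hypothetical helper)."""
    # Assumption: the partition counts should multiply to the number of MPI ranks.
    if x * y * z != n_ranks:
        raise ValueError(
            f"partition counts {x}*{y}*{z} do not match {n_ranks} ranks"
        )
    return [
        "srun", "-n", str(n_ranks), executable,
        "-i", input_file,
        "-x", str(x), "-y", str(y), "-z", str(z),
    ]

# Build the same command as in the report and print it as a shell line.
cmd = build_repro_command(4, 2, 2, 1)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` on a machine where `srun` and `geosx` are available.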

I am getting the following error:

[macieira@quartz1154:/usr/workspace/macieira/new_geosx_folder/GEOScopy/GEOS]$srun -p pdebug -n 4 build-quartz-clang\@14-debug/bin/geosx -i inputFiles/multipleMeshBodies/testMultipleBodies.xml -x 2 -y 2 
srun: job 480016 queued and waiting for resources
srun: job 480016 has been allocated resources
Num ranks: 4
Max threads: 9
MKL max threads: 9
GEOSX version: 0.2.0 (feature/andrembcosta/arbitraryFractureShape, sha1: 7db3aee18)
  - c++ compiler: clang 14.0.6
  - openmp version: 201811
  - MPI version: MVAPICH2 Version      :        2.3.7
MVAPICH2 Release date : Wed March 02 22:00:00 EST 2022
MVAPICH2 Device       : ch3:psm
MVAPICH2 configure    : --prefix=/usr/tce/backend/installations/linux-rhel8-x86_64/clang-14.0.6/mvapich2-2.3.7-x3u23fmm2xvki3kxsugjhuxeblakeame --enable-shared --enable-romio --disable-silent-rules --disable-new-dtags --enable-fortran=all --enable-threads=multiple --with-ch3-rank-bits=32 --enable-wrapper-rpath=yes --disable-alloca --enable-fast=all --disable-cuda --enable-registration-cache --with-pm=hydra --with-device=ch3:psm --with-psm2=/usr --with-file-system=lustre+nfs+ufs --enable-llnl-site-specific-options --enable-debuginfo
MVAPICH2 CC           : /usr/tce/spack/lib/spack/env/clang/clang    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX          : /usr/tce/spack/lib/spack/env/clang/clang++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77          : /usr/tce/spack/lib/spack/env/clang/gfortran   -O2
MVAPICH2 FC           : /usr/tce/spack/lib/spack/env/clang/gfortran   -O2

  - HDF5 version: 1.12.1
  - Conduit version: 0.8.2
  - VTK version: 9.1.0
  - RAJA version: 2022.3.0
  - umpire version: 2022.3.0
  -  adiak version: ..
  - caliper version: 2.8.0
  - METIS version: 5.1.0
  - PARAMETIS version: 4.0.3
  - scotch version: 6.0.9
  - superlu_dist version: 6.3.0
  - suitesparse version: 5.7.9
  - Python3 version: 3.10.8
  - hypre release version: 2.28.0
Started at 2023-06-08 17:40:01
Adding Solver of type SolidMechanicsLagrangianSSLE, named SolidMechSolveBody1
Adding Solver of type SolidMechanicsLagrangianSSLE, named SolidMechSolveBody2
Adding Mesh: InternalMesh, body1
Adding Mesh: InternalMesh, body2
Adding Event: PeriodicEvent, solverApplications1
Adding Event: PeriodicEvent, solverApplications2
Adding Event: PeriodicEvent, outputs
Adding Event: PeriodicEvent, siloOutputs
Adding Event: PeriodicEvent, restarts
   TableFunction: f_b
Adding Output: Silo, MultiBodyTest_SiloOutput
Adding Output: VTK, MultiBodyTest_VTKOutput
Adding Output: Restart, restartOutput
Adding Object CellElementRegion named body1_cer from ObjectManager::Catalog.
Adding Object CellElementRegion named body2_cer from ObjectManager::Catalog.
body1: total number of nodes = 72
body1: total number of elems = 25
body2: total number of nodes = 32
body2: total number of elems = 9
***** ERROR
***** LOCATION: /usr/WS2/macieira/new_geosx_folder/GEOScopy/GEOS/src/coreComponents/LvArray/src/ArrayView.hpp:548
***** Controlling expression (should be false): index < 0 || index >= m_dims[ 0 ]
Array Bounds Check Failed: index=2 m_dims[0]=2

** StackTrace of 13 frames **
Frame 0: std::enable_if<(1)==(1), double&>::type LvArray::ArrayView<double, 1, 0, int, LvArray::ChaiBuffer>::operator[]<1>(int) const & 
Frame 1: void geos::InternalMeshGenerator::getNodePosition<LvArray::ArraySlice<double, 1, 0, int> >(int const (&) [3], int, LvArray::ArraySlice<double, 1, 0, int>&&) 
Frame 2: geos::InternalMeshGenerator::generateMesh(geos::DomainPartition&) 
Frame 3: /usr/WS2/macieira/new_geosx_folder/GEOScopy/GEOS/build-quartz-clang@14-debug/lib/libgeosx_core.so 
Frame 4: /usr/WS2/macieira/new_geosx_folder/GEOScopy/GEOS/build-quartz-clang@14-debug/lib/libgeosx_core.so 
Frame 5: /usr/WS2/macieira/new_geosx_folder/GEOScopy/GEOS/build-quartz-clang@14-debug/lib/libgeosx_core.so 
Frame 6: /usr/WS2/macieira/new_geosx_folder/GEOScopy/GEOS/build-quartz-clang@14-debug/lib/libgeosx_core.so 
Frame 7: geos::MeshManager::generateMeshes(geos::DomainPartition&) 
Frame 8: geos::ProblemManager::generateMesh() 
Frame 9: geos::ProblemManager::problemSetup() 
Frame 10: geos::GeosxState::initializeDataRepository() 
Frame 11: main 
Frame 12: __libc_start_main 
Frame 13: _start 
=====

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1

It looks like the partitioning of the second meshBody isn't working properly: rank 0 gets all the data and the other ranks get nothing.

EDIT: Apparently, this only happens if the mesh coordinates go into the negative range.
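
As a generic illustration of why negative coordinates can upset a partitioning step (this sketch is not taken from the GEOS source and is not confirmed as the cause here): mapping a coordinate to a partition index by truncating toward zero differs from flooring as soon as the coordinate is negative. The function names and the uniform two-partition layout on [-1, 1] are hypothetical.

```python
import math

def partition_index_trunc(coord, origin, width):
    # int() truncates toward zero, so values in (-1, 0) map to 0, not -1.
    return int((coord - origin) / width)

def partition_index_floor(coord, origin, width):
    # math.floor() rounds toward negative infinity.
    return math.floor((coord - origin) / width)

# Hypothetical 1D domain [-1, 1] meant to be split into two partitions of width 1.
coords = [-0.75, -0.25, 0.25, 0.75]

# Measuring from 0 and truncating puts every point in partition 0.
trunc_from_zero = [partition_index_trunc(c, 0.0, 1.0) for c in coords]

# Flooring from 0 separates the points but yields a negative index.
floor_from_zero = [partition_index_floor(c, 0.0, 1.0) for c in coords]

# Measuring from the domain minimum gives the intended indices 0 and 1.
floor_from_min = [partition_index_floor(c, -1.0, 1.0) for c in coords]

print(trunc_from_zero)  # [0, 0, 0, 0]
print(floor_from_zero)  # [-1, -1, 0, 0]
print(floor_from_min)   # [0, 0, 1, 1]
```

In the first case a single partition would own all the points, which is the kind of symptom worth checking for in the actual partitioning code.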

andrembcosta, Jun 09 '23 00:06

See also https://github.com/GEOS-DEV/GEOS/issues/2042

paveltomin, Sep 17 '25 19:09