Sporadic crashes during initialization (2)
Describe the bug
(Probably related to #461, although it is a bit different.)
I'm trying to run a setup with SeisSol_Release_sskx_5_elastic and a large mesh (90e6 cells).
It is a rough fault simulation with 50 m resolution on the fault and a mesh resolving 3 Hz up to 30 km from the fault.
I had several runs that crashed right after Info: Computing LTS weights. Done. (301421 reductions.).
I ran on 150 nodes, 2 tasks per node, on SuperMUC-NG.
(With a different mesh, it also crashes on 100 and 110 nodes. My impression is that the higher the node count, the more likely a crash.)
In fact, I submitted the same setup twice on the same number of nodes: one run crashed (1991408.RFV.out), the other went through (1992221.RFV.out).
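To triage several job outputs at once, something like the following could help. It is only a minimal sketch: the marker string is taken from the log line quoted above, while the function names, the category labels, and the heuristic "any later `Info:` line means initialization continued" are my own assumptions, not part of SeisSol.

```python
from pathlib import Path

# Marker copied from the log line quoted in this report.
LTS_MARKER = "Computing LTS weights. Done."


def classify_log(text: str) -> str:
    """Classify a job output by how far it got relative to the LTS weights step.

    Returns one of the hypothetical labels:
      'before_lts'          - the marker never appears
      'stopped_after_lts'   - marker appears, no later 'Info:' line
      'continued_past_lts'  - marker appears and at least one 'Info:' line follows
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    marker_positions = [i for i, ln in enumerate(lines) if LTS_MARKER in ln]
    if not marker_positions:
        return "before_lts"
    # Assumption: a run that survives initialization prints further 'Info:' lines.
    tail = lines[marker_positions[-1] + 1:]
    if any("Info:" in ln for ln in tail):
        return "continued_past_lts"
    return "stopped_after_lts"


def classify_files(paths):
    """Apply classify_log to each file path and return {path: label}."""
    return {
        str(p): classify_log(Path(p).read_text(errors="replace"))
        for p in paths
    }
```

For example, `classify_files(["1991408.RFV.out", "1992221.RFV.out"])` would label the two runs mentioned above, assuming both files are in the current directory.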
Expected behavior
The run should never crash.
To Reproduce
Steps to reproduce the behavior:
- origin/thomas/plastic_moment (9179d509d807184711d3893939aa539f885c7ddf), which is 3 commits on top of master (ce8c70e59415b829c9816a558ed5006f2ad59a09). These 3 commits have nothing to do with initialization.
- Intel compiler
- SuperMUC-NG
Currently Loaded Modulefiles:
1) admin/1.0 3) lrz/1.0 5) intel/19.0.5 7) intel-mpi/2019-intel 9) cmake/3.16.5 11) libszip/2.1.1 13) metis/5.1.0-intel19-i64-r64 15) numactl/2.0.12-intel19
2) tempdir/1.0 4) spack/21.1.1 6) intel-mkl/2019 8) gcc/9.3.0 10) python/3.8.8-extended 12) parmetis/4.0.3-intel19-impi-i64-r64 14) netcdf-hdf5-all/4.7_hdf5-1.10-intel19-impi 16) yaml-cpp/0.6.3-intel19
- Parameter/material files: /hppfs/work/pr83no/di73yeq4/bug_DRV_param4_issue_507
Note that this problem only occurs with the Intel compilers. (I ran the setup for 2 min on 150 nodes 5 times with the GCC compiler without a problem.)