Dyninst trap issue redux

Open skyreflectedinmirrors opened this issue 3 years ago • 0 comments

This is the same issue we saw previously in LAMMPS, where Dyninst isn't catching traps correctly, but now in PIConGPU. To reproduce, follow the instructions in #145 but use a binary rewrite, then run:

./picongpu --mpiDirect -d 1 1 1 -g 240 272 224 --periodic 1 1 1 -s 100 -r 2
...
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Field solver condition: c * dt <= 1.00502 ? (c * dt = 1)
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? (omega_p * dt = 0.00104301)
PIConGPUVerbose PHYSICS(1) | macro particles per device: 365568000
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 1.6384
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 6.53658e-17
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 1.95962e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 1.49248e-30
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 2.62501e-19
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 2.60765e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 86981.7
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 1.34138e-13
PIConGPUVerbose PHYSICS(1) | Resolving Debye length for species "e"?
PIConGPUVerbose PHYSICS(1) | Estimate used momentum variance in 57120 supercells with at least 10 macroparticles each
PIConGPUVerbose PHYSICS(1) | 57120 (100 %) supercells had local Debye length estimate not resolved by a single cell
PIConGPUVerbose PHYSICS(1) | Estimated weighted average temperature 0.00049991 keV and corresponding Debye length 1.31401e-08 m.
   The grid has 0.0821258 cells per average Debye length
Trace/breakpoint trap (core dumped)

Using the workaround of:

export OMNITRACE_IGNORE_DYNINST_TRAMPOLINE=1

fails with:

### ERROR ###  [ rank : 0 ] Error code : 11 @ 0 :  Signal:    SIGSEGV (signal number:  11)                   segmentation violation. Unknown segmentation fault error: 128.
[PID=144196][TID=0][0/5]> omnitrace_pop_region +0x59b3
[PID=144196][TID=0][1/5]> omnitrace_pop_region +0x5ee8
[PID=144196][TID=0][2/5]> __restore_rt
[PID=144196][TID=0][3/5]> _ZN5pmacc11TaskReceiveINS_4math6VectorIfLi3ENS1_16StandardAccessorENS1_17StandardNavigatorENS1_6detail17Vector_componentsIfLi3EEEEELj3EE13executeInternEv +0x23c
[PID=144196][TID=0][4/5]> pmacc::Manager::execute_dyninst +0x186

The current workaround is to simply exclude TaskReceive from instrumentation.
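For anyone applying the same exclusion: the faulting frame in the backtrace is printed as a mangled symbol, so it is worth checking that an exclusion pattern matches that frame and nothing else. Below is a minimal Python sketch, assuming the exclusion is given as a regular expression on the function name. The pattern and the `excluded` helper are hypothetical illustrations and not part of omnitrace. How the pattern is passed to the instrumenter is not covered here; check `omnitrace --help` for the function-exclude option.

```python
import re

# Frames copied from the backtrace above, with the offsets dropped.
FRAMES = [
    "omnitrace_pop_region",
    "__restore_rt",
    "_ZN5pmacc11TaskReceiveINS_4math6VectorIfLi3ENS1_16StandardAccessorE"
    "NS1_17StandardNavigatorENS1_6detail17Vector_componentsIfLi3EEEEELj3EE"
    "13executeInternEv",
    "pmacc::Manager::execute_dyninst",
]

# Hypothetical exclusion pattern. Note the spelling: the symbol is
# "TaskReceive", so a pattern spelled "TaskRecieve" would match nothing.
EXCLUDE = re.compile(r"TaskReceive")


def excluded(frames, pattern=EXCLUDE):
    """Return the frames whose names the exclusion pattern matches."""
    return [name for name in frames if pattern.search(name)]


if __name__ == "__main__":
    for name in excluded(FRAMES):
        print("would be excluded:", name)
```

Because the substring `TaskReceive` appears intact in the mangled name, the same pattern should work whether the tool matches mangled or demangled names, but that is an assumption worth verifying against the instrumentation log.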

skyreflectedinmirrors, Aug 31 '22 17:08