lbann
(Slightly less low priority) MPI Catch test output
The Catch2 MPI tests produce a lot of output, too much to open an individual issue for each source. It should be cleaned up so that a passing run prints nothing beyond the Catch2 summary.
Actual:
> mpirun -n 2 ./unit_test/mpi-catch-tests
...
# MORE THAN 600 LINES OF OUTPUT
...
===============================================================================
All tests passed (19117 assertions in 199 test cases)
===============================================================================
All tests passed (19117 assertions in 199 test cases)
Expected:
> mpirun -n 2 ./unit_test/mpi-catch-tests
===============================================================================
All tests passed (19117 assertions in 199 test cases)
===============================================================================
All tests passed (19117 assertions in 199 test cases)
What is the flavor of the error messages, and are they all unique issues?
They're often not error messages; they're things like the LBANN startup flavor text. For example:
Num. I/O Threads: 1 (Limited to # Unused Compute Cores or 1) at offset 4
error in hwloc_set_cpubind, error=Unknown error: -1
error in hwloc_set_cpubind, error=Unknown error: -1
Testing using 0 samples.
Hardware properties (for master process)
Processes on node : 2
Total number of processes : 2
OpenMP threads per process : 4
I/O threads per process (+offset) : 1 (+4)
Background I/O enabled : 1
Running: LLNL LBANN version: 0.103.0 (v0.102-145-g5657072-dirty)
LLNL Hydrogen version: 1.5.2 (v1.5.1-8-gc6c220d-dirty)
Build settings
Type : Release
Aluminum : NOT detected
GPU : NOT detected
cuDNN : NOT detected
CUB : NOT detected
MV2_USE_CUDA :
Trainer settings
Trainers : 1
Processes per trainer : 2
Grid dimensions : 1 x 2
...
It looks like this happens when constructing the trainer. This issue could be rebranded as "Stop producing output side-effects when doing core functional things." Untangling the lbann_library.cpp APIs is a bit of a headache...
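Until those side-effects are removed at the source, one possible stopgap on the test side is to capture the stream while the noisy call runs. The following is only a rough sketch under assumptions: `scoped_stream_capture` and `noisy_setup` are hypothetical names, not existing LBANN or Catch2 APIs, and `noisy_setup` merely stands in for whatever call (e.g. trainer construction) prints the banner. It only affects output written through the given `std::ostream`; anything written via C stdio or directly to a file descriptor would not be captured.

```cpp
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of LBANN): while an instance is alive, the
// given stream writes into a private buffer instead of its usual target.
// The original stream buffer is restored when the guard goes out of scope.
class scoped_stream_capture {
public:
  explicit scoped_stream_capture(std::ostream& os)
    : m_os(os), m_old(os.rdbuf(m_buf.rdbuf())) {}
  ~scoped_stream_capture() { m_os.rdbuf(m_old); }

  scoped_stream_capture(const scoped_stream_capture&) = delete;
  scoped_stream_capture& operator=(const scoped_stream_capture&) = delete;

  // Text written to the stream since the guard was created.
  std::string str() const { return m_buf.str(); }

private:
  std::ostringstream m_buf;  // declared first so it exists before the swap
  std::ostream& m_os;
  std::streambuf* m_old;
};

// Hypothetical stand-in for a call that prints as a side-effect.
int noisy_setup() {
  std::cout << "Trainer settings\n"
            << "  Trainers : 1\n";
  return 0;
}
```

In a test, wrapping the call in a block such as `{ scoped_stream_capture quiet(std::cout); noisy_setup(); }` would keep the banner out of the test log while still letting the test inspect the captured text if it wants to. This hides the symptom rather than fixing it, so it is no substitute for separating reporting from the core functions.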