(Slightly less low priority) MPI Catch test output

Open benson31 opened this issue 4 years ago • 2 comments

The Catch2 MPI tests produce a lot of output, too much to open an individual issue for each message. It should be cleaned up so that a passing run prints nothing beyond the Catch2 summary.

Actual:

> mpirun -n 2 ./unit_test/mpi-catch-tests
...
# MORE THAN 600 LINES OF OUTPUT
...
===============================================================================
All tests passed (19117 assertions in 199 test cases)

===============================================================================
All tests passed (19117 assertions in 199 test cases)

Expected:

> mpirun -n 2 ./unit_test/mpi-catch-tests
===============================================================================
All tests passed (19117 assertions in 199 test cases)

===============================================================================
All tests passed (19117 assertions in 199 test cases)

benson31 avatar Feb 08 '22 20:02 benson31

What is the flavor of the error messages, and are they all unique issues?

timmoon10 avatar Feb 08 '22 21:02 timmoon10

Often they're not error messages at all; they're things like the LBANN startup flavor text. For example:

        Num. I/O Threads: 1 (Limited to # Unused Compute Cores or 1) at offset 4
error in hwloc_set_cpubind, error=Unknown error: -1
error in hwloc_set_cpubind, error=Unknown error: -1
Testing using 0 samples.
Hardware properties (for master process)
  Processes on node          : 2
  Total number of processes  : 2
  OpenMP threads per process : 4
  I/O threads per process (+offset) : 1 (+4)
  Background I/O enabled     : 1

Running: LLNL LBANN version: 0.103.0 (v0.102-145-g5657072-dirty)
         LLNL Hydrogen version: 1.5.2 (v1.5.1-8-gc6c220d-dirty)

Build settings
  Type     : Release
  Aluminum : NOT detected
  GPU     : NOT detected
  cuDNN    : NOT detected
  CUB      : NOT detected
  MV2_USE_CUDA :

Trainer settings
  Trainers              : 1
  Processes per trainer : 2
  Grid dimensions       : 1 x 2
  ...

It looks like this happens when constructing the trainer. This issue could be rebranded as "Stop producing output side-effects when doing core functional things." Untangling the lbann_library.cpp APIs is a bit of a headache...

benson31 avatar Feb 08 '22 22:02 benson31
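
Until the printing is separated from trainer construction, one stopgap a test could use is to capture stream output around the noisy call. The following is a minimal sketch under stated assumptions, not LBANN code: `ScopedOutputCapture` and `noisy_setup` are hypothetical names invented for illustration, and the approach only intercepts text written through C++ streams such as `std::cout`. Anything printed via C stdio or written directly to the file descriptor (which may be the case for the `hwloc_set_cpubind` lines) would not be caught.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical RAII guard: while it is alive, everything written to `stream`
// lands in an internal buffer instead of reaching the terminal. The original
// stream buffer is restored when the guard goes out of scope.
class ScopedOutputCapture {
public:
  explicit ScopedOutputCapture(std::ostream& stream)
    : stream_(stream), saved_(stream.rdbuf(buffer_.rdbuf())) {}

  ~ScopedOutputCapture() { stream_.rdbuf(saved_); }

  ScopedOutputCapture(const ScopedOutputCapture&) = delete;
  ScopedOutputCapture& operator=(const ScopedOutputCapture&) = delete;

  // Text captured so far; useful if a test wants to inspect it.
  std::string captured() const { return buffer_.str(); }

private:
  std::ostringstream buffer_;  // declared first so it exists before the swap
  std::ostream& stream_;
  std::streambuf* saved_;
};

// Hypothetical stand-in for a setup routine that prints a banner as a side
// effect of doing its real work.
int noisy_setup() {
  std::cout << "Hardware properties (for master process)\n";
  return 2;
}
```

In a test, the guard would be constructed just before the noisy call, for example `ScopedOutputCapture guard(std::cout);` followed by `noisy_setup();` inside the same scope. This hides the symptom only; the underlying fix described above, not printing during core functional calls, is still the cleaner outcome.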