Tom Benson
Just to clarify: Are you suggesting that the Python code or the C++ code is problematic here (or both)? To me, it seems that, if an error, it exists in...
I could also see LBANN just being ok with zero-data processes and having them idle. Wasteful, probably, but the linear algebra _should_ be robust to this.
I somewhat disagree that it's not representative. Presumably running out-of-the-box resnet-50 (default mb size of 256) on >64 nodes of Lassen will present this same error, so a naïve strong...
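To make the arithmetic behind this concrete, here is a rough sketch of how a mini-batch of 256 spreads across ranks. It assumes one rank per GPU, four GPUs per Lassen node, and an as-even-as-possible split. `samples_per_rank` is a hypothetical helper for illustration only, not LBANN's actual partitioning code.

```python
def samples_per_rank(mini_batch_size, num_ranks):
    """Split a mini-batch as evenly as possible across ranks.

    Hypothetical helper: the first `extra` ranks get one additional sample.
    """
    base, extra = divmod(mini_batch_size, num_ranks)
    return [base + (1 if rank < extra else 0) for rank in range(num_ranks)]


# Assumption: one rank per GPU and 4 GPUs per node.
GPUS_PER_NODE = 4

for nodes in (32, 64, 65, 128):
    counts = samples_per_rank(256, nodes * GPUS_PER_NODE)
    idle = counts.count(0)  # ranks that receive no samples at all
    print(f"{nodes:4d} nodes: {len(counts)} ranks, {idle} with no data")
```

Under these assumptions, 64 nodes is the last point at which every rank gets a sample. At 65 nodes, 4 of the 260 ranks would have nothing to process.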
You have two options: either you can rebuild protobuf in `Debug` or `Release` mode (it's probably in `RelWithDebInfo` mode) *or* you can pass `-DLBANN_USE_PROTOBUF_MODULE=ON` to LBANN's CMake.
It *is* set by the module as of CMake 3.10 -- I should tick up the minimum version. Anyway, try a newer CMake. Also, try a clean build directory.
Never mind, I see you're using CMake 3.13 -- good! I'm wondering if you had some old cached state from a previous run before you tried `-DLBANN_USE_PROTOBUF_MODULE=ON`. If that's the case,...
Hmmm. This might be ugly, but it would be good if we could get some more information out of the build. Can you please try the following: 1. Add `-DCMAKE_CXX_FLAGS=-v`...
This is strange. The salient line from the CMake output is:
```
CMake Warning at /opt/cmake/share/cmake-3.13/Modules/FindProtobuf.cmake:495 (message):
  Protobuf compiler version 3.0.0 doesn't match library version 3.7.0
```
From `make_err.txt`, it seems...
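One quick way to check for a stray compiler is to see which `protoc` is first on `PATH` and what version it reports. The sketch below is only a diagnostic aid. It assumes `protoc --version` prints something like `libprotoc 3.0.0`, and the helper names are made up for illustration. CMake may well have found a different `protoc` than the one on `PATH`, so treat the result as a hint rather than a diagnosis.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'libprotoc 3.0.0'.

    A missing patch component is treated as 0.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def protoc_on_path():
    """Return (path, version) for the first protoc on PATH, or None if absent."""
    path = shutil.which("protoc")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_version(result.stdout)


if __name__ == "__main__":
    # Compare this against the library version reported in the CMake warning.
    print(protoc_on_path())
```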
I'm not sure what the underlying problem is, but it definitely seems protobuf-centric. My suggestion would be to try a simple [superbuild](https://lbann.readthedocs.io/en/latest/build_with_superbuild.html) to build Protobuf and LBANN. Basically, ```bash cmake...
Also, please update the release notes. Thanks!