linjiaqin
@tgaddair Hello, I have reinstalled Horovod with the command you mentioned, but the problem still exists. Some nodes work, but others don't. There are 10 nodes in my...
@tgaddair @marcfielding1 After running with "--verbose", I found this error message. What should I do next? tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /software/hadoop3/anaconda3/envs/horovod/lib/python3.6/site-packages/horovod/tensorflow/mpi_lib.cpython-36m-x86_64-linux-gnu.so)
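One way to narrow down an error like the one above is to check which CXXABI version tags a given libstdc++.so.6 actually contains, on a node that works and on one that fails. The following is only a rough sketch: the helper names `cxxabi_versions` and `provides` are made up for illustration, and scanning the raw bytes for version strings is a heuristic (similar in spirit to `strings libstdc++.so.6 | grep CXXABI`), not a real ELF symbol-version lookup.

```python
import re
from pathlib import Path


def cxxabi_versions(blob):
    """Return the unique CXXABI_x.y[.z] tags found in raw library bytes.

    Hypothetical helper: this is a plain byte scan, not an ELF parser.
    """
    found = {m.decode() for m in re.findall(rb"CXXABI_(\d+(?:\.\d+)*)", blob)}
    # Sort numerically so that 1.3.10 comes after 1.3.8.
    ordered = sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))
    return ["CXXABI_" + v for v in ordered]


def provides(path, wanted="CXXABI_1.3.8"):
    """Report whether the library file at `path` mentions the wanted tag."""
    return wanted in cxxabi_versions(Path(path).read_bytes())
```

For example, `provides("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")` would be expected to return False on the failing nodes, given the error message. Running the same check against any other libstdc++.so.6 on the machine (its location is an assumption and depends on the setup) shows whether a newer copy is available to load instead.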
@marcfielding1 I have tried your suggestion above, but the same errors occurred. I use gcc-5.5 and openmpi4.0.1. Is there any problem with these versions?
@marcfielding1 My OS is Ubuntu 14.04. When I delete the conda env, create a new one, and install with "HOROVOD_WITH_MPI=1 HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod", I get "horovod didn't build...
@marcfielding1 Yes, I installed TensorFlow before installing Horovod. But when I uninstalled TensorFlow and Horovod and then installed Horovod again, it failed with "no tensorflow module" errors as...