matex
MPI AllReduce error
I got the following error:
2018-07-16 15:27:27.536541: W tensorflow/core/framework/op_kernel.cc:1192] Unknown: Exception: Message truncated, error stack:
MPI_Allreduce(855)..................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x2049aaa00, count=256, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(712)............:
MPIR_Allreduce_intra(357)...........:
MPIC_Sendrecv(186)..................:
MPIDI_CH3U_Request_unpack_uebuf(599): Message truncated; 1536 bytes received but buffer size is 1024
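For what it's worth, here is a quick check of the sizes in that message. It is only a sketch of the arithmetic, assuming MPI_FLOAT is a 4-byte float; the helper name `floats_in` is made up for illustration and is not part of matex or MPI.

```python
# Assumption: MPI_FLOAT is a 4-byte single-precision float.
BYTES_PER_MPI_FLOAT = 4


def floats_in(num_bytes, elem_size=BYTES_PER_MPI_FLOAT):
    """Return how many elements of elem_size fit exactly in num_bytes."""
    if num_bytes % elem_size != 0:
        raise ValueError("byte count is not a whole number of elements")
    return num_bytes // elem_size


local_count = 256       # count= in the failing MPI_Allreduce call
buffer_bytes = 1024     # "buffer size is 1024"
received_bytes = 1536   # "1536 bytes received"

# The receive buffer matches the count this rank passed in.
assert local_count * BYTES_PER_MPI_FLOAT == buffer_bytes

print("elements expected by this rank:", floats_in(buffer_bytes))
print("elements in the incoming message:", floats_in(received_bytes))
```

So this rank expected 256 floats while the incoming message carried 384. That would be consistent with ranks calling the allreduce with different element counts, or in a different order, but I have not confirmed that this is the cause.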
Any comments?