yangwenhuan
yangwenhuan
> I follow the [step-by-step-tutorial](https://github.com/bytedance/byteps/blob/master/docs/step-by-step-tutorial.md) to run distributed training with mxnet and tensorflow, both hang. > I have 3 nodes > and on first node I run scheduler and server...
@ymjiang , not yet, I try it now.
@ymjiang, it seems communication is successful. scheduler: BytePS launching scheduler [10:57:17] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024 [10:57:17] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4 [10:57:17] src/van.cc:357[: Bind to role=scheduler, id=1, ip=127.0.0.1, port=12345,...
> @tingweiwu Can you try to run the basic pslite benchmark? Here are the steps you need to follow: > > Install ps-lite > > ``` > git clone --single-branch...