Yimin Jiang
This is weird. Can you share the command to reproduce it? Which BytePS version are you using?
You don't need to. MXNet-BytePS's implementation bypasses kvstore.
There will be a performance difference even with the same setup you described. We have made many performance optimizations in BytePS. For example, compared to native MXNet, BytePS-mxnet eliminates some...
@ZHAIXINGZHAIYUE I believe you won't have that problem if you configure BytePS correctly. We have never encountered this when using BytePS.
Your title is "Segmentation fault using multiple nodes with multi-gpu". Are you sure about this? From the log, it seems to be related only to NCCL, and should also happen using a...
What is your version number of ps-lite?
The Docker image may not have the latest BytePS code. Could you install it with `pip3 install byteps==v0.2.5`? And could you check the result of `ulimit -l`? Registering memory region may...
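If it is easier to check the `ulimit -l` value from inside the training process, here is a minimal sketch, assuming Linux and Python 3. It reads the locked-memory limit (`RLIMIT_MEMLOCK`), which is the limit `ulimit -l` reports. The helper names `memlock_limit` and `describe` are made up for this example and are not part of BytePS.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) locked-memory limits in bytes.

    None stands for "unlimited". This is the limit that `ulimit -l`
    reports (the shell prints it in KiB, getrlimit returns bytes).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_bytes(value):
        return None if value == resource.RLIM_INFINITY else value

    return to_bytes(soft), to_bytes(hard)


def describe(limit):
    """Format a limit the way `ulimit -l` would, in KiB (hypothetical helper)."""
    return "unlimited" if limit is None else f"{limit // 1024} KiB"


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print(f"max locked memory: soft={describe(soft)}, hard={describe(hard)}")
```

Running this in the same container and as the same user as the training job shows the limit that job actually sees, which can differ from the one in your login shell.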
The IPCTransport class: https://github.com/bytedance/ps-lite/blob/byteps/src/rdma_transport.h#L469
> In other words, the communication process of `summation service` and `communication service` can be equivalent to the communication of `server` and `worker`?

Your understanding is right.
Seems like there are two questions. The first one is about API changes after the gradient compression PR. @vycezhong Can you please take a look? The second question is about...