Yibo Zhu

134 comments by Yibo Zhu

We may implement it in a few months.

Below are some numbers. The following experiments were performed on a public cloud with 20 Gbps networks. Each machine has 8 NVLink-enabled Tesla V100 16GB GPUs. The batch size...

Yes. You can try them yourself. The original ps-lite implementation is pretty poor -- it is slower than Horovod, let alone BytePS.

There is a launcher you can try. See the README in this folder: https://github.com/bytedance/byteps/tree/master/launcher

Hello @nowei, would you confirm that you can run EVAL_TYPE=benchmark with multiple GPUs? If so, we can narrow the problem down to `train_mnist_byteps.py`.

Would you set NCCL_DEBUG=INFO and run again? You may also set BYTEPS_LOG_LEVEL=INFO or even BYTEPS_LOG_LEVEL=TRACE. Then paste us the logs (it may be very long if you set BYTEPS_LOG_LEVEL=TRACE). Thanks.
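A minimal sketch of one way to collect those logs, assuming the job is started by a launch command you already have. Only the environment variable names and values come from the comment above; the helper names `build_debug_env` and `run_with_logs` and the log file path are hypothetical.

```python
import os
import subprocess


def build_debug_env(byteps_level="TRACE", base_env=None):
    """Return a copy of the environment with verbose NCCL and BytePS logging."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"
    # INFO or TRACE, as suggested in the comment above.
    env["BYTEPS_LOG_LEVEL"] = byteps_level
    return env


def run_with_logs(cmd, log_path, byteps_level="TRACE"):
    """Run cmd (a list of arguments), sending stdout and stderr to log_path.

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            cmd,
            env=build_debug_env(byteps_level),
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

Calling `run_with_logs(your_launch_command, "byteps_trace.log")` with your own argument list keeps the (possibly very long) TRACE output in one file that can be attached to the issue.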

@nowei Thank you. You are right. INFO does not give anything new. The useful level is DEBUG. However, TRACE includes everything that DEBUG outputs, so what you have is...

@nowei If you repeat the run multiple times with TRACE logs, does it always die on the key `1048576`? From the logs you pasted, you can see that the last few lines...
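A rough sketch of how the repeated runs could be compared, assuming each run's output was saved to its own file. It makes no assumption about the BytePS log format: it only checks whether the key appears as a substring in the last few lines of each log. The function names and the default window of 20 lines are hypothetical.

```python
from collections import deque


def key_in_tail(lines, key="1048576", n=20):
    """True if `key` occurs as a substring in the last n items of `lines`."""
    tail = deque(lines, maxlen=n)
    return any(key in line for line in tail)


def runs_dying_on_key(log_paths, key="1048576", n=20):
    """Map each log path to whether `key` shows up near the end of that run."""
    results = {}
    for path in log_paths:
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in logs.
        with open(path, "r", errors="replace") as f:
            results[path] = key_in_tail(f, key, n)
    return results
```

If every entry in the returned dictionary is `True`, the runs all end near the same key. Because this is a plain substring match, a longer number that contains `1048576` would also match, so it is worth eyeballing the matching lines.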

Thanks. This is very helpful. So, it's a deterministic bug. There has to be something special about this tensor: `byteps.Parameter.dampening.0_0`.

@nowei Would you do me one more favor? Comment out this line and try again: https://github.com/bytedance/byteps/blob/master/example/pytorch/train_mnist_byteps.py#L109