KungFu
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
**The asynchronous collective communication layer also avoids having an expensive central coordinator, as used for invoking synchronous collective communication operations in existing systems, such as Horovod.** I see the paper of...
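To make the contrast concrete, here is a minimal sketch of how a user picks between the synchronous (coordinator-driven all-reduce, as in Horovod) and asynchronous (decentralized peer-to-peer averaging) modes. The optimizer names follow KungFu's Python API (`kungfu.tensorflow.optimizers`); treat the exact module paths and signatures as assumptions if your version differs.

```python
# Sketch: synchronous all-reduce training vs. KungFu's asynchronous
# model averaging, which needs no central coordinator or global barrier.
import tensorflow as tf
from kungfu.tensorflow.optimizers import (
    SynchronousSGDOptimizer,  # synchronous all-reduce of gradients each step
    PairAveragingOptimizer,   # asynchronous pairwise model averaging
)

base = tf.compat.v1.train.AdamOptimizer(0.01)

# Synchronous: every step runs a collective all-reduce across all workers,
# which requires all workers to reach the same step together.
sync_opt = SynchronousSGDOptimizer(base)

# Asynchronous: each worker exchanges model state with one peer per step,
# so no central coordinator has to invoke a global collective operation.
async_opt = PairAveragingOptimizer(base)
```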
Hello! I am looking for a distributed training framework. Did you try it on Windows? I am facing a linking issue with TensorFlow (possibly a MinGW/MSVC linking conflict).
The paper based on this project mentions: "We implement an AP that adapts the batch size based on GNS when training the ResNet-56 model with the CIFAR-10...
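For readers unfamiliar with the idea: the gradient noise scale (GNS) of McCandlish et al. estimates the batch size beyond which larger batches give diminishing returns, so an adaptation policy (AP) can grow the batch size toward the measured GNS during training. The estimator and policy class below are hypothetical stand-ins written for illustration, not KungFu's actual API.

```python
# Hypothetical sketch of a GNS-driven batch-size adaptation policy.
# estimate_gns() follows the two-batch-size estimator from McCandlish et al.,
# "An Empirical Model of Large-Batch Training"; GNSBatchSizePolicy is illustrative.

def estimate_gns(grad_sq_small, grad_sq_big, b_small, b_big):
    """Estimate the gradient noise scale from squared gradient norms
    measured at two batch sizes b_small < b_big."""
    # Unbiased estimates of the true gradient's squared norm and the noise.
    g2 = (b_big * grad_sq_big - b_small * grad_sq_small) / (b_big - b_small)
    s = (grad_sq_small - grad_sq_big) / (1.0 / b_small - 1.0 / b_big)
    return s / max(g2, 1e-12)

class GNSBatchSizePolicy:
    """Grow the global batch size toward the measured GNS, within bounds."""

    def __init__(self, init_bs=128, max_bs=4096):
        self.batch_size = init_bs
        self.max_bs = max_bs

    def after_epoch(self, gns):
        # Below the GNS, larger batches still scale near-linearly;
        # above it, returns diminish, so cap the batch size there.
        self.batch_size = int(min(max(self.batch_size, gns), self.max_bs))
        return self.batch_size
```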