
More clarification is necessary for your comment

Open statcom opened this issue 7 years ago • 0 comments

First of all, thanks for sharing your code. While reading it, though, I found that the following comment in many-GPUs-MNIST.py could easily be misunderstood.

'''
Multi GPUs Usage
Results on P40
 * Single GPU computation time: 0:00:22.252533
 * 2 GPU computation time: 0:00:12.632623
 * 4 GPU computation time: 0:00:11.083071
 * 8 GPU computation time: 0:00:11.990167
 
Need to change batch size and learning rates
     for training more efficiently
'''

I don't think a single GPU is much slower than multiple GPUs for each iteration, as long as it can handle the given batch size. What the paper suggested was that multiple GPUs can handle a global batch of N_GPU x batch size (e.g., 20000 for 2 GPUs), which makes the loss converge faster and can therefore lead to faster training.
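If the recipe the paper has in mind is the usual linear-scaling rule (this is my assumption; the code comment only says "Need to change batch size and learning rates"), the adjustment would look roughly like this sketch:

```python
# Hypothetical linear-scaling rule (assumed, not this repo's code):
# when the global batch grows by a factor of num_gpus, grow the base
# learning rate by the same factor.
def scaled_hyperparams(base_lr, per_gpu_batch, num_gpus):
    """Return (global_batch, learning_rate) for a multi-GPU run."""
    global_batch = per_gpu_batch * num_gpus
    learning_rate = base_lr * num_gpus
    return global_batch, learning_rate

# e.g. 2 GPUs with a per-GPU batch of 10000 -> global batch 20000
print(scaled_hyperparams(0.01, 10000, 2))
```

The names and the base values here are placeholders; the point is only that batch size and learning rate should be scaled together when adding GPUs.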

The code runs 10 training iterations, measures the elapsed time, and then compares those times across different numbers of GPUs. I am afraid this does not demonstrate anything the paper claimed.

As for the measured times, I suspect the reason the single GPU took almost twice as long as two GPUs is something else. In practice, there is a negligible difference between 1 GPU and multiple GPUs in the computation time of each epoch. For example, your code runs in 14.52 seconds on a single 1080 and 14.48 seconds on two 1080s.
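To show the effect the paper actually claims, one option (a hypothetical sketch, not this repository's benchmark) is to hold the per-GPU batch fixed, scale the global batch with the number of GPUs, and compare throughput rather than wall time. The wall times below reuse the 1080 timings quoted above purely for illustration; the per-GPU batch size is an assumption:

```python
# Hypothetical benchmark arithmetic: with per-GPU batch held constant,
# similar wall times across GPU counts mean roughly linear throughput
# scaling, which is the claim worth measuring.
def throughput(global_batch, iterations, wall_time_sec):
    """Training examples processed per second."""
    return global_batch * iterations / wall_time_sec

per_gpu_batch = 10000  # assumed per-GPU batch, not from the repo
single = throughput(1 * per_gpu_batch, 10, 14.52)  # 1 GPU
dual = throughput(2 * per_gpu_batch, 10, 14.48)    # 2 GPUs, similar time
print(dual / single)  # ~2x throughput despite nearly equal wall time
```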

statcom avatar Jan 03 '19 00:01 statcom