
Using a placeholder for is_training instead of two separate graphs for training and testing.

suryatejadev opened this issue · 0 comments

Hi! Thanks a lot for the repo! The code is really well written and helped me a lot in understanding TensorFlow multi-GPU code.

I was wondering why you chose to build two separate graphs for training and testing, instead of just using a placeholder for is_training.

Essentially, the following code snippet from https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/6_MultiGPU/multigpu_cnn.py (lines 128–148):

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])

# Loop over all GPUs and construct their own computation graph
for i in range(num_gpus):
    with tf.device(assign_to_device('/gpu:{}'.format(i), ps_device='/cpu:0')):

        # Split data between GPUs
        _x = X[i * batch_size: (i+1) * batch_size]
        _y = Y[i * batch_size: (i+1) * batch_size]

        # Because Dropout has different behavior at training and prediction time, we
        # need to create 2 distinct computation graphs that share the same weights.

        # Create a graph for training
        logits_train = conv_net(_x, num_classes, dropout,
                                    reuse=reuse_vars, is_training=True)
        # Create another graph for testing that reuses the same weights
        logits_test = conv_net(_x, num_classes, dropout,
                                   reuse=True, is_training=False)

can be changed to

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
is_training = tf.placeholder(tf.bool)

# Loop over all GPUs and construct their own computation graph
for i in range(num_gpus):
    with tf.device(assign_to_device('/gpu:{}'.format(i), ps_device='/cpu:0')):

        # Split data between GPUs
        _x = X[i * batch_size: (i+1) * batch_size]
        _y = Y[i * batch_size: (i+1) * batch_size]

        # Create a graph for training and testing
        logits = conv_net(_x, num_classes, dropout,
                            reuse=reuse_vars, is_training=is_training)

During training and testing, you can feed True and False to is_training respectively, so you only need to build half the number of graphs. Dropout already accepts the is_training tensor as an argument, so it can be used directly to control whether dropout is applied (see the sketch below). Is there a particular reason you didn't implement this approach?
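For concreteness, here is a rough sketch (not from the repo) of how the rest of the script would change under this approach, assuming the same conv_net, train_op, accuracy op, and mnist dataset object as in the original example:

# Inside conv_net, dropout can consume the placeholder directly;
# `training` may be a Python bool or a boolean tensor.
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)

# Training step: enable dropout
sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, is_training: True})

# Evaluation: disable dropout on the same graph
acc = sess.run(accuracy, feed_dict={X: mnist.test.images,
                                    Y: mnist.test.labels,
                                    is_training: False})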

Thanks!

suryatejadev · Aug 31, 2019