Using a placeholder for is_training instead of two separate graphs for training and testing.
Hi! Thanks a lot for the repo! The code is really well written and helped me a lot in understanding TensorFlow multi-GPU code.
I was wondering why you chose to build two separate graphs for train and testing, instead of just using a placeholder for is_training.
So essentially, the following code snippet in https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/6_MultiGPU/multigpu_cnn.py (lines 128 - 148):
```python
# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])

# Loop over all GPUs and construct their own computation graph
for i in range(num_gpus):
    with tf.device(assign_to_device('/gpu:{}'.format(i), ps_device='/cpu:0')):
        # Split data between GPUs
        _x = X[i * batch_size: (i+1) * batch_size]
        _y = Y[i * batch_size: (i+1) * batch_size]

        # Because Dropout have different behavior at training and prediction time, we
        # need to create 2 distinct computation graphs that share the same weights.

        # Create a graph for training
        logits_train = conv_net(_x, num_classes, dropout,
                                reuse=reuse_vars, is_training=True)
        # Create another graph for testing that reuse the same weights
        logits_test = conv_net(_x, num_classes, dropout,
                               reuse=True, is_training=False)
```
can be changed to
```python
# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
is_training = tf.placeholder(tf.bool)

# Loop over all GPUs and construct their own computation graph
for i in range(num_gpus):
    with tf.device(assign_to_device('/gpu:{}'.format(i), ps_device='/cpu:0')):
        # Split data between GPUs
        _x = X[i * batch_size: (i+1) * batch_size]
        _y = Y[i * batch_size: (i+1) * batch_size]

        # Create a single graph for both training and testing
        logits = conv_net(_x, num_classes, dropout,
                          reuse=reuse_vars, is_training=is_training)
```
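For this single-graph version, conv_net only needs to forward the placeholder to its dropout layer. Below is a minimal, simplified sketch of that idea (not the repo's actual conv_net; the layer sizes and names are purely illustrative), showing that tf.layers.dropout accepts a boolean tensor for its training argument:

```python
import tensorflow as tf

def conv_net(x, n_classes, dropout, reuse, is_training):
    # is_training may be a Python bool *or* a scalar tf.bool tensor;
    # tf.layers.dropout switches between train/inference mode either way.
    with tf.variable_scope('ConvNet', reuse=reuse):
        x = tf.reshape(x, shape=[-1, 28, 28, 1])
        x = tf.layers.conv2d(x, 64, 3, activation=tf.nn.relu)
        x = tf.layers.max_pooling2d(x, 2, 2)
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, 1024)
        # Dropout is only active when is_training evaluates to True.
        x = tf.layers.dropout(x, rate=dropout, training=is_training)
        return tf.layers.dense(x, n_classes)
```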
While training and testing, you can then feed True and False to is_training respectively, as in the run calls sketched below. This way you only need to build half the number of graphs. tf.layers.dropout accepts a boolean tensor for its training argument (not just a Python bool), so is_training can control the dropout behaviour directly. Is there any particular reason you didn't implement this approach?
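A rough sketch of those run calls (names like sess, train_op, accuracy, batch_x/batch_y and test_x/test_y are just stand-ins for whatever the script actually uses):

```python
# Training step: dropout enabled on the shared graph.
sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, is_training: True})

# Evaluation step: identical graph, dropout disabled via the placeholder.
acc = sess.run(accuracy, feed_dict={X: test_x, Y: test_y, is_training: False})
```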
Thanks!