OOM when allocating tensor with shape[512,256,14,14]

Open • Sharathnasa opened this issue 8 years ago • 1 comment

@souryuu When I took the latest version of your code for training, I got the error below. Have you faced this kind of issue?

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,256,14,14]
  [[Node: pyramid_1/Conv2d_transpose/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pyramid_1/Conv2d_transpose/stack, pyramid/Conv2d_transpose/weights/read, pyramid_1/Conv_3/Relu)]]
  [[Node: pyramid_2/Reshape_72/_2085 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_10899_pyramid_2/Reshape_72", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'pyramid_1/Conv2d_transpose/conv2d_transpose', defined at:
  File "train/train.py", line 361, in <module>
    train()
  File "train/train.py", line 211, in train
    loss_weights=[1.0, 1.0, 200.0, 2.0, 10.0])
  File "train/../libs/nets/pyramid_network.py", line 631, in build
    is_training=is_training, gt_boxes=gt_boxes)
  File "train/../libs/nets/pyramid_network.py", line 395, in build_heads
    m = slim.conv2d_transpose(m, 256, 2, stride=2, padding='VALID', activation_fn=tf.nn.relu)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1136, in convolution2d_transpose
    outputs = layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 320, in apply
    return self.__call__(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 290, in __call__
    outputs = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 1102, in call
    data_format=utils.convert_data_format(self.data_format, ndim=4))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1104, in conv2d_transpose
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 496, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,256,14,14]
  [[Node: pyramid_1/Conv2d_transpose/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pyramid_1/Conv2d_transpose/stack, pyramid/Conv2d_transpose/weights/read, pyramid_1/Conv_3/Relu)]]
  [[Node: pyramid_2/Reshape_72/_2085 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_10899_pyramid_2/Reshape_72", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Regards, Sharath

Aug 11 '17 10:08 Sharathnasa

Use a smaller batch size, such as 8 or 12, and also a smaller vocabulary. See https://github.com/tensorflow/nmt/issues/348; that's what I'm doing right now. I'll get back to this post if it doesn't work.
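For a rough sense of scale, here is a minimal sketch that estimates how much memory the tensor named in the error would need. It assumes 4 bytes per element (the log reports T=DT_FLOAT, i.e. 32-bit floats). The helper names `tensor_bytes` and `to_mib` are made up for illustration, and the halved shape is hypothetical: the thread does not confirm that the leading dimension of 512 is tied to the batch size.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # DT_FLOAT in the error message is a 32-bit float


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024.0 * 1024.0)


# Shape copied from the OOM message.
failing_shape = [512, 256, 14, 14]
print(to_mib(tensor_bytes(failing_shape)))  # 98.0

# Hypothetical: if the leading dimension could be halved, this one
# allocation would need half the memory.
halved_shape = [256, 256, 14, 14]
print(to_mib(tensor_bytes(halved_shape)))  # 49.0
```

A single allocation of about 98 MiB is small next to a typical GPU's memory, so the OOM most likely means the device was already nearly full when this op ran. That is consistent with the advice to reduce the overall footprint rather than change this one layer.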

Aug 12 '18 20:08 tqpgun