FastMaskRCNN

OOM when allocating tensor with shape[512,256,14,14]

Open Sharathnasa opened this issue 8 years ago • 1 comments

@souryuu When I took the latest version of your code for training, I got the error below. Have you faced this kind of issue?

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,256,14,14]
     [[Node: pyramid_1/Conv2d_transpose/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pyramid_1/Conv2d_transpose/stack, pyramid/Conv2d_transpose/weights/read, pyramid_1/Conv_3/Relu)]]
     [[Node: pyramid_2/Reshape_72/_2085 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_10899_pyramid_2/Reshape_72", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'pyramid_1/Conv2d_transpose/conv2d_transpose', defined at:
  File "train/train.py", line 361, in <module>
    train()
  File "train/train.py", line 211, in train
    loss_weights=[1.0, 1.0, 200.0, 2.0, 10.0])
  File "train/../libs/nets/pyramid_network.py", line 631, in build
    is_training=is_training, gt_boxes=gt_boxes)
  File "train/../libs/nets/pyramid_network.py", line 395, in build_heads
    m = slim.conv2d_transpose(m, 256, 2, stride=2, padding='VALID', activation_fn=tf.nn.relu)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1136, in convolution2d_transpose
    outputs = layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 320, in apply
    return self.__call__(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 290, in __call__
    outputs = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 1102, in call
    data_format=utils.convert_data_format(self.data_format, ndim=4))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1104, in conv2d_transpose
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 496, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,256,14,14]
     [[Node: pyramid_1/Conv2d_transpose/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pyramid_1/Conv2d_transpose/stack, pyramid/Conv2d_transpose/weights/read, pyramid_1/Conv_3/Relu)]]
     [[Node: pyramid_2/Reshape_72/_2085 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_10899_pyramid_2/Reshape_72", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Regards, Sharath

Sharathnasa, Aug 11 '17 10:08

Use a smaller batch size, such as 8 or 12, and a smaller vocab. See https://github.com/tensorflow/nmt/issues/348. That's what I am doing right now; I'll get back to this post if it doesn't work.

tqpgun, Aug 12 '18 20:08
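To get a feel for why shrinking the leading dimension helps, here is a rough sketch of the memory arithmetic for the tensor named in the error. It assumes 4 bytes per element, since the trace reports T=DT_FLOAT. The helper name `tensor_bytes` is made up for illustration, and the trace does not say whether the leading 512 is a batch size or something else (for example, a number of sampled regions), so the smaller values below are only hypothetical.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: DT_FLOAT in the trace means 32-bit floats


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for one dense tensor of the given shape (illustrative helper)."""
    return prod(shape) * bytes_per_element


# Shape taken from the error message.
failed_shape = (512, 256, 14, 14)
print(f"{failed_shape}: {tensor_bytes(failed_shape) / 2**20:.1f} MiB")

# Memory scales linearly with the leading dimension; these values are hypothetical.
for leading in (128, 12, 8):
    shape = (leading,) + failed_shape[1:]
    print(f"{shape}: {tensor_bytes(shape) / 2**20:.2f} MiB")
```

The tensor in the error is about 98 MiB by itself, which is modest. The failure means the GPU had no room left for it after everything else the graph had already allocated, so reducing the dimension that scales all those activations is what frees memory.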