CNNbasedMedicalSegmentation

problem in running train.py

Open debbie-ly77 opened this issue 8 years ago • 4 comments

Hi,

I'm trying to run your examples on my machine by following your guide, using a command like python train.py fcn_rffc4 brats_fold0 brats_fold0 600 -ch False, and I get the following errors:

    Traceback (most recent call last):
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 305, in <module>
        start_training(model_code, data_code, checkpoint, max_passes=n_epochs, train_dir=train_dir)
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 273, in start_training
        coach, model_def = setup_training(model_code, data_code, checkpoint, max_passes)
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 162, in setup_training
        model, model_def, t_dic = build_model(model_code, checkpoint, info)
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 122, in build_model
        perform_transform=model_def.perform_transform
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/model.py", line 104, in __init__
        self._init_exprs()
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/model.py", line 149, in _init_exprs
        declare=parameters.declare
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/cnn3d.py", line 56, in __init__
        super(SequentialModel, self).__init__(declare=declare, name=name)
      File "/home/liuyan/breze/breze/arch/construct/base.py", line 27, in __init__
        self._forward()
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/cnn3d.py", line 246, in _forward
        layer = self._make_layer(lv, inpt, height, width, depth, n_chans, i)
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/cnn3d.py", line 230, in _make_layer
        declare=self.declare
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/basic.py", line 398, in __init__
        super(BilinearUpsample3d, self).__init__(declare=declare, name=name)
      File "/home/liuyan/breze/breze/arch/construct/base.py", line 27, in __init__
        self._forward()
      File "/home/liuyan/CNNbasedMedicalSegmentation/conv3d/basic.py", line 441, in _forward
        pre_res = bilinear_upsampling(input=inpt, ratio=self.up_factor)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/nnet/abstract_conv.py", line 569, in bilinear_upsampling
        row * ratio, col * ratio))
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/var.py", line 327, in reshape
        return theano.tensor.basic.reshape(self, shape, ndim=ndim)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/basic.py", line 4526, in reshape
        newshape = as_tensor_variable(newshape)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/basic.py", line 208, in as_tensor_variable
        raise AsTensorError("Cannot convert %s to TensorType" % str_x, type(x))
    theano.tensor.var.AsTensorError: ('Cannot convert (None, None, Elemwise{mul,no_inplace}.0, Elemwise{mul,no_inplace}.0) to TensorType', <type 'tuple'>)
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7fb2cf97f990>> ignored
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7fb2cf97fb50>> ignored

I already have all the dependencies installed, except that my Theano version is 0.8.2 while the author recommends 0.9.0. I'm on 0.8.2 because with 0.9.0 I get a different error: ImportError: cannot import name downsample

What can I do to fix the problem and get the example running? Could you give me any suggestions or clues?

Thank you.

debbie-ly77 avatar Nov 10 '17 04:11 debbie-ly77

It seems to me that the problem is related to bilinear upsampling. You can try the following to work around this issue: In model_defs.py, lines 421-428:

        {'i':55, 'type': 'skip', 'src': 33},
        {'i':56, 'type': 'conv', 'fs': (1, 1, 1), 'nkerns': 5},
        {'i':57, 'type': 'bint', 'up': 2},
        {'i':58, 'type': 'skip', 'src': 43},
        {'i':59, 'type': 'conv', 'fs': (1, 1, 1), 'nkerns': 5},
        {'i':60, 'type': 'shortcut', 'src': 57, 'dst': 59},
        {'i':61, 'type': 'bint', 'up': 2},
        {'i':62, 'type': 'shortcut', 'src': 54, 'dst': 61}

replace every {'i':x, 'type': 'bint', 'up': 2} with {'i':x, 'type': 'deconv', 'fs': (3, 3, 3), 'nkerns': 5, 'up': (2, 2, 2)}. This replaces all bilinear interpolation with deconvolution, which in theory should work just as well.
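If editing the entries by hand is error-prone, the same substitution could be done programmatically. This is only a minimal sketch, assuming the layer definitions are a plain list of dicts shaped like the excerpt above; the helper name `replace_bint_with_deconv` and the `layers` variable are hypothetical and not part of the repository:

```python
import copy


def replace_bint_with_deconv(layers, nkerns=5, fs=(3, 3, 3)):
    """Return a copy of `layers` with every 'bint' entry swapped for 'deconv'.

    Hypothetical helper: the keys mirror the excerpt from model_defs.py,
    but nothing here is taken from the repository's own code.
    """
    new_layers = []
    for layer in layers:
        if layer.get('type') == 'bint':
            up = layer['up']
            new_layers.append({
                'i': layer['i'],
                'type': 'deconv',
                'fs': fs,
                'nkerns': nkerns,
                # 'bint' stores a scalar factor, 'deconv' expects one per axis
                'up': (up, up, up),
            })
        else:
            new_layers.append(copy.deepcopy(layer))
    return new_layers


# Two entries copied from the excerpt above, used only as sample input.
layers = [
    {'i': 56, 'type': 'conv', 'fs': (1, 1, 1), 'nkerns': 5},
    {'i': 57, 'type': 'bint', 'up': 2},
]
patched = replace_bint_with_deconv(layers)
print(patched[1])
```

The input list is left untouched, so the original definition stays available for comparison.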

bkayalibay avatar Nov 10 '17 10:11 bkayalibay

Thank you. Training starts now, but a new problem has come up.

    Building model, coach...
    input data dimensions: h: 160 w: 144 d: 128
    set stats: train: 200, valid: 37, test: 37
    No checkpoint available, using random initialization instead.
    Starting training...
    ERROR (theano.gof.opt): Optimization failure due to: local_useless_inc_subtensor
    ERROR (theano.gof.opt): node: IncSubtensor{Inc;int64:int64:}(Elemwise{add,no_inplace}.0, Reshape{1}.0, Constant{409936}, Constant{409952})
    ERROR (theano.gof.opt): TRACEBACK:
    ERROR (theano.gof.opt): Traceback (most recent call last):
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/gof/opt.py", line 1772, in process_node
        replacements = lopt.transform(node)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/opt.py", line 2313, in local_useless_inc_subtensor
        c = get_scalar_constant_value(node.inputs[0])
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/basic.py", line 662, in get_scalar_constant_value
        v.owner.op.perform(v.owner, const, ret)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 839, in perform
        super(Elemwise, self).perform(node, inputs, output_storage)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/gof/op.py", line 769, in perform
        "Did you used Theano flags mode=FAST_COMPILE?"
    MethodNotDefined: ('perform', <class 'theano.tensor.elemwise.Elemwise'>, 'Elemwise', 'Did you used Theano flags mode=FAST_COMPILE? You can use optimizer=fast_compile instead.')

    /home/liuyan/climin/climin/util.py:151: UserWarning: Argument named f is not expected by <class 'climin.adam.Adam'>
      % (i, klass))
    /home/liuyan/breze/breze/learn/base.py:39: UserWarning: Implicilty converting numpy.ndarray to gnumpy.garray
      warnings.warn('Implicilty converting numpy.ndarray to gnumpy.garray')
    Error allocating 47185920 bytes of device memory (out of memory). Driver report 1638400 bytes free and 4238540800 bytes total
    Traceback (most recent call last):
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 305, in <module>
        start_training(model_code, data_code, checkpoint, max_passes=n_epochs, train_dir=train_dir)
      File "/home/liuyan/CNNbasedMedicalSegmentation/train.py", line 277, in start_training
        coach.fit()
      File "/home/liuyan/CNNbasedMedicalSegmentation/ash.py", line 350, in fit
        for i in self.iter_fit(*self.data['train']):
      File "/home/liuyan/CNNbasedMedicalSegmentation/ash.py", line 363, in iter_fit
        for info in self.model.iter_fit(*fit_data):
      File "/home/liuyan/breze/breze/learn/base.py", line 302, in iter_fit
        for i, info in enumerate(opt):
      File "/home/liuyan/climin/climin/base.py", line 84, in __iter__
        for info in self._iterate():
      File "/home/liuyan/climin/climin/adam.py", line 177, in _iterate
        gradient = self.fprime(self.wrt, *args, **kwargs)
      File "/home/liuyan/breze/breze/arch/util.py", line 184, in inner
        res = f(*args)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
        storage_map=getattr(self.fn, 'storage_map', None))
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
        reraise(exc_type, exc_value, exc_trace)
      File "/home/liuyan/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
        outputs = self.fn()
    MemoryError: Error allocating 47185920 bytes of device memory (out of memory).
    Apply node that caused the error: GpuElemwise{Composite{((i0 * i1) + i2)},no_inplace}(GpuReshape{5}.0, GpuElemwise{true_div,no_inplace}.0, GpuReshape{5}.0)
    Toposort index: 1631
    Inputs types: [CudaNdarrayType(float32, (True, True, False, True, True)), CudaNdarrayType(float32, 5D), CudaNdarrayType(float32, (True, True, False, True, True))]
    Inputs shapes: [(1, 1, 32, 1, 1), (1, 64, 32, 80, 72), (1, 1, 32, 1, 1)]
    Inputs strides: [(0, 0, 1, 0, 0), (0, 184320, 5760, 72, 1), (0, 0, 1, 0, 0)]
    Inputs values: ['not shown', 'not shown', 'not shown']
    Outputs clients: [[GpuElemwise{minimum,no_inplace}(GpuElemwise{Composite{((i0 * i1) + i2)},no_inplace}.0, CudaNdarrayConstant{[[[[[ 0.]]]]]}), GpuElemwise{maximum,no_inplace}(GpuElemwise{Composite{((i0 * i1) + i2)},no_inplace}.0, CudaNdarrayConstant{[[[[[ 0.]]]]]}), GpuElemwise{Composite{Cast{float32}(EQ(i0, i1))}}[(0, 0)](GpuElemwise{minimum,no_inplace}.0, GpuElemwise{Composite{((i0 * i1) + i2)},no_inplace}.0), GpuElemwise{Composite{Cast{float32}(EQ(i0, i1))},no_inplace}(GpuElemwise{maximum,no_inplace}.0, GpuElemwise{Composite{((i0 * i1) + i2)},no_inplace}.0)]]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7f6702bc0d90>> ignored
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7f66d1d53f10>> ignored
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7f6702bc0f50>> ignored
    Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7f6701a01150>> ignored

Process finished with exit code 1

I'm not sure whether this is caused by running out of GPU memory; I currently have only 4 GB available. I'm also unsure about the TypeError, which makes it look as if cudamat isn't working. Any suggestions?

debbie-ly77 avatar Nov 13 '17 04:11 debbie-ly77

I'm not sure 4 GB is enough for that CNN at that input size. To confirm that it's a memory problem, you can try running the code with fewer features or on inputs with a smaller spatial size.
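For a rough sense of scale, the allocation that failed in the log above corresponds to a single float32 activation of shape (1, 64, 32, 80, 72). The sketch below just redoes that arithmetic; the helper `tensor_bytes` is hypothetical, and it counts one tensor only, not the network's total footprint:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element assumes float32."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Shape taken from the "Inputs shapes" line of the error message.
full = tensor_bytes((1, 64, 32, 80, 72))
# Same feature count with every spatial dimension halved (an illustrative choice).
half = tensor_bytes((1, 64, 16, 40, 36))

print(full)          # 47185920, the size in the "Error allocating" message
print(full // half)  # 8
```

Halving each of the three spatial dimensions shrinks an activation of this kind by a factor of 8, which is why a smaller input is a quick way to test whether memory is the limit.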

bkayalibay avatar Nov 13 '17 14:11 bkayalibay

I'm working on decreasing the input dimensions to see what happens. Thank you.

debbie-ly77 avatar Nov 15 '17 07:11 debbie-ly77