How to lower the total shared memory needed?
Hi Chou,
I am encountering the following problem when trying to run a convolutional neural network.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Error] In function "getSuitableShmConfig" (at src/cnn-utility.cu:418): Exceeds maximum shared memory available. (49152 bytes)
kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1) => 75940 bytes of shared memory needed.
What is the standard way to fix this problem if I really don't have that much memory (e.g., lowering the thread count)?
Thanks in advance.
That's because the kernel is too large to fit into the shared memory.
Can you give me your command-line arguments, such as --struct and --input-dim?
Thanks.
BTW, I'm considering porting all the convn code to the NVIDIA cuDNN API. But I'm still working on my master's thesis...
These are my command-line arguments: --input-dim 122x105 --struct 40x23x14-6s-100x14x12-3s-6128-1024 --output-dim 32
I'm just curious what these parameters (kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1)) mean and how to lower them.
Thank you for answering my question. :)
Ha, the error message doesn't mean what you think. But we usually choose a convolution kernel like 9x9, 7x7, 5x5 or 3x3. 23x14 is too big to fit into shared memory, so it'll take a hell of a lot of time to train the network.
I think 122x105 is the image size in the Machine Learning final contest. Scaling the image down to 64x64 and using a --struct like 10x5x5-2s-10x5x5-2s-10x4x4-2s-1023 should achieve around 80% accuracy. (I'll spell out the --struct syntax below.)
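To spell out the --struct syntax using the string you posted (this is my best reading of the format; the subsampling factors and the trailing fully-connected sizes in particular are my interpretation, so double-check against the usage docs):

--struct 40x23x14-6s-100x14x12-3s-6128-1024
  40x23x14    ConvolutionalLayer: 40 kernels of size 23x14
  6s          SubsamplingLayer with factor 6
  100x14x12   ConvolutionalLayer: 100 kernels of size 14x12
  3s          SubsamplingLayer with factor 3
  6128-1024   fully-connected hidden layers with 6128 and 1024 neurons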
As far as I know, you can achieve a 7% error rate using Caffe.
Actually, a 23x14 kernel runs fine on my computer using Theano. I am still wondering what Caffe does to get the 7% error rate; I can only achieve around a 15% error rate with LeNet implemented in Theano.
BTW, I found out that nn-init crashes without any error message when the format of the input training data is wrong. (I accidentally started my array indices from zero.)
Can you please tell me the kernel sizes you chose in your LeNet?
Okay, I'll add a guard clause at line 145 in src/data-io.cpp:
size_t j = stof(token.substr(0, pos)) - 1;
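Something like this (a rough sketch; the exact wording and placement may differ in the actual commit):

size_t idx = stof(token.substr(0, pos));
if (idx == 0)
  throw std::runtime_error("[Error] Feature indices are 1-based. "
      "Did you start your indices from zero?");
size_t j = idx - 1;

That way a 0-based index fails loudly instead of wrapping size_t around to a huge value and crashing later with no message.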
Thanks!!
The filter sizes are 23x14 and 14x12, and the numbers of kernels are 40 and 100.
I ran it on a GTX 780; the implementation is pretty much like the code in the following link: http://deeplearning.net/tutorial/lenet.html I think the kernel size in libdnn is the same as LeNetConvPoolLayer.filter_shape in Theano, right?
Yes, kernel means filter.
But the error message you mentioned before (kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1)) is not what you think. XD
Look at line 772 in src/feature-transform.cpp:
Z[i] += convn_2(rot180(iImgs[i][k]), Y[k], this->get_output_img_size());
and at line 442 in src/cnn-utility.cu:
mat convn_2(const mat& data, const mat& kernels, SIZE k) {
...
size_t SHM_SIZE = getSuitableShmConfig(grids, threads, k.m, k.n);
...
return output;
}
You can see that the third argument to convn_2 is this->get_output_img_size(), which is later passed to getSuitableShmConfig as the variable k. And the output image size of the first ConvolutionalLayer is [100, 92] ( = [122, 105] - [23, 14] + [1, 1] ). So the (100, 92) in the error message is the output image size, not your convolution kernel.
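In fact, you can reproduce the 75940 bytes in the error message from those numbers. My reading of getSuitableShmConfig is that each block stages an input tile of (threads.x + k.m - 1) x (threads.y + k.n - 1) floats plus a k.m x k.n buffer in shared memory; treat that layout as an assumption, but the arithmetic works out exactly:

// threads = (4, 4, 1) and k = (100, 92), taken from the error message
size_t tile   = (4 + 100 - 1) * (4 + 92 - 1);  // 103 * 95 = 9785 floats
size_t kern   = 100 * 92;                      //            9200 floats
size_t needed = (tile + kern) * sizeof(float); // 18985 * 4 = 75940 bytes

Note that the k.m * k.n term dominates: even with threads = (1, 1, 1) you would still need (9200 + 9200) * 4 = 73600 bytes, so lowering the thread count barely helps. Shrinking the image or the kernel is the real fix.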
Well, that doesn't mean I won't fix this issue. I'm currently refactoring src/cnn-utility.cu to make it more friendly, readable, and faster (it's already 50% faster now). I'll change the runtime_error to a warning message in the next release (roughly like the sketch below). But until then, I still suggest you use a smaller image by scaling down to 64x64, and try a smaller kernel.
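A sketch of what I have in mind (not final code; MAX_SHM_PER_BLOCK is a made-up name and the fallback path may end up different):

// in getSuitableShmConfig -- sketch only
if (SHM_SIZE > MAX_SHM_PER_BLOCK) {
  fprintf(stderr, "[Warning] %zu bytes of shared memory needed, but only "
      "%zu bytes available. Expect degraded performance.\n",
      SHM_SIZE, MAX_SHM_PER_BLOCK);
  SHM_SIZE = 0;  // hypothetical: signal the caller to fall back to global memory
}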
I've heard that it's possible to achieve 95% in-sample accuracy and 92% out-of-sample accuracy using 3 layers of ConvolutionalLayer and SubsamplingLayer with 20% dropout. :)