
How to lower the total needed shared memory?

Open yangarbiter opened this issue 11 years ago • 7 comments

Hi Chou,

I am encountering the following problem when trying to run a convolutional neural network.

terminate called after throwing an instance of 'std::runtime_error'
  what():  [Error] In function "getSuitableShmConfig" (at src/cnn-utility.cu:418): Exceeds maximum shared memory available. (49152 bytes)
kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1)  => 75940 bytes of shared memory needed.

What is the standard way to fix this problem if I really don't have that much memory? (e.g. lowering the thread count)

Thanks in advance.

yangarbiter avatar Dec 31 '14 06:12 yangarbiter

That's because the kernel is too large to fit into the shared memory. Can you give me your command-line arguments, such as --struct and --input-dim? Thanks.

BTW, I'm considering porting all the convn code to the NVIDIA cuDNN API, but I'm still working on my master's thesis...

poweic avatar Dec 31 '14 09:12 poweic

The following are my command-line arguments: --input-dim 122x105 --struct 40x23x14-6s-100x14x12-3s-6128-1024 --output-dim 32

Just curious what these parameters (kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1)) mean and how to lower them.

Thank you for answering my question. :)

yangarbiter avatar Dec 31 '14 11:12 yangarbiter

Ha, the error message doesn't mean what you think. But we usually choose a convolution kernel like 9x9, 7x7, 5x5 or 3x3. 23x14 is too big to fit into shared memory, so it'll take a huge amount of time to train the network.

I think 122x105 is the image size in the Machine Learning final contest. Scaling the image down to 64x64 and using a --struct like 10x5x5-2s-10x5x5-2s-10x4x4-2s-1023 will achieve around 80% accuracy.

As far as I know, you can achieve a 7% error rate using Caffe.

poweic avatar Dec 31 '14 18:12 poweic

Actually, a kernel size of 23x14 runs fine on my computer using Theano. I am still wondering what Caffe does to get the 7% error rate; I can only achieve around a 15% error rate with LeNet implemented in Theano.

BTW, I found out that nn-init crashes without any error message when the format of the input training data is wrong. (I accidentally started my array from zero.)

yangarbiter avatar Jan 01 '15 01:01 yangarbiter

Can you please tell me the kernel size you chose in your LeNet?

Okay, I'll add a guard clause at line 145 in src/data-io.cpp:

size_t j = stof(token.substr(0, pos)) - 1;
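To illustrate why a guard is needed there: the feature indices are expected to be 1-based, so an index of 0 underflows to a huge value after the `- 1` when stored in a `size_t`, which explains the silent crash. A minimal sketch of one possible guard (my own naming, not libdnn's actual code, which uses stof at that line):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a 1-based feature index from the text in
// token before position pos, converting it to a 0-based array index.
size_t parseFeatureIndex(const std::string& token, size_t pos) {
  long idx = std::stol(token.substr(0, pos));
  if (idx < 1)
    throw std::runtime_error(
        "[Error] Feature index must be 1-based, got " + std::to_string(idx));
  return static_cast<size_t>(idx) - 1;  // safe: idx >= 1, no underflow
}
```

With this, a line like "0:1.0" produces an explicit error instead of an out-of-bounds write somewhere far away in nn-init.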

Thanks !!

poweic avatar Jan 01 '15 02:01 poweic

The filter sizes are 23x14 and 14x12, and the numbers of kernels are 40 and 100.

I ran it on a GTX 780; the implementation is pretty much like the code in the following link: http://deeplearning.net/tutorial/lenet.html I think the kernel size in libdnn is the same as LeNetConvPoolLayer.filter_shape in Theano, right?

yangarbiter avatar Jan 01 '15 05:01 yangarbiter

Yes, a kernel is a filter.

But the error message you mentioned before (kernel = (100, 92), grids = (8, 8, 40), threads = (4, 4, 1)) is not what you think. XD See line 772 in src/feature-transform.cpp:

Z[i] += convn_2(rot180(iImgs[i][k]), Y[k], this->get_output_img_size());

and line 442 in src/cnn-utility.cu:

mat convn_2(const mat& data, const mat& kernels, SIZE k) {
  ...
  size_t SHM_SIZE = getSuitableShmConfig(grids, threads, k.m, k.n);
  ...
  return output;
}

You can see that the third argument to convn_2 is this->get_output_img_size(), which is later passed to getSuitableShmConfig as the variable k. And the size of the output image in the first ConvolutionalLayer is [100, 92] ( = [122, 105] - [23, 14] + [1, 1] ).

Well, that doesn't mean I won't fix this issue. I'm currently refactoring src/cnn-utility.cu, trying to make it more friendly, readable, and faster (it's already 50% faster now). I'll change the runtime_error to a warning message in the next release. But before that, I still suggest you use a smaller image by scaling down to 64x64, and try a smaller kernel.

I've heard it's possible to achieve 95% in-sample accuracy and 92% out-of-sample accuracy using three pairs of ConvolutionalLayer and SubsamplingLayer with 20% dropout. :)

poweic avatar Jan 03 '15 03:01 poweic