
caffe jacinto sparsification

Open Wronskia opened this issue 8 years ago • 12 comments

Hello,

I am trying to fine-tune a model with sparsification. It is the AlexNet architecture trained on ImageNet data provided here: https://github.com/cvjena/cnn-models . I am trying to fine-tune it with sparsification on a subset of ImageNet data. My first step was to test the accuracy of the provided model (AlexNet prototxt + caffemodel) on my subset of images using plain Caffe, and I get results very close to what the authors report. My second step was to use caffe-jacinto for the sparsification. Before doing that, I tested the model in exactly the same way as in Caffe, with no sparsification, using caffe-jacinto. Here is what I get:

W0811 14:14:31.177296 4905 net.cpp:811] Incompatible number of blobs for layer data/bn
W0811 14:14:31.177886 4905 net.cpp:819] Copying from data/bn to data/bn target blob 0
W0811 14:14:31.178071 4905 net.cpp:832] Shape mismatch, param: 0 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178169 4905 net.cpp:819] Copying from data/bn to data/bn target blob 1
W0811 14:14:31.178247 4905 net.cpp:832] Shape mismatch, param: 1 layer: data/bn source: 3 (3) target: 1 3 1 1 (3).
W0811 14:14:31.178308 4905 net.cpp:819] Copying from data/bn to data/bn target blob 2
W0811 14:14:31.178381 4905 net.cpp:825] Cannot copy param 2 weights from layer 'data/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 3 1 1 (3). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
W0811 14:14:31.178517 4905 net.cpp:811] Incompatible number of blobs for layer conv1/bn
W0811 14:14:31.178555 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 0
W0811 14:14:31.178652 4905 net.cpp:832] Shape mismatch, param: 0 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178719 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 1
W0811 14:14:31.178793 4905 net.cpp:832] Shape mismatch, param: 1 layer: conv1/bn source: 96 (96) target: 1 96 1 1 (96).
W0811 14:14:31.178854 4905 net.cpp:819] Copying from conv1/bn to conv1/bn target blob 2
W0811 14:14:31.178926 4905 net.cpp:825] Cannot copy param 2 weights from layer 'conv1/bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 96 1 1 (96). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

and so on for every batch normalization layer. However, the testing moves forward and produces these poor results:

I0811 14:14:33.100739 4905 caffe.cpp:312] Batch 199, accuracy = 0
I0811 14:14:33.100769 4905 caffe.cpp:312] Batch 199, loss = 87
I0811 14:14:33.100780 4905 caffe.cpp:312] Batch 199, top5 = 0
I0811 14:14:33.100786 4905 caffe.cpp:317] Loss: 86.826
I0811 14:14:33.100814 4905 caffe.cpp:329] accuracy = 0.002
I0811 14:14:33.100836 4905 caffe.cpp:329] loss = 86.826 (* 1 = 86.826 loss)
I0811 14:14:33.100845 4905 caffe.cpp:329] top5 = 0.005
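For reference, the same test loop can be reproduced from pycaffe in a few lines, which makes it easier to poke at the loaded weights afterwards. This is only a minimal sketch: train_val.prototxt and alexnet.caffemodel are placeholders for the actual prototxt and weights I passed to caffe test, and it assumes the network defines output blobs named accuracy, top5 and loss as in the log above.

# Minimal sketch only: reproduce the `caffe test` loop through pycaffe.
# 'train_val.prototxt' and 'alexnet.caffemodel' are placeholder names
# for the actual network definition and weights used above.
import caffe

caffe.set_mode_gpu()          # or caffe.set_mode_cpu() for a CPU_ONLY build
net = caffe.Net('train_val.prototxt', 'alexnet.caffemodel', caffe.TEST)

num_batches = 200             # matches "Batch 199" in the log (0-indexed)
acc = top5 = loss = 0.0
for _ in range(num_batches):
    out = net.forward()       # the data layer feeds the next batch itself
    acc += float(out['accuracy'])
    top5 += float(out['top5'])
    loss += float(out['loss'])

print('accuracy =', acc / num_batches)
print('top5     =', top5 / num_batches)
print('loss     =', loss / num_batches)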

So the weights must not be loading correctly. My guess would be that the reshaping is not handled correctly in void Net<Dtype>::CopyTrainedLayersFrom(const NetParameter& param) in net.cpp.
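One quick way to see what CopyTrainedLayersFrom is being fed is to dump the batch-norm blob shapes straight out of the caffemodel protobuf. A minimal sketch, assuming pycaffe's caffe_pb2 bindings are available and using alexnet.caffemodel as a placeholder for the pre-trained weights file:

# Minimal sketch: print the shapes stored in the caffemodel for every
# */bn layer, to compare against the "source ... target ..." pairs in
# the warnings above. 'alexnet.caffemodel' is a placeholder file name.
# (Older models may store dimensions in blob.num/channels/height/width
# instead of blob.shape.dim.)
from caffe.proto import caffe_pb2

net_param = caffe_pb2.NetParameter()
with open('alexnet.caffemodel', 'rb') as f:
    net_param.ParseFromString(f.read())

for layer in net_param.layer:
    if layer.name.endswith('/bn'):
        shapes = [list(blob.shape.dim) for blob in layer.blobs]
        print(layer.name, shapes)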

PS: your full CIFAR example works perfectly for me.

Thank you, Best,

Yassine

Wronskia avatar Aug 11 '17 14:08 Wronskia

caffe-jacinto is derived from NVIDIA/caffe.

The branch caffe-0.15 has a known issue in its BatchNorm layers: the batch normalization there is incompatible with BVLC/caffe batch normalization, so it cannot load a pre-trained model that uses batch normalization and was trained with BVLC/caffe.
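To see the mismatch from the other side, you can ask the target network which blob shapes its batch-norm layers actually allocate. A minimal sketch, assuming pycaffe is built from this branch and using train_val.prototxt as a placeholder for your network definition:

# Minimal sketch: instantiate the net (without weights) and print the
# parameter blob shapes allocated for each */bn layer; these are the
# "target" shapes that the copied "source" blobs must match.
# 'train_val.prototxt' is a placeholder for your network definition.
import caffe

net = caffe.Net('train_val.prototxt', caffe.TEST)
for name, blobs in net.params.items():
    if name.endswith('/bn'):
        print(name, [tuple(b.data.shape) for b in blobs])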

NVIDIA/caffe has fixed this issue in the branch caffe-0.16. We are in the process of migrating to that branch; it may take a couple of weeks.

However, if this is urgent for you, I can suggest a couple of options:

  1. You can use a model that doesn't have batch normalization (e.g. VGG16).
  2. You can use a pre-trained model that was trained in NVIDIA/caffe or caffe-jacinto (branch caffe-0.15). For example, I have made a jacintonet11 pre-trained model available.
  3. You can start from scratch (without using a pre-trained model).

mathmanu avatar Aug 11 '17 14:08 mathmanu

Hi,

Thank you for your quick answer!

Should it work with CPU_ONLY support?

Best, Yassine

Wronskia avatar Aug 11 '17 14:08 Wronskia

Part of the issue is related to CUDNN. If you build with CUDNN, that part of the error will go away, but the problem will still not be fully solved.

mathmanu avatar Aug 11 '17 15:08 mathmanu

Actually, I want to test the effect of your sparsification method on the AlexNet and ResNet architectures (with the ImageNet dataset), both of which use batch normalization. I tried to fix the problem in your source code but nothing worked, and judging from the commits related to batch normalization in the caffe-0.16 branch, the fix doesn't seem to be straightforward.

Thanks again, Best, Yassine

Wronskia avatar Aug 11 '17 15:08 Wronskia

We shall try to push the caffe-0.16 branch of caffe-jacinto soon. That will solve these incompatibilities.

mathmanu avatar Aug 11 '17 15:08 mathmanu

Hello Manu,

I am coming back to you to ask when you expect to release the caffe-0.16 branch of caffe-jacinto.

Thanks, Best, Yassine

Wronskia avatar Sep 20 '17 12:09 Wronskia

Hi Yassine, It's almost ready. We are targeting Monday, 25th September 2017. Best regards, Manu.

mathmanu avatar Sep 20 '17 12:09 mathmanu

Perfect.

Thank you.

Wronskia avatar Sep 20 '17 12:09 Wronskia

@Wronskia branch caffe-0.16 is now available for caffe-jacinto and caffe-jacinto-models. Note that the default branch is still caffe-0.15, so you have to manually switch to caffe-0.16 after clone or pull:

git checkout caffe-0.16

See the example scripts located in the caffe-jacinto-models/scripts/training folder. Also check the example trained models given in caffe-jacinto-models/trained.

New Features - 2017 September:

  1. Based on NVIDIA/caffe branch caffe-0.16 - so it fixes the Batch Normalization backward-compatibility issues.
  2. Additional features have been added for sparsity - for example, improvements to reach the exact sparsity target specified, and improvements to reduce the accuracy drop during sparsification.
  3. Estimate the accuracy with quantization - all you have to do is set quantize: enable in your network prototxt definition.
  4. Object detection using Single Shot Detector (SSD) has been integrated.

mathmanu avatar Sep 22 '17 10:09 mathmanu

Hello Manu,

Thank you very much, I will give it a try now.

Best, Yassine

Wronskia avatar Sep 26 '17 06:09 Wronskia

Keeping this issue open for some more time, as this is an interesting conversation and may help anyone trying out the same thing.

mathmanu avatar Oct 06 '17 14:10 mathmanu

Note that I have changed the default branch in github to caffe-0.16

mathmanu avatar Oct 06 '17 15:10 mathmanu