Mismatch of the libprotoc version
Hi, when I run make all, it still tells me:
.build_release/src/caffe/proto/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
@urumican : Hmm, no luck with this. You'd probably need to try those protobuf versions.
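In case it helps later readers, here is one way to check for this kind of protoc/runtime mismatch. This is a sketch only: the paths assume a standard Caffe checkout with a .build_release tree, and the grep target assumes the generated header uses the usual GOOGLE_PROTOBUF_VERSION guard.

```shell
# The generated header encodes the protobuf runtime version it expects;
# print that guard line if a previous build left the header behind.
HEADER=.build_release/src/caffe/proto/caffe.pb.h
if [ -f "$HEADER" ]; then
  grep -m1 GOOGLE_PROTOBUF_VERSION "$HEADER"
fi

# Report which protoc the build would pick up, if any.
PROTOC_VERSION=$(protoc --version 2>/dev/null || echo "protoc not found")
echo "protoc: $PROTOC_VERSION"

# If the two disagree, remove the stale generated files so that the next
# `make all` regenerates them with the protoc that is actually on PATH:
#   rm -f .build_release/src/caffe/proto/caffe.pb.*
#   make all
```

If several protobuf installs coexist on the machine, `which protoc` versus the `-I`/`-L` paths in Makefile.config is usually where the disagreement hides.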
We are really confused by the version issues now. I am currently using CUDA 6.0, gcc 4.6.4, and boost 1.52.0 to compile Caffe. It fails on layer files such as "dropout_layer.cu" with:
src/caffe/layers/dropout_layer.cu: In member function ‘virtual void caffe::DropoutLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&, const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&)’:
src/caffe/layers/dropout_layer.cu:37:92: error: invalid use of qualified-name ‘::_result’
src/caffe/layers/dropout_layer.cu:37:297: error: ‘_result’ was not declared in this scope
src/caffe/layers/dropout_layer.cu: In member function ‘virtual void caffe::DropoutLayer<Dtype>::Backward_gpu(const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&, const std::vector<bool, std::allocator<bool> >&, const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&)’:
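One way to dig into errors like this is to look at what the preprocessor actually produced for the failing line, since nvcc errors of this kind often come out of macro expansion rather than the visible source. A hedged sketch — the include flags are illustrative; mirror whatever your Makefile passes to nvcc:

```shell
# Preprocess the failing .cu file so the expansion of line 37 can be
# inspected directly (assumes you run from the Caffe root).
NVCC=${NVCC:-nvcc}
OUT=dropout_preprocessed.cpp
if command -v "$NVCC" >/dev/null 2>&1; then
  "$NVCC" -E -Iinclude -I.build_release/src \
    src/caffe/layers/dropout_layer.cu > "$OUT"
fi
echo "preprocessed output (if nvcc was found): $OUT"
```

Searching the output for `_result` should show which header the bad expansion comes from, which narrows down whether the CUDA, boost, or gcc version is the one to change.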
May I know what you changed in your Caffe code? I would really appreciate it. I think I am going to try changing the code.
@urumican : Yes, so I didn't modify Caffe at all. Instead I only used it to load a model (AlexNet) and make forward passes to get the softmax outputs back. e.g. here: https://github.com/Evolving-AI-Lab/innovation-engine/blob/master/sferes/exp/images/fit/fit_map_deep_learning.hpp#L133-L139
The only reason I pinned the Caffe version used here is that the Caffe API keeps changing, i.e. breaking my code. I hope you could update the current code to make it work with the latest Caffe.
@anguyen8 Thank you so much.
@anguyen8 Hi Anh, I have installed them all, and I chose to run it on my own server with 64 processors. Should I use launchScript.sh directly?
@anguyen8 Also, do you know where we configure the directory that stores the results? I am not able to find "./sferes/mmm" as stated in the installation guide.
@urumican : Hi, if you have compiled Caffe and Sferes (via waf) successfully, then you'd just need to run the executable with mpirun. launchScript.sh is an example of how we run it on our cluster, but it might not work directly for your case.
Re: mmm, it's a folder whose location can be set manually here: https://github.com/Evolving-AI-Lab/innovation-engine/blob/2c4642daeddeee1aa36ece4431937cc441f6fc92/sferes/exp/images/ea/ea_custom.hpp#L75 If the program runs, it will create that folder for you. I'd suggest making a short run (e.g. for 1 iteration) and seeing where things land.
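A minimal way to check where the results folder lands, sketched under the assumption that you launch the binary from the sferes directory (the binary path is the one from this thread; adjust it to your tree):

```shell
# Hypothetical short run from the sferes working directory.
BINARY=./build/default/exp/images/images
if [ -x "$BINARY" ]; then
  "$BINARY" 2>&1 | tee short_run.log
fi

# ea_custom.hpp creates the results folder (mmm by default) in the
# working directory; list anything matching that name.
ls -d mmm* 2>/dev/null || echo "no mmm folder yet under $(pwd)"
```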
@anguyen8 Yesterday I used mpirun --mca mpi_leave_pinned 0 --mca mpi_warn_on_fork 0 -np 16 /scratch2/fuxin/fooling/sferes/build/default/exp/images/images on the server.
Then I got a segmentation fault:
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
[steed:63413] *** Process received signal ***
[steed:63413] Signal: Segmentation fault (11)
[steed:63413] Signal code: Address not mapped (1)
[steed:63413] Failing at address: (nil)
[steed:63413] [ 0] /usr/lib64/libc.so.6(+0x35670)[0x7f6d016f9670]
[steed:63413] [ 1] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x434b74]
[steed:63413] [ 2] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x434f29]
[steed:63413] [ 3] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43e972]
[steed:63413] [ 4] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43eb16]
[steed:63413] [ 5] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43fb5e]
[steed:63413] [ 6] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43ff4a]
[steed:63413] [ 7] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x41b6a2]
[steed:63413] [ 8] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6d016e5b15]
[steed:63413] [ 9] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x41cc85]
[steed:63413] *** End of error message ***
@urumican : the mpirun command looks good to me. You could also just run on a local machine for debugging.
If you try a single-process job by uncommenting L187 and commenting out L188 here, does it work? https://github.com/Evolving-AI-Lab/innovation-engine/blob/master/sferes/exp/images/x/dl_map_elites_images_test.cpp#L187-L188
Hi @anguyen8, single core still results in a similar problem, a core dump.
steed /scratch2/fuxin/fooling/sferes 163% /scratch2/fuxin/fooling/sferes/build/default/exp/images/images
sferes2 version: (const char*)"0.1"
seed: 1471428776
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Segmentation fault (core dumped)
@urumican : hmm, I guess I don't know any other quick fix. Can you use gdb and trace down the problem? (make sure to run the debug-version executable in path-to-sferes/build/debug/exp/images...)
More about debug vs default: https://github.com/sferes2/sferes2/wiki/Tutorial1
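For example, a batch-mode gdb invocation that runs the debug build and dumps a backtrace at the crash without any interactive typing (a sketch; the binary path follows the thread's layout, adjust it to yours):

```shell
# Run the debug build under gdb and print a backtrace when it faults.
BINARY=./build/debug/exp/images/images
if [ -x "$BINARY" ]; then
  gdb -batch -ex run -ex bt --args "$BINARY"
else
  echo "debug binary not found at $BINARY (build the debug variant with waf first)"
fi
```

The `bt` output should name the frame that dereferences the bad address, which is far more useful than the stripped addresses in the mpirun crash dump above.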
@anguyen8 Thank you. I have another question: why is the pre-trained model not a .caffemodel file? I cannot find a pre-trained model without an extension.
@urumican : Yeah, Caffe only recently introduced the .caffemodel extension. Previously, model files didn't have any extension, but both formats should work the same.
@anguyen8 Oh, Thank you!
@anguyen8 The debug build only gives me one message: images: /scratch/include/boost/smart_ptr/shared_ptr.hpp:687: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = caffe::ImageDataLayer<float>; typename boost::detail::sp_member_access<T>::type = caffe::ImageDataLayer<float>*]: Assertion `px != 0' failed.
@anguyen8 I found a line: #ifdef LOCAL_RUN. Do you remember where this is defined?
@anguyen8 I see that people are not able to find caffe_reference_imagenet_model. Do you know where to get it?
@urumican : You can get the BVLC Caffe reference model from their code/website. As for the code here, a few groups have been able to reproduce the results with it, so it should work as long as your Caffe version is compatible with it.
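For reference, recent Caffe trees ship a download helper for the BVLC reference model. A hedged sketch — run it from the Caffe root; the script and model directory names below are the ones in upstream Caffe, so check your checkout if it is older:

```shell
# Fetch the BVLC reference (CaffeNet) weights with Caffe's helper script.
MODEL_DIR=models/bvlc_reference_caffenet
if [ -f scripts/download_model_binary.py ]; then
  python scripts/download_model_binary.py "$MODEL_DIR"
else
  echo "helper script not found; run this from the Caffe root"
fi
echo "expected weights: $MODEL_DIR/bvlc_reference_caffenet.caffemodel"
```

Older, extensionless model dumps and the newer .caffemodel files carry the same serialized protobuf, so either should load.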