Mismatch of the libprotoc version
Hi, when I run make all, it still tells me:
.build_release/src/caffe/proto/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
@urumican : Hmm, no luck with this. You'd probably need to try those protobuf versions.
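In case it helps later readers, here is one way to check for this kind of protoc/runtime mismatch. This is a sketch only: the paths assume a standard Caffe checkout with a .build_release tree, and the grep target assumes the generated header uses the usual GOOGLE_PROTOBUF_VERSION guard.

```shell
# The generated header encodes the protobuf runtime version it expects;
# print that guard line if a previous build left the header behind.
HEADER=.build_release/src/caffe/proto/caffe.pb.h
if [ -f "$HEADER" ]; then
  grep -m1 GOOGLE_PROTOBUF_VERSION "$HEADER"
fi

# Report which protoc the build would pick up, if any.
PROTOC_VERSION=$(protoc --version 2>/dev/null || echo "protoc not found")
echo "protoc: $PROTOC_VERSION"

# If the two disagree, remove the stale generated files so that the next
# `make all` regenerates them with the protoc that is actually on PATH:
#   rm -f .build_release/src/caffe/proto/caffe.pb.*
#   make all
```

If several protobuf installs coexist on the machine, `which protoc` versus the `-I`/`-L` paths in Makefile.config is usually where the disagreement hides.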
We are really confused by the version issues now. I am currently using CUDA 6.0, gcc 4.6.4, and boost 1.52.0 to compile Caffe. It fails on layer files such as "dropout_layer.cu" with:
src/caffe/layers/dropout_layer.cu: In member function ‘virtual void caffe::DropoutLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&, const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&)’:
src/caffe/layers/dropout_layer.cu:37:92: error: invalid use of qualified-name ‘::_result’
src/caffe/layers/dropout_layer.cu:37:297: error: ‘_result’ was not declared in this scope
src/caffe/layers/dropout_layer.cu: In member function ‘virtual void caffe::DropoutLayer<Dtype>::Backward_gpu(const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&, const std::vector<bool, std::allocator<bool> >&, const std::vector<caffe::Blob<Dtype>*, std::allocator<caffe::Blob<Dtype>*> >&)’:
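One way to dig into errors like this is to look at what the preprocessor actually produced for the failing line, since nvcc errors of this kind often come out of macro expansion rather than the visible source. A hedged sketch — the include flags are illustrative; mirror whatever your Makefile passes to nvcc:

```shell
# Preprocess the failing .cu file so the expansion of line 37 can be
# inspected directly (assumes you run from the Caffe root).
NVCC=${NVCC:-nvcc}
OUT=dropout_preprocessed.cpp
if command -v "$NVCC" >/dev/null 2>&1; then
  "$NVCC" -E -Iinclude -I.build_release/src \
    src/caffe/layers/dropout_layer.cu > "$OUT"
fi
echo "preprocessed output (if nvcc was found): $OUT"
```

Searching the output for `_result` should show which header the bad expansion comes from, which narrows down whether the CUDA, boost, or gcc version is the one to change.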
May I know what you changed in your Caffe code? I would really appreciate it. I think I am going to try changing the code.
@urumican : Yes, so I didn't modify Caffe at all. Instead I only used it to load a model (AlexNet) and make forward passes to get the softmax outputs back. e.g. here: https://github.com/Evolving-AI-Lab/innovation-engine/blob/master/sferes/exp/images/fit/fit_map_deep_learning.hpp#L133-L139
The only reason I pinned the Caffe version used here is that the Caffe API keeps changing, i.e. breaking my code. I hope you could update the current code to make it work with the latest Caffe.
@anguyen8 Thank you so much.
@anguyen8 Hi Anh, I have installed them all, and I chose to run it on my own server with 64 processors. Should I use launchScript.sh directly?
@anguyen8 Also, do you know where we configure the directory that stores the results? I am not able to find "./sferes/mmm" as stated in the installation guide.
@urumican : Hi, if you have compiled Caffe and Sferes (via waf) successfully, then you'd just need to run the executable with mpirun. launchScript.sh is an example of how we run it on our cluster, but it might not work directly for your case.
Re: mmm, it's a folder whose location can be set manually here: https://github.com/Evolving-AI-Lab/innovation-engine/blob/2c4642daeddeee1aa36ece4431937cc441f6fc92/sferes/exp/images/ea/ea_custom.hpp#L75 If the program runs, it will create that folder for you. I'd suggest making a short run (e.g. for 1 iteration) and seeing where things land.
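A minimal way to check where the results folder lands, sketched under the assumption that you launch the binary from the sferes directory (the binary path is the one from this thread; adjust it to your tree):

```shell
# Hypothetical short run from the sferes working directory.
BINARY=./build/default/exp/images/images
if [ -x "$BINARY" ]; then
  "$BINARY" 2>&1 | tee short_run.log
fi

# ea_custom.hpp creates the results folder (mmm by default) in the
# working directory; list anything matching that name.
ls -d mmm* 2>/dev/null || echo "no mmm folder yet under $(pwd)"
```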
@anguyen8 Yesterday I used mpirun --mca mpi_leave_pinned 0 --mca mpi_warn_on_fork 0 -np 16 /scratch2/fuxin/fooling/sferes/build/default/exp/images/images on the server.
Then I got a segmentation fault:
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
[steed:63413] *** Process received signal ***
[steed:63413] Signal: Segmentation fault (11)
[steed:63413] Signal code: Address not mapped (1)
[steed:63413] Failing at address: (nil)
[steed:63413] [ 0] /usr/lib64/libc.so.6(+0x35670)[0x7f6d016f9670]
[steed:63413] [ 1] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x434b74]
[steed:63413] [ 2] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x434f29]
[steed:63413] [ 3] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43e972]
[steed:63413] [ 4] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43eb16]
[steed:63413] [ 5] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43fb5e]
[steed:63413] [ 6] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x43ff4a]
[steed:63413] [ 7] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x41b6a2]
[steed:63413] [ 8] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6d016e5b15]
[steed:63413] [ 9] /scratch2/fuxin/fooling/sferes/build/default/exp/images/images[0x41cc85]
[steed:63413] *** End of error message ***
@urumican : the mpirun command looks good to me. You could also just run on a local machine for debugging.
If you try a single-process job by uncommenting L187 and commenting out L188 here, does it work? https://github.com/Evolving-AI-Lab/innovation-engine/blob/master/sferes/exp/images/x/dl_map_elites_images_test.cpp#L187-L188
Hi @anguyen8, single core still results in a similar problem, a core dump.
steed /scratch2/fuxin/fooling/sferes 163% /scratch2/fuxin/fooling/sferes/build/default/exp/images/images
sferes2 version: (const char*)"0.1"
seed: 1471428776
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Segmentation fault (core dumped)
@urumican : hmm, I guess I don't know any other quick fix. Can you use gdb and trace down the problem? (make sure to run the debug-version executable in path-to-sferes/build/debug/exp/images...)
More about debug vs default: https://github.com/sferes2/sferes2/wiki/Tutorial1
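For example, a batch-mode gdb invocation that runs the debug build and dumps a backtrace at the crash without any interactive typing (a sketch; the binary path follows the thread's layout, adjust it to yours):

```shell
# Run the debug build under gdb and print a backtrace when it faults.
BINARY=./build/debug/exp/images/images
if [ -x "$BINARY" ]; then
  gdb -batch -ex run -ex bt --args "$BINARY"
else
  echo "debug binary not found at $BINARY (build the debug variant with waf first)"
fi
```

The `bt` output should name the frame that dereferences the bad address, which is far more useful than the stripped addresses in the mpirun crash dump above.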
@anguyen8 Thank you. I have another question: why is the pre-trained model not a .caffemodel file? I cannot find a pre-trained model without an extension.
@urumican : Yeah, Caffe only recently introduced the .caffemodel extension. Previously, model files didn't have any extension, but both formats should work the same.
@anguyen8 Oh, Thank you!
@anguyen8 The debug build only gives me one message: images: /scratch/include/boost/smart_ptr/shared_ptr.hpp:687: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = caffe::ImageDataLayer<float>; typename boost::detail::sp_member_access<T>::type = caffe::ImageDataLayer<float>*]: Assertion `px != 0' failed.
@anguyen8 I found a line: #ifdef LOCAL_RUN. Do you remember where this is defined?
@anguyen8 I see that people are not able to find caffe_reference_imagenet_model. Do you know where to get it?
@urumican : You can get the BVLC Caffe reference model from their code/website. As for the code here, a few groups have been able to reproduce the results with it, so it should work as long as your Caffe version is compatible with it.
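For reference, recent Caffe trees ship a download helper for the BVLC reference model. A hedged sketch — run it from the Caffe root; the script and model directory names below are the ones in upstream Caffe, so check your checkout if it is older:

```shell
# Fetch the BVLC reference (CaffeNet) weights with Caffe's helper script.
MODEL_DIR=models/bvlc_reference_caffenet
if [ -f scripts/download_model_binary.py ]; then
  python scripts/download_model_binary.py "$MODEL_DIR"
else
  echo "helper script not found; run this from the Caffe root"
fi
echo "expected weights: $MODEL_DIR/bvlc_reference_caffenet.caffemodel"
```

Older, extensionless model dumps and the newer .caffemodel files carry the same serialized protobuf, so either should load.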