DeepTextSpotter

"recurrent conv" in the paper

Open nmduc opened this issue 7 years ago • 7 comments

Hi,

As presented in Table 1 in the paper, I see some layers are called "recurrent conv". However, in the prototxt file specifying the models, I only see Caffe's normal "Convolution" layers. My question is: what is the "recurrent conv" layer? Is it important, or would a normal convolutional layer produce equally good results?

Thanks.

nmduc avatar Aug 28 '18 00:08 nmduc

Hi,

Paper to read: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Lee_Recursive_Recurrent_Nets_CVPR_2016_paper.html

An easy way to implement recurrent convolution in Caffe is weight sharing, i.e. stacking the convolution layer N times with the same weights.

something like:

```
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "bn4"
  top: "conv4_1"
  param { name: "conv4_w" }
  param { name: "conv4_b" }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
```

```
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param { name: "conv4_w" }
  param { name: "conv4_b" }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
```

  • The weight sharing is done by the named params: param { name: "conv4_w" } and param { name: "conv4_b" }.

in tiny.proto see conv11 and conv11_2
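In plain numpy, the idea behind those two shared-weight layers can be sketched as follows (a minimal illustration of recurrent convolution, not the repo's Caffe code; `conv2d_same` is a naive stand-in for a single Convolution layer):

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 3x3 'same' convolution (stride 1, zero padding) on a 2-D array."""
    h, wd = x.shape
    xp = np.pad(x, 1)                      # zero-pad one pixel on each side
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def recurrent_conv(x, w, n_steps=2):
    """Recurrent convolution: apply the *same* weights n_steps times.
    This is exactly what the shared `param { name: ... }` trick expresses
    in Caffe via conv4_1 / conv4_2 above."""
    for _ in range(n_steps):
        x = conv2d_same(x, w)
    return x
```

The key point is that the two stacked layers are not independent: there is only one weight tensor, applied repeatedly, so the parameter count stays that of a single layer.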

and is it important? or a normal convolutional layer would still produce equally good results?

The paper "Recursive Recurrent Nets With Attention Modeling for OCR in the Wild" says it is.



MichalBusta avatar Aug 28 '18 08:08 MichalBusta

Hi Michal, thank you very much for your prompt response.

Paper to read: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Lee_Recursive_Recurrent_Nets_CVPR_2016_paper.html easy way to implement recurrent convolution in caffe is by weight sharing - ie N times convolution layer with same weights.

Thanks for the pointer and explanation.

in tiny.proto see conv11 and conv11_2

I see it now. However, as I understand it, tiny.prototxt defines the RPN rather than the recognition network. So it seems that even without recurrent conv, your recognition net still performs decently. I will try using recurrent conv as well.

I have another small question: the training code in this repo is not fully "end-to-end", right? The gradients from the recognition network do not flow to the RPN network. As I understand it, the two networks are trained with different solvers, and some normalization steps are performed in numpy to connect the RPN outputs to the recognition model. Am I missing anything? Have you tried making it fully end-to-end?

Thank you again and have a nice day.

nmduc avatar Aug 28 '18 10:08 nmduc


You are right - it is just two networks. The OCR net is learning on "imperfect" proposals, so you can see it as additional data augmentation.

We tried full end-to-end with no success (the usual story: you make it fully differentiable and then you need to break it - something like the FOTS approach https://arxiv.org/abs/1801.01671v2).
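A toy sketch of this two-stage setup (all names here are illustrative, not from the repo): the detection and recognition nets sit behind separate solvers, and the proposal is cropped and normalized in plain numpy, so no gradient flows across the gap:

```python
import numpy as np

def rpn_forward(image):
    # Stand-in for the detection net: returns one proposal box (x, y, w, h).
    # In the real pipeline this net has its own solver and loss.
    return (2, 2, 4, 4)

def crop_and_normalize(image, box):
    # The numpy "glue" between the two nets: cropping and normalizing here
    # means the computation graph is cut, and no gradient can flow back
    # from the OCR net into the RPN.
    x, y, w, h = box
    crop = image[y:y + h, x:x + w].astype(np.float32)
    return (crop - crop.mean()) / (crop.std() + 1e-6)

def train_step(image, text_label):
    box = rpn_forward(image)               # solver 1 trains the RPN on its own
    patch = crop_and_normalize(image, box) # graph break: plain numpy
    # solver 2 trains the OCR net on the (possibly imperfect) patch;
    # imperfect proposals act as extra data augmentation for the OCR net
    return patch
```

Making this differentiable end-to-end would require replacing the numpy crop with something like a sampling/RoI layer inside the graph, which is the direction FOTS takes.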



MichalBusta avatar Aug 28 '18 11:08 MichalBusta


Got it. Thank you very much for your clarification and for kindly sharing the code and models of course.

nmduc avatar Aug 28 '18 12:08 nmduc

@MichalBusta Hi, I think there is a problem with "recurrent conv". You said it appears in tiny.proto, but in the paper "recurrent conv" is in Table 1, the fully-convolutional network for text recognition (i.e. model_cv.proto). In model_cv.proto there is no conv layer with a reused (shared) param.

xxlxx1 avatar Oct 15 '18 06:10 xxlxx1

Hi, the shared version is just a fast demo (the goal was to fit a 1 GB GPU for a demonstration almost two years ago). There are quite a lot of deviations from the full "paper" version (smaller detection network, smaller OCR, ...).

MichalBusta avatar Oct 15 '18 08:10 MichalBusta


So that's it, thank you.

xxlxx1 avatar Oct 15 '18 11:10 xxlxx1