DeepTextSpotter

"recurrent conv" in the paper

Open nmduc opened this issue 7 years ago • 7 comments

Hi,

As presented in Table 1 in the paper, I see some layers are called "recurrent conv". However, in the prototxt file specifying the models, I only see Caffe's normal "Convolution" layers. My question is: what is the "recurrent conv" layer? Is it important, or would a normal convolutional layer produce equally good results?

Thanks.

nmduc avatar Aug 28 '18 00:08 nmduc

Hi,

Paper to read: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Lee_Recursive_Recurrent_Nets_CVPR_2016_paper.html

An easy way to implement recurrent convolution in Caffe is weight sharing, i.e. stacking the convolution layer N times with the same weights.

something like:

```
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "bn4"
  top: "conv4_1"
  param { name: "conv4_w" }
  param { name: "conv4_b" }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
```

```
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param { name: "conv4_w" }
  param { name: "conv4_b" }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
```

  • The weight sharing is done by the named params: param { name: "conv4_w" } and param { name: "conv4_b" }.

in tiny.proto see conv11 and conv11_2
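In plain numpy, the idea behind those two shared-weight layers can be sketched as follows (a minimal illustration of recurrent convolution, not the repo's Caffe code; `conv2d_same` is a naive stand-in for a single Convolution layer):

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 3x3 'same' convolution (stride 1, zero padding) on a 2-D array."""
    h, wd = x.shape
    xp = np.pad(x, 1)                      # zero-pad one pixel on each side
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def recurrent_conv(x, w, n_steps=2):
    """Recurrent convolution: apply the *same* weights n_steps times.
    This is exactly what the shared `param { name: ... }` trick expresses
    in Caffe via conv4_1 / conv4_2 above."""
    for _ in range(n_steps):
        x = conv2d_same(x, w)
    return x
```

The key point is that the two stacked layers are not independent: there is only one weight tensor, applied repeatedly, so the parameter count stays that of a single layer.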

and is it important? or a normal convolutional layer would still produce equally good results?

The paper "Recursive Recurrent Nets With Attention Modeling for OCR in the Wild" says it is.



MichalBusta avatar Aug 28 '18 08:08 MichalBusta

Hi Michal, thank you very much for your prompt response.

Paper to read: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Lee_Recursive_Recurrent_Nets_CVPR_2016_paper.html easy way to implement recurrent convolution in caffe is by weight sharing - ie N times convolution layer with same weights.

Thanks for the pointer and explanation.

in tiny.proto see conv11 and conv11_2

I see it now. However, as I understand it, tiny.prototxt defines the RPN rather than the recognition network. So it seems that even without recurrent conv, your recognition net still performs decently. I will try using recurrent conv as well.

I have another small question: the training code in this repo is not fully "end-to-end", right? The gradients from the recognition network do not flow to the RPN network. As I understand it, the two networks are trained with different solvers, and some normalization steps are performed in numpy to connect the RPN outputs to the recognition model. Am I missing anything? Have you tried making it fully end-to-end?

Thank you again and have a nice day.

nmduc avatar Aug 28 '18 10:08 nmduc


You are right - it is just two networks. The OCR net is learning on "imperfect" proposals, so you can see it as additional data augmentation.

We tried full end-to-end with no success (the usual story: you make it fully differentiable and then you need to break it - something like the FOTS approach https://arxiv.org/abs/1801.01671v2).
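A toy sketch of this two-stage setup (all names here are illustrative, not from the repo): the detection and recognition nets sit behind separate solvers, and the proposal is cropped and normalized in plain numpy, so no gradient flows across the gap:

```python
import numpy as np

def rpn_forward(image):
    # Stand-in for the detection net: returns one proposal box (x, y, w, h).
    # In the real pipeline this net has its own solver and loss.
    return (2, 2, 4, 4)

def crop_and_normalize(image, box):
    # The numpy "glue" between the two nets: cropping and normalizing here
    # means the computation graph is cut, and no gradient can flow back
    # from the OCR net into the RPN.
    x, y, w, h = box
    crop = image[y:y + h, x:x + w].astype(np.float32)
    return (crop - crop.mean()) / (crop.std() + 1e-6)

def train_step(image, text_label):
    box = rpn_forward(image)               # solver 1 trains the RPN on its own
    patch = crop_and_normalize(image, box) # graph break: plain numpy
    # solver 2 trains the OCR net on the (possibly imperfect) patch;
    # imperfect proposals act as extra data augmentation for the OCR net
    return patch
```

Making this differentiable end-to-end would require replacing the numpy crop with something like a sampling/RoI layer inside the graph, which is the direction FOTS takes.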



MichalBusta avatar Aug 28 '18 11:08 MichalBusta


Got it. Thank you very much for your clarification and for kindly sharing the code and models of course.

nmduc avatar Aug 28 '18 12:08 nmduc

@MichalBusta Hi, I think there is a problem with "recurrent conv". You said it appears in tiny.proto, but in the paper "recurrent conv" is in Table 1, the fully-convolutional network for text recognition (i.e. model_cv.proto). In model_cv.proto there is no conv layer with a reused (shared) param.

xxlxx1 avatar Oct 15 '18 06:10 xxlxx1

Hi, the shared version is just a fast demo (the goal was to fit a 1 GB GPU for a demonstration almost two years ago). There are quite a lot of deviations from the full "paper" version (smaller detection network, smaller OCR, ...).

MichalBusta avatar Oct 15 '18 08:10 MichalBusta


So that's it, thank you.

xxlxx1 avatar Oct 15 '18 11:10 xxlxx1