a-PyTorch-Tutorial-to-Image-Captioning

multi layer RNN

Open WoodySG2018 opened this issue 6 years ago • 4 comments

I coded a multi-layer RNN. It uses 2 layers by default; you can change `num_layers` to adjust the number of layers when calling `DecoderWithAttention`. Please check.
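A minimal sketch of what exposing a `num_layers` knob on a decoder could look like. Note this is a hypothetical, simplified module (`MultiLayerDecoder`) for illustration only; the actual `DecoderWithAttention` in this repo uses an `LSTMCell` plus attention and takes different arguments.

```python
import torch
import torch.nn as nn

class MultiLayerDecoder(nn.Module):
    """Simplified caption decoder with a configurable number of LSTM layers."""

    def __init__(self, embed_dim, decoder_dim, vocab_size, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # num_layers stacks LSTM layers: layer k consumes layer k-1's outputs
        self.lstm = nn.LSTM(embed_dim, decoder_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(decoder_dim, vocab_size)

    def forward(self, captions, states=None):
        out, states = self.lstm(self.embedding(captions), states)
        return self.fc(out), states

# Switch from the default 2 layers to 4 by changing one argument:
decoder = MultiLayerDecoder(embed_dim=512, decoder_dim=512,
                            vocab_size=1000, num_layers=4)
scores, (h, c) = decoder(torch.randint(0, 1000, (8, 20)))
print(scores.shape)  # torch.Size([8, 20, 1000])
print(h.shape)       # torch.Size([4, 8, 512]) — one state per layer
```

With `nn.LSTM`, the hidden/cell states gain a leading `num_layers` dimension, which is why the state initialization (discussed below in this thread) also has to change for a multi-layer decoder.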

WoodySG2018 avatar Jul 03 '19 09:07 WoodySG2018

Does it achieve improved results when compared to a single layer one? If yes, by what margin?

kmario23 avatar Jul 23 '19 11:07 kmario23

> Does it achieve improved results when compared to a single layer one? If yes, by what margin?

Hi kmario23, training is still ongoing. So far I can see clearly different loss curves for the 2-layer and 4-layer RNNs (I can't trace back the single-layer RNN's performance), but it's too early to tell how performance relates to the number of layers on my working dataset.

Theoretically, capturing a highly hierarchical structure with just one layer is not optimal. So hopefully, by integrating a multi-layer option here, we enable users to test the model's power along another dimension on their own datasets.

WoodySG2018 avatar Jul 24 '19 03:07 WoodySG2018

Is there any update on this issue? Will it be merged? Very nice functionality, I would say.

Just a quick question about how the initial states for the multi-layer LSTM are constructed: why do we need to initialize the hidden/cell states of the LSTM by passing the image representation through an FC layer? Is this the standard way of doing it?

nilinykh avatar Jan 06 '20 18:01 nilinykh

> Just a quick question about how the initial states for the multi-layer LSTM are constructed: why do we need to initialize the hidden/cell states of the LSTM by passing the image representation through an FC layer? Is this the standard way of doing it?

That's how we pass the representation of the image learned by the CNN (encoder) to the RNN/LSTM (decoder). The decoder dimension is a hyperparameter and usually differs from the encoder dimension, so we have to project the encoder output, which is 2048-dimensional in this case, down to the decoder dimension. Otherwise it's not possible to feed the encoder output directly into the decoder as its initial state.
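A minimal sketch of that projection, assuming the 2048-d encoder output and mean-pooled spatial features as in the tutorial's `init_hidden_state`. For a multi-layer LSTM, repeating the projected state across layers is just one possible choice, shown here for illustration; a separate FC layer per LSTM layer would be another option.

```python
import torch
import torch.nn as nn

encoder_dim, decoder_dim, num_layers = 2048, 512, 2  # dims as in the tutorial

# FC layers projecting the 2048-d CNN feature to the decoder's dimension
init_h = nn.Linear(encoder_dim, decoder_dim)
init_c = nn.Linear(encoder_dim, decoder_dim)

encoder_out = torch.randn(8, 196, encoder_dim)  # (batch, num_pixels, encoder_dim)
mean_feat = encoder_out.mean(dim=1)             # average over pixels: (batch, encoder_dim)

# One option for a multi-layer LSTM: reuse the same projection for every layer,
# giving states of shape (num_layers, batch, decoder_dim) as nn.LSTM expects
h0 = init_h(mean_feat).unsqueeze(0).repeat(num_layers, 1, 1)
c0 = init_c(mean_feat).unsqueeze(0).repeat(num_layers, 1, 1)
print(h0.shape)  # torch.Size([2, 8, 512])
```

Without such a projection, the 2048-d encoder output would not match the shape `nn.LSTM` requires for its initial `(h_0, c_0)` states.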

kmario23 avatar Jan 07 '20 15:01 kmario23