A variant of the deep LSTM?
On page 7 of the article, the authors write:
we have used the following variant of the deep LSTM architecture
But in your code, you are using the LSTM model defined in the Chain framework, so I wonder whether your model is an exact representation of the model described in the article. I am not arguing with you, I just want to discuss this with you.
I don't think the variant of the LSTM is different from the original version, except that the input X_t is designed to include the memory readings :)
If you take a look at the first page of the METHODS part, you can see that the formulation of the input gate includes three inputs: X_t, h_{t-1}^l and h_t^{l-1}. I think the input gate of the original LSTM only uses X_t and h_{t-1}^l. That is why I have this question.
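To spell out the comparison I have in mind (my own notation, hopefully matching the METHODS section, please correct me if I misread it):

i_t^l = \sigma(W_i^l [X_t; h_{t-1}^l; h_t^{l-1}] + b_i^l)    (input gate in the paper)
i_t   = \sigma(W_i [X_t; h_{t-1}] + b_i)                      (input gate of the original LSTM)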
@Seraphli I think it is just the right expression for an LSTM in a network with several hidden layers. The l-th layer cell gets input from its own past (h_{t-1}^l), the layer below it (h_t^{l-1}) and the sample (X_t). You may also refer to Eq. (11) in [Graves, A., Mohamed, A.-r. & Hinton, G. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (eds Ward, R. et al.) 6645–6649 (Curran Associates, 2013).] for an explicit expression of the LSTM, which is the same as in the Nature paper :)
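To make the layer indexing concrete, here is a minimal NumPy sketch (my own names and shapes, not the code from this repository) of one step of such a per-layer cell: every gate sees the concatenation [X_t, h_{t-1}^l, h_t^{l-1}], and h_t^{l-1} is simply the output of the layer below at the same time step. With a single layer the h_t^{l-1} term drops out and the step reduces to the ordinary LSTM.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_lstm_layer_step(x_t, h_below, h_prev, c_prev, W, b):
    """One step of layer l (illustrative only, hypothetical names/shapes).

    x_t     : external input at time t (per the comment above, in the paper
              this already includes the memory readings)
    h_below : h_t^{l-1}, output of the layer below at the SAME time step
              (use a zero-length array for the first layer)
    h_prev  : h_{t-1}^l, this layer's output at the previous time step
    c_prev  : this layer's previous cell state
    W, b    : gate weights, shape (4*H, len(x_t)+H+len(h_below)) and (4*H,)
    """
    z = W @ np.concatenate([x_t, h_prev, h_below]) + b
    i, f, o, g = np.split(z, 4)          # input, forget, output gates + candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)      # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

# Tiny usage example: two layers, feeding h_t^1 into layer 2 as h_below.
rng = np.random.default_rng(0)
X, H = 5, 8
W1, b1 = rng.normal(size=(4 * H, X + H)),     np.zeros(4 * H)   # layer 1: no h_below
W2, b2 = rng.normal(size=(4 * H, X + H + H)), np.zeros(4 * H)   # layer 2: gets h_t^1
x_t = rng.normal(size=X)
h1, c1 = deep_lstm_layer_step(x_t, np.zeros(0), np.zeros(H), np.zeros(H), W1, b1)
h2, c2 = deep_lstm_layer_step(x_t, h1,          np.zeros(H), np.zeros(H), W2, b2)
```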
Where is the dataset?