seq2seq
Adding hard attention when predicting decoder outputs
This paper (Show, Attend and Tell: Neural Image Caption Generation with Visual Attention) mentions hard attention. But then we cannot use standard backprop, because sampling an attention position is non-differentiable. If we add this, will it increase the accuracy of the seq2seq model?
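To make the question concrete, here is a minimal sketch (not the paper's actual code) of stochastic hard attention trained with a REINFORCE-style score-function estimator, which is roughly the workaround Show, Attend and Tell uses for the non-differentiable sampling step. It assumes PyTorch; the names `HardAttention`, `loss_with_reinforce`, and the reward choice are illustrative assumptions, not anything from the paper or a specific seq2seq library.

```python
import torch
import torch.nn as nn

class HardAttention(nn.Module):
    """Sketch: sample ONE encoder position instead of a soft weighted average."""

    def __init__(self, enc_dim, dec_dim):
        super().__init__()
        self.score = nn.Linear(enc_dim + dec_dim, 1)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        src_len = enc_states.size(1)
        expanded = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        logits = self.score(torch.cat([enc_states, expanded], dim=-1)).squeeze(-1)
        dist = torch.distributions.Categorical(logits=logits)
        # Hard attention: this sampling step is where standard backprop breaks,
        # since gradients cannot flow through a discrete sample.
        idx = dist.sample()  # (batch,)
        context = enc_states[torch.arange(enc_states.size(0)), idx]
        # log_prob IS differentiable w.r.t. the logits, so REINFORCE can use it.
        return context, dist.log_prob(idx)

def loss_with_reinforce(token_loss, log_prob):
    # token_loss: per-example decoder loss, shape (batch,).
    # Score-function surrogate: reward the attention distribution for samples
    # that led to low token loss (an assumed reward; the paper uses a
    # variational lower bound with a similar Monte Carlo estimator).
    reward = (-token_loss).detach()
    return (token_loss - log_prob * reward).mean()
```

In contrast, soft attention would replace the `dist.sample()` line with a softmax-weighted sum over `enc_states`, keeping the whole computation differentiable and trainable with plain backprop.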