seq2seq
Adding hard attention when predicting decoder outputs
This paper (Show, Attend and Tell: Neural Image Caption Generation with Visual Attention) mentions hard attention. But then we cannot use standard backprop, because sampling an attention position is non-differentiable. If we add this, will it increase the accuracy of the seq2seq model?
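To make the question concrete, here is a minimal sketch (not the paper's actual code) of stochastic hard attention trained with a REINFORCE-style score-function estimator, which is roughly the workaround Show, Attend and Tell uses for the non-differentiable sampling step. It assumes PyTorch; the names `HardAttention`, `loss_with_reinforce`, and the reward choice are illustrative assumptions, not anything from the paper or a specific seq2seq library.

```python
import torch
import torch.nn as nn

class HardAttention(nn.Module):
    """Sketch: sample ONE encoder position instead of a soft weighted average."""

    def __init__(self, enc_dim, dec_dim):
        super().__init__()
        self.score = nn.Linear(enc_dim + dec_dim, 1)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        src_len = enc_states.size(1)
        expanded = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        logits = self.score(torch.cat([enc_states, expanded], dim=-1)).squeeze(-1)
        dist = torch.distributions.Categorical(logits=logits)
        # Hard attention: this sampling step is where standard backprop breaks,
        # since gradients cannot flow through a discrete sample.
        idx = dist.sample()  # (batch,)
        context = enc_states[torch.arange(enc_states.size(0)), idx]
        # log_prob IS differentiable w.r.t. the logits, so REINFORCE can use it.
        return context, dist.log_prob(idx)

def loss_with_reinforce(token_loss, log_prob):
    # token_loss: per-example decoder loss, shape (batch,).
    # Score-function surrogate: reward the attention distribution for samples
    # that led to low token loss (an assumed reward; the paper uses a
    # variational lower bound with a similar Monte Carlo estimator).
    reward = (-token_loss).detach()
    return (token_loss - log_prob * reward).mean()
```

In contrast, soft attention would replace the `dist.sample()` line with a softmax-weighted sum over `enc_states`, keeping the whole computation differentiable and trainable with plain backprop.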