seq2seq
Why use relu to compute additive attention?
1. Attention formula
- In the standard additive version, the attention score is computed as:
score = v * tanh(W * [hidden; encoder_outputs])
- In your code it is:
score = v * relu(W * [hidden; encoder_outputs])
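
For concreteness, here is a minimal PyTorch sketch of additive attention where the nonlinearity is a parameter, so the two variants above differ only in passing `torch.tanh` or `torch.relu`. The class name, dimensions, and tensor shapes are illustrative assumptions, not taken from the repository's actual code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch of additive (Bahdanau-style) attention with a pluggable nonlinearity."""

    def __init__(self, hidden_dim, nonlinearity=torch.tanh):
        super().__init__()
        # W projects the concatenated [hidden; encoder_output] pair
        self.W = nn.Linear(hidden_dim * 2, hidden_dim, bias=False)
        # v reduces the projected vector to a scalar score per source position
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.nonlinearity = nonlinearity  # torch.tanh (standard) or torch.relu

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)          decoder state
        # encoder_outputs: (batch, src_len, hidden_dim)
        src_len = encoder_outputs.size(1)
        # repeat the decoder state across all source positions
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        # score = v * nonlinearity(W * [hidden; encoder_outputs])
        energy = self.nonlinearity(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)              # (batch, src_len)
        return torch.softmax(scores, dim=1)             # attention weights


# usage: compare the two nonlinearities on the same inputs
attn_tanh = AdditiveAttention(hidden_dim=8, nonlinearity=torch.tanh)
attn_relu = AdditiveAttention(hidden_dim=8, nonlinearity=torch.relu)
h = torch.randn(2, 8)
enc = torch.randn(2, 5, 8)
print(attn_tanh(h, enc).shape, attn_relu(h, enc).shape)  # torch.Size([2, 5]) for both
```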
2. Question
Is there a trick here, or is this the result of an experimental comparison?