seq2seq
Why use relu to compute additive attention?
1. Attention formula
- In the standard additive version, the attention score is computed as:
score = v * tanh(W * [hidden; encoder_outputs])
- In your code it is:
score = v * relu(W * [hidden; encoder_outputs])
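
For concreteness, here is a minimal PyTorch sketch of additive attention where the nonlinearity is a parameter, so the two variants above differ only in passing `torch.tanh` or `torch.relu`. The class name, dimensions, and tensor shapes are illustrative assumptions, not taken from the repository's actual code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch of additive (Bahdanau-style) attention with a pluggable nonlinearity."""

    def __init__(self, hidden_dim, nonlinearity=torch.tanh):
        super().__init__()
        # W projects the concatenated [hidden; encoder_output] pair
        self.W = nn.Linear(hidden_dim * 2, hidden_dim, bias=False)
        # v reduces the projected vector to a scalar score per source position
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.nonlinearity = nonlinearity  # torch.tanh (standard) or torch.relu

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)          decoder state
        # encoder_outputs: (batch, src_len, hidden_dim)
        src_len = encoder_outputs.size(1)
        # repeat the decoder state across all source positions
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        # score = v * nonlinearity(W * [hidden; encoder_outputs])
        energy = self.nonlinearity(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)              # (batch, src_len)
        return torch.softmax(scores, dim=1)             # attention weights


# usage: compare the two nonlinearities on the same inputs
attn_tanh = AdditiveAttention(hidden_dim=8, nonlinearity=torch.tanh)
attn_relu = AdditiveAttention(hidden_dim=8, nonlinearity=torch.relu)
h = torch.randn(2, 8)
enc = torch.randn(2, 5, 8)
print(attn_tanh(h, enc).shape, attn_relu(h, enc).shape)  # torch.Size([2, 5]) for both
```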
2. Question
Is there a trick here, or is this the result of an experimental comparison?