Ivan Bilan
I have exactly the same issue; building Cython from the GitHub repo doesn't solve the problem either.
Great idea, I will look into a way to add this in an intuitive manner.
I also get lower performance overall with the new 0.4.1 code update. However, I am only using the Encoder part in my project.
Was this intentional by any chance?
How so? Should ScaledDotProductAttention have dropout set to 0.1 to reproduce the paper's results, or did they use dropout=0.1 throughout the whole model?
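For reference, here is a minimal PyTorch sketch of where an attention-dropout rate would sit inside scaled dot-product attention, i.e. softmax(QK^T / sqrt(d_k)) V. This is not the repo's actual class. The `attn_dropout=0.1` default is an assumption taken from the question above, and the mask convention (True = blocked) is also an assumption.

```python
import math

import torch
import torch.nn as nn


class ScaledDotProductAttention(nn.Module):
    """Sketch: softmax(Q K^T / sqrt(d_k)) V with dropout on the attention weights.

    attn_dropout=0.1 is an assumed default, not a confirmed paper setting.
    """

    def __init__(self, d_k: int, attn_dropout: float = 0.1):
        super().__init__()
        self.temperature = math.sqrt(d_k)
        self.dropout = nn.Dropout(attn_dropout)

    def forward(self, q, k, v, mask=None):
        # q, k: (batch, len, d_k); v: (batch, len, d_v)
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.temperature
        if mask is not None:
            # Assumed convention: mask is True where attention is NOT allowed
            scores = scores.masked_fill(mask, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        # Dropout is applied to the attention weights; it is a no-op in eval() mode
        attn = self.dropout(attn)
        output = torch.matmul(attn, v)
        return output, attn
```

Because `nn.Dropout` only acts in `train()` mode, the attention-dropout rate changes training behaviour but not evaluation outputs. Whether the rest of the model (e.g. sub-layer outputs and embeddings) uses the same rate is a separate setting.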
Just added it, will improve it over time.