Ivan Bilan

6 comments by Ivan Bilan

I have exactly the same issue; building Cython from the GitHub repo doesn't solve the problem either.

Great idea, I will look into a way to add this in an intuitive manner.

I also see lower performance overall after the 0.4.1 code update. Note, however, that I am only using the Encoder part in my project.

Was this intentional by any chance?

How so? Should the ScaledDotProductAttention have its dropout set to 0.1 to reproduce the paper's results, or did they use dropout=0.1 throughout the whole model?
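For reference, here is a minimal NumPy sketch of what "dropout inside ScaledDotProductAttention" usually means: dropout applied to the softmax attention weights before they are multiplied with the values. The function name, the `dropout=0.1` default, and the `training` flag are illustrative assumptions, not the API of the project being discussed, and this sketch does not settle where the paper itself applies dropout.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, dropout=0.1, training=True, rng=None):
    """Hypothetical sketch: softmax(q @ k^T / sqrt(d_k)) @ v, with inverted
    dropout applied to the attention weights only when training=True."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    if training and dropout > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(weights.shape) >= dropout
        # Inverted dropout: rescale kept weights so their expected value is unchanged.
        weights = weights * keep / (1.0 - dropout)
    return weights @ v, weights
```

With `training=False` (or `dropout=0.0`) the function reduces to plain scaled dot-product attention, which is what you would compare against when checking whether attention-level dropout explains a difference in results.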

Just added it, will improve it over time.