powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
Hi! Thank you for releasing the paper code! I had some issues understanding the implementation that are solved by now. However, I expect that many of the people who decide...
Hi, first of all thank you for your work. I've been spending some time trying to understand what is happening in this script fairseq/modules/norms/mask_powernorm.py but I've been having some trouble....
The paper mentions that batch normalization can have large fluctuations in the batch statistics. This occurs in vanilla BN because it calculates the statistics over input of varying lengths padded...
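To make the padding effect concrete, here is a minimal sketch in plain Python. It is not taken from `mask_powernorm.py`; the toy batch, the `lengths` list, and the helper names `padded_stats` and `masked_stats` are all hypothetical. It only compares statistics computed over every position, padding included, with statistics computed over real tokens only.

```python
from statistics import fmean, pvariance

# Hypothetical toy batch: two sequences with a single feature,
# zero-padded to a common length of 4.
batch = [
    [2.0, 4.0, 6.0, 0.0],  # true length 3
    [8.0, 0.0, 0.0, 0.0],  # true length 1
]
lengths = [3, 1]


def padded_stats(batch):
    """Mean and population variance over every position, padding included."""
    values = [x for seq in batch for x in seq]
    return fmean(values), pvariance(values)


def masked_stats(batch, lengths):
    """Mean and population variance over the non-padded positions only."""
    values = [x for seq, n in zip(batch, lengths) for x in seq[:n]]
    return fmean(values), pvariance(values)


naive_mean, naive_var = padded_stats(batch)
true_mean, true_var = masked_stats(batch, lengths)

print(f"with padding:    mean={naive_mean}, var={naive_var}")
print(f"without padding: mean={true_mean}, var={true_var}")
```

In this toy case the padded statistics depend on how much padding the batch happens to contain, so they shift from batch to batch even when the real tokens are similar. Masking out padded positions before computing the statistics removes that particular source of variation.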