Question regarding batch norm vs. masked batch norm
The paper mentions that batch normalization can suffer from large fluctuations in the batch statistics. In vanilla BN this happens because the statistics are computed over inputs of varying lengths that are zero-padded to a common length. I was wondering whether this fluctuation still occurs in the masked version of BN, where the padded positions are excluded from the statistics (see the sketch below). Additionally, how much of a performance gain can be expected from switching vanilla BN to masked BN? Thanks.
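For concreteness, here is roughly what I mean by "masked" statistics (a minimal PyTorch sketch; the shapes and the name `masked_batch_stats` are just for illustration, not the repo's actual implementation):

```python
import torch

def masked_batch_stats(x, pad_mask):
    """Batch statistics over non-padded positions only (illustrative sketch).

    x:        (B, T, C) activations
    pad_mask: (B, T) bool, True for real tokens, False for padding
    """
    mask = pad_mask.unsqueeze(-1).to(x.dtype)           # (B, T, 1)
    n = mask.sum()                                      # number of real tokens
    mean = (x * mask).sum(dim=(0, 1)) / n               # (C,) per-channel mean
    var = ((x - mean) ** 2 * mask).sum(dim=(0, 1)) / n  # (C,) per-channel var
    return mean, var

# Vanilla BN would instead use x.mean(dim=(0, 1)) and x.var(dim=(0, 1)),
# which mix in the zero padding, so the statistics shift toward 0 by an
# amount that depends on how much of each batch happens to be padding.
```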