MaskNet: replace BatchNorm with LayerNorm
The MaskNet paper uses LayerNorm; however, the code implementation uses BatchNorm.
Hi Marc,
Thanks for bringing this up! This is indeed a bug, and we are fixing it.
Hi Marc,
Upon checking, this is not a bug. When BatchNorm is applied on the default axis (the last dimension), it reduces to LayerNorm, and since the size of gamma/beta depends on the shape of the input tensor, the original implementation is still correct.
However, for code clarity, we updated the example (ref PR #816).
Thanks for the comment!
I am not sure I am following; see this screenshot.
What am I missing?
Because your code isn't in training mode.
tf.layers.batch_normalization() calls into the class BatchNormalizationBase:
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L43
tf.keras.layers.LayerNormalization() calls into the class LayerNormalization:
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L898
In LayerNormalization, the mean and variance are computed by nn.moments:
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L1025
then nn.batch_normalization is used to get the result:
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L1040-L1046
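As a minimal NumPy sketch of that LayerNormalization path (not the TF code itself, just the math it describes): compute the mean/variance over the last axis as nn.moments does, then apply the same normalize-scale-shift formula that nn.batch_normalization implements.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-3):
    # Per-sample statistics over the last axis, as nn.moments
    # computes them in the LayerNormalization call path.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    # The formula nn.batch_normalization then applies:
    # gamma * (x - mean) / sqrt(var + eps) + beta
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
out = layer_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# Each row now has roughly zero mean and unit variance.
```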
This is the same as BN when no other features are involved:
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L643-L652
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L736-L739
https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L820-L825
But the difference is that, when not in training mode, BN's mean and variance are replaced by the moving averages: https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L744-L750
You can set the parameter moving_mean_initializer='ones' (the default is 'zeros') and observe that the output changes.
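A NumPy sketch of that inference-time behavior (assuming the default initializers, moving_mean='zeros' and moving_variance='ones'): with the defaults, inference-mode BN is nearly an identity map, and changing the moving mean visibly shifts the output.

```python
import numpy as np

def batch_norm_inference(x, moving_mean, moving_var, gamma=1.0, beta=0.0, eps=1e-3):
    # In inference mode, BN ignores the current batch statistics and
    # normalizes with the stored moving averages instead.
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([[1.0, 2.0, 3.0]])

# Default initializers (moving_mean='zeros', moving_variance='ones')
# -> output is almost the input itself.
default_out = batch_norm_inference(x, moving_mean=np.zeros(3), moving_var=np.ones(3))

# With moving_mean_initializer='ones', the output shifts.
shifted_out = batch_norm_inference(x, moving_mean=np.ones(3), moving_var=np.ones(3))
```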
Thanks @Duyi-Wang, that makes sense. I was confused by it as well, but the docs clearly state it. Thanks for pointing out the code.
Adding a screenshot for posterity.
Feel free to close this one.