pluo911
In an FC layer, IN and LN should be the same. R50v2+SN converges much faster than R50v1+SN and achieves better top-5 accuracy.
@GYxiaOH Try batch average when evaluating BN in SN. Batch average is more stable than moving average for BN. In some tasks there could be a difference; please see figure 8 in...
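To make the difference concrete, here is a rough sketch, not the code from the repo. It assumes "moving average" means an exponential running estimate updated once per training step, and "batch average" means a plain mean of the BN statistics collected over a set of mini-batches with the weights frozen. The function names, the momentum value, and the numbers are made up for illustration.

```python
def moving_average(batch_stats, momentum=0.1, init=0.0):
    """Exponential running estimate, updated once per mini-batch (assumed form)."""
    running = init
    for s in batch_stats:
        running = (1.0 - momentum) * running + momentum * s
    return running


def batch_average(batch_stats):
    """Plain mean of the per-batch statistics collected with frozen weights."""
    return sum(batch_stats) / len(batch_stats)


# Hypothetical per-batch means of one BN channel.
batch_means = [2.0, 2.2, 1.8, 2.1, 1.9]

ma = moving_average(batch_means)
ba = batch_average(batch_means)
print("moving average:", ma)
print("batch average: ", ba)
```

With only a few batches, the moving estimate is still pulled toward its initial value, while the batch average sits at the mean of the collected statistics.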
Thanks for your interest. SN benefits from adding 0.5 dropout in the last layer of hidden features, but GN and BN might not. The improvement depends on the generalization error...
@Latou GN can be included in SN. You may try GN in your problem.
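As a rough illustration of what "including GN" could look like: a minimal sketch, assuming SN combines the statistics of its candidate normalizers with softmax-weighted importance weights, so GN would just be one more entry in the list. The helper names and all numbers below are hypothetical, and scalars stand in for the real per-channel tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switchable_stats(means, variances, mean_logits, var_logits):
    """Combine candidate statistics with separate weights for mean and variance."""
    w_mean = softmax(mean_logits)
    w_var = softmax(var_logits)
    mean = sum(w * m for w, m in zip(w_mean, means))
    var = sum(w * v for w, v in zip(w_var, variances))
    return mean, var


# Candidate order: IN, LN, BN, GN (hypothetical values for one channel).
means = [0.5, 0.2, 0.1, 0.3]
variances = [1.0, 1.5, 2.0, 1.2]

# Equal logits give uniform weights over the four candidates.
mean, var = switchable_stats(means, variances, [0.0] * 4, [0.0] * 4)
print(mean, var)
```

In training, the logits would be learned per layer, so the network can decide how much weight the GN statistics get.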
@eugenelawrence We are planning to do this. Contributions are welcome.