Why add exclude_from_weight_decay for norm-related weights?
https://github.com/google-research/bert/blob/f39e881b169b9d53bea03d2d341b31707a6c052b/optimization.py#L65
Is there any special reason that norm-related weights are added to exclude_from_weight_decay?
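For context, here is a minimal sketch of how name-based exclusion of this kind can work. It assumes the optimizer matches each pattern against the variable name with `re.search`, and that the excluded patterns are `LayerNorm`, `layer_norm`, and `bias`. The helper `use_weight_decay`, the default rate, and the variable names are hypothetical and only illustrate the pattern; they are not copied from the linked file.

```python
import re

# Assumed exclusion patterns; check the linked optimization.py for the actual list.
EXCLUDE_FROM_WEIGHT_DECAY = ["LayerNorm", "layer_norm", "bias"]


def use_weight_decay(param_name, weight_decay_rate=0.01,
                     exclude=EXCLUDE_FROM_WEIGHT_DECAY):
    """Return True if weight decay should be applied to this parameter."""
    # No decay at all when the rate is zero or unset.
    if not weight_decay_rate:
        return False
    # Skip decay if any pattern occurs anywhere in the parameter name.
    return not any(re.search(pattern, param_name) for pattern in exclude)


# Hypothetical variable names in the style of a BERT checkpoint.
names = [
    "bert/encoder/layer_0/attention/self/query/kernel",
    "bert/encoder/layer_0/attention/self/query/bias",
    "bert/encoder/layer_0/output/LayerNorm/gamma",
    "bert/encoder/layer_0/output/LayerNorm/beta",
]

decayed = [name for name in names if use_weight_decay(name)]
print(decayed)
```

Under these assumptions only the `kernel` matrix is decayed. A commonly given rationale, which nobody in this thread confirms, is that biases and normalization scale/shift parameters are few in number and that pulling them toward zero adds little regularization while it can interfere with the normalization itself.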
I have the same question. Can someone explain it?
I also wonder about this.