Gradient normalization lowers the maximum learning rate that can converge.

Open Handagot opened this issue 3 years ago • 0 comments

I found this problem while training ResNet18 on cifar100 for some experiment. I still haven't looked into this issue enough to find out what the cause is.

Apr 21 '22 02:04 Handagot