AdaBound
An optimizer that trains as fast as Adam and as good as SGD.
I set the initial lr=0.0001 and final_lr=0.1, but I still don't know when the optimizer becomes SGD. Do I need to increase my learning rate to the final learning rate...
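For reference, a minimal sketch of how this is usually configured, assuming the PyPI `adabound` package: the bounds tighten gradually with the step count (controlled by `gamma`), so there is no single epoch at which the optimizer "becomes" SGD.

```python
import torch.nn as nn
import adabound  # assumes the PyPI `adabound` package is installed

model = nn.Linear(10, 1)

# lr is the initial (Adam-like) step size; final_lr is the SGD-like step size
# that the lower and upper bounds converge to as training proceeds.
optimizer = adabound.AdaBound(
    model.parameters(),
    lr=1e-4,        # initial learning rate
    final_lr=0.1,   # learning rate the bounds converge to
    gamma=1e-3,     # (assumed default) speed at which the bounds tighten
)
```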
```
/home/xxxx/.local/lib/python3.7/site-packages/adabound/adabound.py:94: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
...
```
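The warning itself points at the fix: switch the call to the keyword form of `add_`. A minimal sketch of the two call styles, assuming the flagged line is an Adam-style moment update:

```python
import torch

beta1 = 0.9
grad = torch.randn(4)
exp_avg = torch.zeros(4)

# Deprecated overload: add_(Number alpha, Tensor other)
# exp_avg.mul_(beta1).add_(1 - beta1, grad)

# Signature suggested by the warning: add_(Tensor other, *, Number alpha)
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
```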
Hi, thanks a lot for sharing your excellent work. I wonder, if I want to change the learning rate as epochs increase, how do I set the parameters **lr** and **final_lr** in...
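One way to do this, sketched below, is to attach a standard PyTorch scheduler to the optimizer and let it decay `lr`; how `final_lr` should track the decayed `lr` depends on the adabound implementation, so treat this as an assumption to verify.

```python
import torch
import torch.nn as nn
import adabound  # assumes the PyPI `adabound` package

model = nn.Linear(10, 1)
optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

# AdaBound exposes param_groups like any torch.optim optimizer, so standard
# schedulers attach to it directly; here lr decays by 10x every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
for epoch in range(90):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decays lr once per epoch
```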
https://github.com/wayne391/Image-Super-Resolution/blob/master/src/models/RCAN.py Just change `optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, amsgrad=False)` to `optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)`. The loss becomes NaN in the RCAN model, but Adam works fine.
Hello, can you please tell me what the two quantities in α / √V_t mean, especially V_t? Thank you
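For context, in Adam-style updates α is the base step size and V_t is the exponential moving average of the squared gradients; AdaBound clips the resulting per-parameter step size α / √V_t between a lower and an upper bound that both converge toward `final_lr`. A minimal numerical sketch of that clipping, with made-up bound values:

```python
import torch

alpha = 1e-3                     # base step size (α)
grad = torch.randn(4)
v_t = (1 - 0.999) * grad ** 2    # first step of the EMA of squared gradients (V_t)

# Adam-style per-parameter step size: α / √V_t
step_size = alpha / (v_t.sqrt() + 1e-8)

# AdaBound additionally clips this step size into [lower_bound, upper_bound];
# both bounds converge toward final_lr during training (values here are made up).
lower_bound, upper_bound = 0.05, 0.2
print(step_size.clamp(lower_bound, upper_bound))
```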
The provided new optimizer is sensitive to tiny batch sizes (
Greetings, thanks for your great paper. I am wondering about the hyperparameters you used for the language modeling experiments. Could you provide information about them? Thank you!
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer https://github.com/mgrankin/over9000/issues/4 I strongly believe that AdaBound would be better if it used RAdam instead of Adam. It could also be merged with Lookahead and LAMB. Then we would have the...
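As a rough illustration of the kind of combination being proposed, here is a from-scratch Lookahead-style wrapper around `torch.optim.RAdam` (available in recent PyTorch releases); this is a minimal sketch, not the Ranger implementation linked above.

```python
import torch
import torch.nn as nn


class Lookahead:
    """Minimal Lookahead sketch: keep slow weights and pull them toward the
    fast weights of the inner optimizer every k steps."""

    def __init__(self, inner, k=5, alpha=0.5):
        self.inner, self.k, self.alpha, self.step_count = inner, k, alpha, 0
        # Snapshot slow copies of every parameter the inner optimizer manages.
        self.slow = [
            [p.detach().clone() for p in group["params"]]
            for group in inner.param_groups
        ]

    def zero_grad(self):
        self.inner.zero_grad()

    def step(self):
        self.inner.step()
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.inner.param_groups, self.slow):
                for p, slow in zip(group["params"], slow_group):
                    # slow <- slow + alpha * (fast - slow), then sync fast to slow.
                    slow.add_(p.detach() - slow, alpha=self.alpha)
                    p.data.copy_(slow)


model = nn.Linear(10, 1)
optimizer = Lookahead(torch.optim.RAdam(model.parameters(), lr=1e-3))

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```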
The correct grammar would be "as well as SGD" rather than "as good as SGD"; not sure if you care.
I tested three methods on a very simple problem and got the results shown above. The code is printed here: `import torch` `import torch.nn as nn` `import matplotlib.pyplot as plt`...
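Since the original code is truncated above, here is a self-contained sketch of that kind of comparison, assuming the three methods are SGD, Adam, and AdaBound fitting a tiny regression problem:

```python
import torch
import torch.nn as nn
import adabound  # assumes the PyPI `adabound` package

torch.manual_seed(0)
x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = 3 * x + 0.5 + 0.1 * torch.randn_like(x)   # noisy linear target

def run(make_optimizer, steps=500):
    """Train a fresh linear model with the given optimizer and return final loss."""
    model = nn.Linear(1, 1)
    opt = make_optimizer(model.parameters())
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

results = {
    "SGD": run(lambda p: torch.optim.SGD(p, lr=0.1)),
    "Adam": run(lambda p: torch.optim.Adam(p, lr=1e-2)),
    "AdaBound": run(lambda p: adabound.AdaBound(p, lr=1e-2, final_lr=0.1)),
}
print(results)  # final training loss per optimizer
```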