xyxie
@CyrilYang The model still cannot run in Docker, but there is no problem on my own Ubuntu machine, so I ran it on my local Ubuntu instead.
Sorry for the confusion here. Adan does include bias correction in the implementation, but we need to keep the algorithm presentation consistent with the theoretical analysis. Hence, we did...
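For reference, a generic sketch of the bias correction being discussed, written in Adam's convention (Adan's own betas and its three moment estimates enter the update differently): dividing a zero-initialized moving average by $1-\beta^t$ removes the bias toward zero in the early steps,

$$m_t = \beta\, m_{t-1} + (1-\beta)\, g_t, \qquad \hat m_t = \frac{m_t}{1-\beta^{t}}.$$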
@lucidrains Thanks for the update; below are some minor modifications. When implementing Adan, we referred to some of the optimizer implementations in [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm/optim). Line 55: `state['prev_grad'] = grad` Line 85-86:...
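For context, a generic sketch of the lazy state-initialization pattern that the `Line 55` note points at, assuming a standard `torch.optim.Optimizer` step loop (a toy update rule, not the actual patch, whose remaining details are elided above):

```python
import torch

class SketchOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    # First step: seed prev_grad with the current gradient
                    # so the gradient-difference term starts at zero.
                    state['prev_grad'] = grad.clone()
                diff = grad - state['prev_grad']  # Adan-style difference term
                state['prev_grad'] = grad.clone()
                p.add_(grad + diff, alpha=-group['lr'])  # toy update, not Adan
```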
@lucidrains You're welcome. By increasing the LR and tuning the warmup steps, the performance may improve further. Have fun using Adan.
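As a minimal sketch of tuning warmup in PyTorch, using `LinearLR` for the ramp (`start_factor` and `total_iters` here are hypothetical values to tune, and SGD stands in for any optimizer):

```python
import torch

model = torch.nn.Linear(10, 2)                             # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # any optimizer

# Linear warmup: LR ramps from 0.1 * lr up to lr over the first 500 steps.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=500)

for step in range(1000):
    optimizer.step()   # loss.backward() would precede this in real training
    warmup.step()
```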
Sorry, I am not very familiar with TF. I have tried to add you on WeChat to send our `adan.py` (implemented in PyTorch), but have not received a response yet. @cpuimage...
Hi @tcapelle, from the experimental results you released [here](https://github.com/tcapelle/adan_opt/blob/main/Adan_explore.ipynb), the accuracy of Adan's three trials is 71.8 / 75.5 / 74.0, while the results of Adam's...
Hi @bonlime, thanks for your very valuable suggestion. We will try to implement this with the `torch._foreach` functions. Of course, if you have time, you are welcome to help. Many thanks!! Best
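As a minimal sketch of what the `torch._foreach` fusion looks like, here is a toy exponential-moving-average update applied across a list of tensors in two fused calls instead of a Python loop (not the actual Adan step):

```python
import torch

# Toy stand-ins for per-parameter gradients and first-moment buffers.
grads = [torch.randn(3, 3) for _ in range(4)]
exp_avgs = [torch.zeros(3, 3) for _ in range(4)]
beta1 = 0.98

# exp_avg = beta1 * exp_avg + (1 - beta1) * grad, for every tensor at once.
torch._foreach_mul_(exp_avgs, beta1)
torch._foreach_add_(exp_avgs, grads, alpha=1 - beta1)
```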
Hi @iiSeymour, here are the results for Adan. It seems that many optimizers can reach the optimal point, but the practical performance varies greatly, e.g. Adam and...
@haihai-00 Hi, I suggest referring to the HPs we used for ViT-B and ViT-S. At the least, you may try the default betas (0.98, 0.92, 0.99) and set wd to 0.02. To make...
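As a minimal sketch of applying those HPs, assuming the `adan.py` from this repo is importable and exposes an `Adan(params, lr, betas, weight_decay)` constructor (the LR below is a hypothetical placeholder to tune):

```python
import torch
from adan import Adan  # assumes adan.py from this repo is on the path

model = torch.nn.Linear(10, 2)     # toy stand-in for ViT-B / ViT-S
optimizer = Adan(
    model.parameters(),
    lr=1e-3,                       # hypothetical; tune together with warmup
    betas=(0.98, 0.92, 0.99),      # default betas suggested above
    weight_decay=0.02,             # wd suggested above
)
```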