xyxie
@CyrilYang The model still cannot run in Docker, but there is no problem on my own Ubuntu machine, so I ran it on my local Ubuntu instead.
Sorry for the confusion here. Adan does include bias correction in the implementation, but we need to keep the algorithm presentation consistent with the theoretical analysis. Hence, we did...
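For reference, a generic sketch of the bias correction being discussed, written in Adam's convention (Adan's own betas and its three moment estimates enter the update differently): dividing a zero-initialized moving average by $1-\beta^t$ removes the bias toward zero in the early steps,

$$m_t = \beta\, m_{t-1} + (1-\beta)\, g_t, \qquad \hat m_t = \frac{m_t}{1-\beta^{t}}.$$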
@lucidrains Thanks for the update; below are some minor modifications. When implementing Adan, we referred to some of the optimizer implementations in [timm](https://github.com/rwightman/pytorch-image-models/tree/master/timm/optim). Line 55: `state['prev_grad'] = grad` Line 85-86:...
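For context, a generic sketch of the lazy state-initialization pattern that the `Line 55` note points at, assuming a standard `torch.optim.Optimizer` step loop (a toy update rule, not the actual patch, whose remaining details are elided above):

```python
import torch

class SketchOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    # First step: seed prev_grad with the current gradient
                    # so the gradient-difference term starts at zero.
                    state['prev_grad'] = grad.clone()
                diff = grad - state['prev_grad']  # Adan-style difference term
                state['prev_grad'] = grad.clone()
                p.add_(grad + diff, alpha=-group['lr'])  # toy update, not Adan
```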
@lucidrains You're welcome. By increasing the LR and tuning the warmup steps, the performance may improve further. Have fun using Adan.
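As a minimal sketch of tuning warmup in PyTorch, using `LinearLR` for the ramp (`start_factor` and `total_iters` here are hypothetical values to tune, and SGD stands in for any optimizer):

```python
import torch

model = torch.nn.Linear(10, 2)                             # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # any optimizer

# Linear warmup: LR ramps from 0.1 * lr up to lr over the first 500 steps.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=500)

for step in range(1000):
    optimizer.step()   # loss.backward() would precede this in real training
    warmup.step()
```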
Sorry, I am not very familiar with TF. I have tried to add you on WeChat to send our `adan.py` (implemented in PyTorch), but have not received a response yet. @cpuimage...
Hi @tcapelle, from the experimental results you released [here](https://github.com/tcapelle/adan_opt/blob/main/Adan_explore.ipynb), the accuracy of Adan's three trials is 71.8 / 75.5 / 74.0, while the results of Adam's...
Hi @bonlime, thanks for your very valuable suggestion. We will try to implement this with the `torch._foreach` functions. Of course, if you have time, you are welcome to help. Many thanks!! Best
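As a minimal sketch of what the `torch._foreach` fusion looks like, here is a toy exponential-moving-average update applied across a list of tensors in two fused calls instead of a Python loop (not the actual Adan step):

```python
import torch

# Toy stand-ins for per-parameter gradients and first-moment buffers.
grads = [torch.randn(3, 3) for _ in range(4)]
exp_avgs = [torch.zeros(3, 3) for _ in range(4)]
beta1 = 0.98

# exp_avg = beta1 * exp_avg + (1 - beta1) * grad, for every tensor at once.
torch._foreach_mul_(exp_avgs, beta1)
torch._foreach_add_(exp_avgs, grads, alpha=1 - beta1)
```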
Hi @iiSeymour, here are the results for Adan. It seems that many optimizers can reach the optimal point, but the practical performance varies greatly, e.g. Adam and...
@haihai-00 Hi, I suggest referring to the HPs we used for ViT-B and ViT-S. At the least, you may try the default betas (0.98, 0.92, 0.99) and set wd to 0.02. To make...
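As a minimal sketch of applying those HPs, assuming the `adan.py` from this repo is importable and exposes an `Adan(params, lr, betas, weight_decay)` constructor (the LR below is a hypothetical placeholder to tune):

```python
import torch
from adan import Adan  # assumes adan.py from this repo is on the path

model = torch.nn.Linear(10, 2)     # toy stand-in for ViT-B / ViT-S
optimizer = Adan(
    model.parameters(),
    lr=1e-3,                       # hypothetical; tune together with warmup
    betas=(0.98, 0.92, 0.99),      # default betas suggested above
    weight_decay=0.02,             # wd suggested above
)
```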