Swin-Transformer
Why do you use different weight_decay values?
In this code snippet (https://github.com/microsoft/Swin-Transformer/blob/main/optimizer.py#:~:text=def%20set_weight_decay(model,no_decay%2C%20%27weight_decay%27%3A%200.%7D%5D), you set the weight decay of some layers to zero. Is there a reason for this? Does it have a large impact on training results?
Btw, thank you for your work :)
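For readers unfamiliar with the pattern being asked about, here is a minimal sketch of splitting parameters into a decayed group and a zero-decay group. It is not the repository's `set_weight_decay`; it only assumes PyTorch-style `(name, parameter)` pairs such as those returned by `model.named_parameters()`. The helper name `split_decay_groups`, the `FakeParam` stand-in, the example parameter names, and the `rel_bias_table` keyword are all hypothetical. The usual rationale for this convention is that weight decay pulls parameters toward zero, which makes sense for large weight matrices but can be counterproductive for biases and normalization scales/shifts (shrinking a LayerNorm scale toward zero changes the layer's output rather than just regularizing it). How much it affects final accuracy is an empirical question this sketch does not answer.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def split_decay_groups(
    named_params: Iterable[Tuple[str, Any]],
    weight_decay: float,
    skip_names: Sequence[str] = (),
    skip_keywords: Sequence[str] = (),
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return two optimizer param groups.

    `named_params` is anything shaped like torch's `model.named_parameters()`:
    (name, param) pairs where `param` has `.shape` and `.requires_grad`.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if not param.requires_grad:
            continue  # frozen parameters are not handed to the optimizer at all
        # 1-D tensors are typically biases and normalization scales/shifts.
        is_1d = len(param.shape) == 1
        # Explicitly excluded parameters (e.g. positional tables), by name or keyword.
        skipped = (
            name.endswith(".bias")
            or name in skip_names
            or any(kw in name for kw in skip_keywords)
        )
        if is_1d or skipped:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


@dataclass
class FakeParam:
    """Hypothetical stand-in for a torch.nn.Parameter (only shape and requires_grad)."""
    shape: Tuple[int, ...]
    requires_grad: bool = True


# Illustrative parameter names and shapes; not taken from the Swin code.
params = [
    ("blocks.0.attn.qkv.weight", FakeParam((288, 96))),     # 2-D weight -> decayed
    ("blocks.0.attn.qkv.bias", FakeParam((288,))),          # bias -> no decay
    ("blocks.0.norm1.weight", FakeParam((96,))),            # 1-D norm scale -> no decay
    ("blocks.0.attn.rel_bias_table", FakeParam((169, 3))),  # keyword match -> no decay
    ("head.weight", FakeParam((1000, 768))),                # 2-D weight -> decayed
]
groups = split_decay_groups(params, weight_decay=0.05, skip_keywords=("rel_bias_table",))
print(len(groups[0]["params"]), len(groups[1]["params"]))  # 2 3
```

With PyTorch, the returned list can be passed directly to an optimizer that accepts parameter groups, e.g. `torch.optim.AdamW(split_decay_groups(model.named_parameters(), weight_decay=0.05), lr=1e-3)`, where each group's `weight_decay` overrides the optimizer default. The values 0.05 and 1e-3 are placeholders, not the repository's settings.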
Hoping to see your test results.