Jiali MA

4 issues by Jiali MA

May I ask what the correct dimension of grads should be? Suppose the output feature map size is 100 and the batch_size is 512; should the rep and grads be...

Thanks for the great work! May I ask how you re-implemented the code for BatchEnsemble? Since the official implementation is in TensorFlow, are there any PyTorch resources to...
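For readers looking for a framework-neutral starting point, here is a minimal NumPy sketch of a BatchEnsemble dense layer (forward pass only). It assumes the rank-1 formulation from the BatchEnsemble paper, where member i uses the weight W ∘ (r_i s_iᵀ) built from one shared W and per-member vectors r_i, s_i. It is neither the official TensorFlow code nor the re-implementation asked about in the issue; the class name, the row-grouping convention, and the near-one initialisation of r and s are illustrative assumptions.

```python
import numpy as np


class BatchEnsembleDense:
    """Hypothetical minimal BatchEnsemble dense layer (forward pass only)."""

    def __init__(self, in_features, out_features, ensemble_size, seed=0):
        rng = np.random.default_rng(seed)
        self.ensemble_size = ensemble_size
        # Shared "slow" weight used by every ensemble member.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(in_features),
                            size=(in_features, out_features))
        # Per-member rank-1 "fast" factors: r_i on the input side, s_i on the
        # output side. Initialising them near 1 is an assumption of this sketch.
        self.r = rng.normal(1.0, 0.1, size=(ensemble_size, in_features))
        self.s = rng.normal(1.0, 0.1, size=(ensemble_size, out_features))
        # Per-member bias.
        self.b = np.zeros((ensemble_size, out_features))

    def forward(self, x):
        # x: (ensemble_size * per_member_batch, in_features). Rows are grouped
        # by member: the first block belongs to member 0, the next to member 1, ...
        assert x.shape[0] % self.ensemble_size == 0
        n = x.shape[0] // self.ensemble_size
        r = np.repeat(self.r, n, axis=0)
        s = np.repeat(self.s, n, axis=0)
        b = np.repeat(self.b, n, axis=0)
        # ((x * r_i) @ W) * s_i equals x @ (W * outer(r_i, s_i)), so no
        # per-member weight matrix is ever materialised.
        return ((x * r) @ self.W) * s + b
```

The vectorised form is the reason BatchEnsemble is cheap: all members share one matrix multiply, and the per-member factors are just element-wise scalings of the input and output.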

In distributed training, e.g. DDP, each GPU processes only its own minibatch, so the BN statistics computed on each GPU differ. When SWA is adopted, we need...
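One way to make per-GPU BN statistics consistent is to pool them exactly rather than average each rank's mean: every rank contributes its sample count, sum, and sum of squares per channel, and the global mean and variance are computed from the totals (the same idea used by synchronized BN). The pure-Python sketch below shows this for a single channel; the function names are hypothetical, and in a real DDP job the three totals would be all-reduced across ranks instead of added in a loop. How this interacts with the BN recomputation step in SWA (e.g. `torch.optim.swa_utils.update_bn`) is left as in the original question.

```python
from typing import List, Tuple


def local_stats(values: List[float]) -> Tuple[int, float, float]:
    """Count, sum and sum of squares for one GPU's shard of a channel."""
    return len(values), sum(values), sum(v * v for v in values)


def global_bn_stats(per_rank: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Combine per-rank (count, sum, sumsq) into the global mean and biased variance.

    Stand-in for an all-reduce: the three totals are simply summed here.
    """
    n = sum(c for c, _, _ in per_rank)
    total = sum(s for _, s, _ in per_rank)
    total_sq = sum(q for _, _, q in per_rank)
    mean = total / n
    # Biased variance (divide by n), which is what BN normalises with;
    # running_var in PyTorch BN layers stores the unbiased estimate instead.
    var = total_sq / n - mean * mean
    return mean, var


# Two "GPUs" with unequal shard sizes.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0]]
mean, var = global_bn_stats([local_stats(s) for s in shards])
naive_mean = sum(sum(s) / len(s) for s in shards) / len(shards)
```

Here `mean` matches the statistics of the full batch, while `naive_mean` (the average of per-rank means) does not when shard sizes differ.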

I'm using SLS to train my own model, but I found that it behaves differently when training with plain SGD versus SGD+wd+mom. When I use plain SGD, the step size increases at...
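To see how a line-search step size can grow over training, here is a generic sketch of SGD with a backtracking Armijo line search on a scalar function. Before each search the previous step is enlarged by a factor (`grow`, a hypothetical parameter loosely modelled on SLS-style step resets), and the search then shrinks it by `beta` until the Armijo sufficient-decrease condition holds. This is not the SLS library's code; the parameter names and default values are assumptions.

```python
from typing import Callable, Tuple


def armijo_step(f: Callable[[float], float], grad: Callable[[float], float],
                x: float, step: float, c: float = 0.1, beta: float = 0.9,
                max_backtracks: int = 100) -> Tuple[float, float]:
    """One gradient step whose size is chosen by backtracking Armijo search."""
    g = grad(x)
    fx = f(x)
    for _ in range(max_backtracks):
        x_new = x - step * g
        # Armijo condition: f(x - t g) <= f(x) - c * t * ||g||^2.
        if f(x_new) <= fx - c * step * g * g:
            return x_new, step
        step *= beta  # shrink the candidate step and retry
    return x, step  # no acceptable step found; stay put


def sgd_armijo(f, grad, x0: float, step0: float = 1.0, grow: float = 2.0,
               iters: int = 50) -> Tuple[float, float]:
    """Run the line search repeatedly, enlarging the step before each search."""
    x, step = x0, step0
    for _ in range(iters):
        x, step = armijo_step(f, grad, x, step * grow)
    return x, step


# Example on f(x) = x^2: accepted steps stay at or below about 0.9 for c = 0.1.
x_final, step_final = sgd_armijo(lambda x: x * x, lambda x: 2.0 * x, x0=5.0)
```

Because the candidate is enlarged before every search, the accepted step size can rise whenever the loss is locally well behaved; adding weight decay or momentum changes the search direction and objective, so the accepted steps need not follow the same pattern.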