Jiali MA issues


Results: 4 issues by Jiali MA

Connection between fmp and task branches

May I ask what the correct dimension of grads should be? Suppose the output feature map size is 100 and the batch_size is 512; should the rep and grads be...

Batch ensemble

Thanks for the great work! May I ask how you re-implemented the code for BatchEnsemble? Since the original official implementation is in TensorFlow, are there any PyTorch resources to...

SWA with distributed training

In distributed training, e.g. DDP, each GPU processes only its own minibatch, so the BN statistics computed on each GPU are different. When SWA is adopted, we need...

Train with weight decay and momentum

I'm using SLS to train my own model, but I found that training behaves differently with plain SGD than with SGD + weight decay + momentum. When I use plain SGD, the step size increases at...