Deep-Mutual-Learning
An unofficial PyTorch implementation of "Deep Mutual Learning" for classification on CIFAR-100.
When updating a sub-network, is there any need to retain the graph, i.e. `loss.backward(retain_graph=True)`? When I reproduce the procedure the code raises an error, but I don't know if retaining...
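The error usually comes from calling `backward()` a second time on a graph that was freed by the first call. A minimal sketch (assumed toy tensors, not the repo's code) showing that detaching the peer's output keeps each backward pass inside one network's graph, so `retain_graph=True` is not needed:

```python
import torch

# Two toy "networks" that share the same input x.
x = torch.randn(4, 3)
w1 = torch.randn(3, 2, requires_grad=True)
w2 = torch.randn(3, 2, requires_grad=True)

out1 = x @ w1
out2 = x @ w2

# Network 1's loss treats network 2's output as a fixed target, so detach it:
loss1 = ((out1 - out2.detach()) ** 2).mean()
loss1.backward()  # frees only network 1's graph

# Network 2's graph is still intact, so this second backward() does not error:
loss2 = ((out2 - out1.detach()) ** 2).mean()
loss2.backward()
```

Without the `.detach()` calls, `loss1` and `loss2` would share parts of one graph, and the second `backward()` would require `retain_graph=True` on the first.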
Hi, I have a problem when training a recognition task: the KL loss stabilizes and then grows larger, while the other losses keep decreasing. Did you have this problem when...
With https://github.com/weiaicunzai/pytorch-cifar100, ResNet-34 got a 23.24 error rate, and a much higher one with self-distillation: https://github.com/luanyunteng/pytorch-be-your-own-teacher
Hi, in trainer.py, lines 201–208:

```python
for i in range(self.model_num):
    ce_loss = self.loss_ce(outputs[i], labels)
    kl_loss = 0
    for j in range(self.model_num):
        if i != j:
            kl_loss += self.loss_kl(F.log_softmax(outputs[i], dim=1),
                                    F.softmax(Variable(outputs[j]), dim=1))
```
...
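For reference, here is a self-contained sketch of that mutual-learning loss (toy tensors and names assumed, not the repo's exact code): each peer gets a cross-entropy term plus a KL term toward every other peer's softmax output. `outputs[j].detach()` replaces the deprecated `Variable(...)` wrapper and stops gradients from flowing into the peer being used as a target.

```python
import torch
import torch.nn.functional as F

model_num = 2
# Stand-in logits for two peer networks on a batch of 8 samples, 10 classes.
outputs = [torch.randn(8, 10, requires_grad=True) for _ in range(model_num)]
labels = torch.randint(0, 10, (8,))

losses = []
for i in range(model_num):
    ce_loss = F.cross_entropy(outputs[i], labels)
    kl_loss = 0.0
    for j in range(model_num):
        if i != j:
            kl_loss = kl_loss + F.kl_div(
                F.log_softmax(outputs[i], dim=1),
                F.softmax(outputs[j].detach(), dim=1),  # peer is a fixed target
                reduction="batchmean",
            )
    # Average the KL terms over the peers, as in the DML paper.
    losses.append(ce_loss + kl_loss / (model_num - 1))
```

Because the peer logits are detached, each `losses[i].backward()` only traverses network `i`'s graph, which is why `retain_graph=True` should not be necessary here.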
What are the hyper-parameters like?