Deep-Mutual-Learning
An unofficial PyTorch implementation of "Deep Mutual Learning" for classification on CIFAR-100.
When updating a sub-network, is there any need to retain the graph, i.e. `loss.backward(retain_graph=True)`? When I reproduce the procedure the code raises an error, but I don't know if retaining...
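The error usually comes from calling `backward()` a second time on a graph that was freed by the first call. A minimal sketch (assumed toy tensors, not the repo's code) showing that detaching the peer's output keeps each backward pass inside one network's graph, so `retain_graph=True` is not needed:

```python
import torch

# Two toy "networks" that share the same input x.
x = torch.randn(4, 3)
w1 = torch.randn(3, 2, requires_grad=True)
w2 = torch.randn(3, 2, requires_grad=True)

out1 = x @ w1
out2 = x @ w2

# Network 1's loss treats network 2's output as a fixed target, so detach it:
loss1 = ((out1 - out2.detach()) ** 2).mean()
loss1.backward()  # frees only network 1's graph

# Network 2's graph is still intact, so this second backward() does not error:
loss2 = ((out2 - out1.detach()) ** 2).mean()
loss2.backward()
```

Without the `.detach()` calls, `loss1` and `loss2` would share parts of one graph, and the second `backward()` would require `retain_graph=True` on the first.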
Hi, I have a problem when training a recognition task: the KL loss stabilizes and then grows larger, while the other losses keep decreasing. Did you have this problem when...
With https://github.com/weiaicunzai/pytorch-cifar100, ResNet-34 got a 23.24 error rate, and a much higher one with self-distillation: https://github.com/luanyunteng/pytorch-be-your-own-teacher
Hi, in trainer.py, lines 201–208:

```python
for i in range(self.model_num):
    ce_loss = self.loss_ce(outputs[i], labels)
    kl_loss = 0
    for j in range(self.model_num):
        if i != j:
            kl_loss += self.loss_kl(F.log_softmax(outputs[i], dim=1),
                                    F.softmax(Variable(outputs[j]), dim=1))
```
...
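For reference, here is a self-contained sketch of that mutual-learning loss (toy tensors and names assumed, not the repo's exact code): each peer gets a cross-entropy term plus a KL term toward every other peer's softmax output. `outputs[j].detach()` replaces the deprecated `Variable(...)` wrapper and stops gradients from flowing into the peer being used as a target.

```python
import torch
import torch.nn.functional as F

model_num = 2
# Stand-in logits for two peer networks on a batch of 8 samples, 10 classes.
outputs = [torch.randn(8, 10, requires_grad=True) for _ in range(model_num)]
labels = torch.randint(0, 10, (8,))

losses = []
for i in range(model_num):
    ce_loss = F.cross_entropy(outputs[i], labels)
    kl_loss = 0.0
    for j in range(model_num):
        if i != j:
            kl_loss = kl_loss + F.kl_div(
                F.log_softmax(outputs[i], dim=1),
                F.softmax(outputs[j].detach(), dim=1),  # peer is a fixed target
                reduction="batchmean",
            )
    # Average the KL terms over the peers, as in the DML paper.
    losses.append(ce_loss + kl_loss / (model_num - 1))
```

Because the peer logits are detached, each `losses[i].backward()` only traverses network `i`'s graph, which is why `retain_graph=True` should not be necessary here.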
What are the hyper-parameters like?