
Confusion about the kd_ce_loss

Open · fengjiejiejiejie opened this issue 2 years ago · 0 comments

Hi, thanks for the impressive paper and code, but I'm confused about kd_ce_loss:

```python
def FNKD(self, student_outputs, teacher_outputs, student_feature, teacher_feature):
    student_L2norm = torch.norm(student_feature)
    teacher_L2norm = torch.norm(teacher_feature)
    q_fn = F.log_softmax(teacher_outputs / teacher_L2norm, dim=1)
    to_kd = F.softmax(student_outputs / student_L2norm, dim=1)
    KD_ce_loss = self.ce(q_fn, to_kd[:, 0].long())
    return KD_ce_loss
```
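As a sanity check with toy shapes (my snippet, not the repository's; the real outputs would be (batch, classes, H, W) segmentation maps), here is what those two arguments look like by the time they reach self.ce. nn.CrossEntropyLoss applies log_softmax to its first argument internally, and it expects integer class indices as targets, so the softmax values in to_kd[:, 0] (all strictly below 1) are truncated by .long() to index 0:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss()
teacher_outputs = torch.randn(4, 3)           # toy (batch, num_classes) logits
student_outputs = torch.randn(4, 3)

q_fn = F.log_softmax(teacher_outputs, dim=1)  # already log-probabilities
to_kd = F.softmax(student_outputs, dim=1)     # probabilities in (0, 1)

print(to_kd[:, 0].long())                     # tensor([0, 0, 0, 0])
loss = ce(q_fn, to_kd[:, 0].long())           # cross entropy toward class index 0
```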

Why is self.ce applied after softmax/log_softmax have already been taken? And doesn't to_kd[:, 0] use only the first channel of student_outputs (the background class)? Maybe the following is what was intended:

```python
q_fn = F.log_softmax(teacher_outputs / T, dim=1)
to_kd = F.softmax(student_outputs / T, dim=1)
KD_ce_loss = -torch.mean(torch.sum(to_kd * q_fn, dim=1))
```
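For concreteness, here is a self-contained version of that proposal (again my sketch, not repository code; the wrapper name kd_ce_loss and the value T=2.0 are placeholders). One thing worth noting: in the standard Hinton-style soft-label KD loss, the teacher's probabilities weight the student's log-probabilities, i.e. the two terms appear with their roles swapped relative to the snippet above.

```python
import torch
import torch.nn.functional as F

def kd_ce_loss(student_outputs, teacher_outputs, T=2.0):
    q_fn = F.log_softmax(teacher_outputs / T, dim=1)   # teacher log-probabilities
    to_kd = F.softmax(student_outputs / T, dim=1)      # student probabilities
    return -torch.mean(torch.sum(to_kd * q_fn, dim=1))

loss = kd_ce_loss(torch.randn(4, 3), torch.randn(4, 3))
```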

best, fj

fengjiejiejiejie · Jun 25 '23