ArcFace loss becomes NaN after loss.backward()
My training code looks like this (metric is ArcFace):

```python
feature, output = model(images, targets=labels)
arc_output = metric(feature, labels)
loss = loss_CE(arc_output, labels)
print(metric.kernel)
loss.backward()
optimizer.step()
print(metric.kernel)
```
Before loss.backward(), metric.kernel holds normal values, but after I call optimizer.step() the values in metric.kernel become NaN and the loss stays stuck at the same value forever. I notice that your torch version seems to be 0.4; my torch version is 1.0 with Python 3.5.
Can you help me with this?
Have you solved the problem? Thanks for your reply.
You can solve it by changing one line in the ArcFace implementation: replace `sin_theta = torch.sqrt(sin_theta_2)` with `sin_theta = torch.sqrt(sin_theta_2 + 1e-8)`.
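The reason the epsilon helps: the derivative of `sqrt` is `1 / (2 * sqrt(x))`, which blows up at `x = 0`. In ArcFace, `sin_theta_2 = 1 - cos_theta**2` becomes exactly zero whenever a normalized feature aligns perfectly with a class weight (`cos_theta = ±1`), so the backward pass produces an infinite gradient, and one `optimizer.step()` poisons the weights with inf/NaN. A minimal sketch of the failure and the fix (names here are illustrative, not the repo's actual code):

```python
import torch

# Without epsilon: cos_theta hits 1.0 exactly, sin_theta_2 = 0,
# and d sqrt(x)/dx = 1/(2*sqrt(0)) is infinite.
cos_theta = torch.tensor([1.0], requires_grad=True)
sin_theta = torch.sqrt(1.0 - cos_theta ** 2)
sin_theta.sum().backward()
print(cos_theta.grad)  # non-finite gradient (±inf)

# With a small epsilon the sqrt argument never reaches 0,
# so the gradient stays finite (large, but bounded).
cos_theta_eps = torch.tensor([1.0], requires_grad=True)
sin_theta_eps = torch.sqrt(1.0 - cos_theta_eps ** 2 + 1e-8)
sin_theta_eps.sum().backward()
print(cos_theta_eps.grad)  # finite gradient
```

With `eps = 1e-8` the worst-case gradient magnitude of the sqrt is capped at `1 / (2 * sqrt(1e-8)) = 5000`, which is large but no longer destroys the weights in a single step.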
It is also useful to set `torch.autograd.set_detect_anomaly(True)` in such cases to find the source of the problem.
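For illustration, here is a small self-contained sketch of anomaly detection catching a NaN produced in the backward pass. The toy loss below is contrived purely to force a `0 * inf = NaN` gradient through `SqrtBackward`; it is not the ArcFace code:

```python
import torch

raised = False
try:
    # The context-manager form scopes anomaly detection to this block;
    # set_detect_anomaly(True) enables it globally instead.
    with torch.autograd.detect_anomaly():
        x = torch.zeros(1, requires_grad=True)
        # sqrt'(0) is inf; multiplying the loss by 0 makes the incoming
        # gradient 0, so the backward of sqrt computes 0 * inf = NaN.
        loss = torch.sqrt(x).sum() * 0.0
        loss.backward()
except RuntimeError as e:
    # Anomaly mode raises and its traceback points at the forward op
    # (here, torch.sqrt) that produced the NaN gradient.
    raised = True
    print(e)
```

This is much faster than bisecting the model by hand, though anomaly mode slows training down, so enable it only while debugging.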