ArcFace loss becomes NaN after loss.backward()
My training code looks like this (metric is ArcFace):

```python
feature, output = model(images, targets=labels)
arc_output = metric(feature, labels)
loss = loss_CE(arc_output, labels)
print(metric.kernel)
loss.backward()
optimizer.step()
print(metric.kernel)
```
Before loss.backward(), metric.kernel holds normal values, but after I call optimizer.step() the values in metric.kernel become NaN and the loss stays stuck at the same value forever. I notice that your torch version seems to be 0.4; my torch version is 1.0 with Python 3.5.
Can you help me with this?
Have you solved the problem? Thanks for your reply.
You can solve it by changing one line in the ArcFace implementation: replace `sin_theta = torch.sqrt(sin_theta_2)` with `sin_theta = torch.sqrt(sin_theta_2 + 1e-8)`.
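The reason the epsilon helps: the derivative of `sqrt` is `1 / (2 * sqrt(x))`, which blows up at `x = 0`. In ArcFace, `sin_theta_2 = 1 - cos_theta**2` becomes exactly zero whenever a normalized feature aligns perfectly with a class weight (`cos_theta = ±1`), so the backward pass produces an infinite gradient, and one `optimizer.step()` poisons the weights with inf/NaN. A minimal sketch of the failure and the fix (names here are illustrative, not the repo's actual code):

```python
import torch

# Without epsilon: cos_theta hits 1.0 exactly, sin_theta_2 = 0,
# and d sqrt(x)/dx = 1/(2*sqrt(0)) is infinite.
cos_theta = torch.tensor([1.0], requires_grad=True)
sin_theta = torch.sqrt(1.0 - cos_theta ** 2)
sin_theta.sum().backward()
print(cos_theta.grad)  # non-finite gradient (±inf)

# With a small epsilon the sqrt argument never reaches 0,
# so the gradient stays finite (large, but bounded).
cos_theta_eps = torch.tensor([1.0], requires_grad=True)
sin_theta_eps = torch.sqrt(1.0 - cos_theta_eps ** 2 + 1e-8)
sin_theta_eps.sum().backward()
print(cos_theta_eps.grad)  # finite gradient
```

With `eps = 1e-8` the worst-case gradient magnitude of the sqrt is capped at `1 / (2 * sqrt(1e-8)) = 5000`, which is large but no longer destroys the weights in a single step.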
It is also useful to set `torch.autograd.set_detect_anomaly(True)` in such cases to find the source of the problem.
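For illustration, here is a small self-contained sketch of anomaly detection catching a NaN produced in the backward pass. The toy loss below is contrived purely to force a `0 * inf = NaN` gradient through `SqrtBackward`; it is not the ArcFace code:

```python
import torch

raised = False
try:
    # The context-manager form scopes anomaly detection to this block;
    # set_detect_anomaly(True) enables it globally instead.
    with torch.autograd.detect_anomaly():
        x = torch.zeros(1, requires_grad=True)
        # sqrt'(0) is inf; multiplying the loss by 0 makes the incoming
        # gradient 0, so the backward of sqrt computes 0 * inf = NaN.
        loss = torch.sqrt(x).sum() * 0.0
        loss.backward()
except RuntimeError as e:
    # Anomaly mode raises and its traceback points at the forward op
    # (here, torch.sqrt) that produced the NaN gradient.
    raised = True
    print(e)
```

This is much faster than bisecting the model by hand, though anomaly mode slows training down, so enable it only while debugging.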