
fix a bug that has a small probability of producing NaN in the loss

Open yuanlonghui opened this issue 3 years ago • 2 comments

This is the simplest way to prevent computing log(0), and it is necessary when the embedding dimension is large. With high-dimensional feature representations, the largest inner product for an anchor is very likely its inner product with itself. After subtracting that maximum, the off-diagonal entries are mostly strongly negative, so after exp() nearly everything except the diagonal can underflow to zero. Since the diagonal is excluded from the sum inside the log(), there is then a chance of computing log(0), which produces NaN.
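
A minimal sketch of the affected step, assuming the usual SupConLoss layout (temperature-scaled similarities with the row-wise max subtracted, diagonal masked out of the denominator); the epsilon value here is only illustrative, not necessarily the exact constant in the fix:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, dim, temperature = 4, 2048, 0.07

# L2-normalized embeddings, as in SupCon.
features = F.normalize(torch.randn(n, dim), dim=1)

# Temperature-scaled similarities with the row-wise max subtracted for numerical stability.
anchor_dot_contrast = features @ features.T / temperature
logits_max, _ = anchor_dot_contrast.max(dim=1, keepdim=True)
logits = anchor_dot_contrast - logits_max.detach()

# Mask out self-contrast (the diagonal) from the denominator.
logits_mask = 1.0 - torch.eye(n)
exp_logits = torch.exp(logits) * logits_mask

# Original form: if every off-diagonal exp() underflows to zero, the row sum is 0
# and log(0) = -inf, which turns the loss into NaN.
# log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

# Guarded form: a tiny epsilon keeps the argument of log() strictly positive.
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-12)
print(torch.isnan(log_prob).any())
```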

yuanlonghui avatar May 09 '22 13:05 yuanlonghui

Thank you very much for solving the NaN loss problem. Does your loss keep increasing during training? I look forward to your reply!

Dara-to-win avatar Mar 13 '23 03:03 Dara-to-win

Did you solve this?

yaoerqin avatar Mar 24 '24 06:03 yaoerqin