Rajeev Goel
Using torch.bfloat16 to prevent overflow. Float16 has three fewer exponent bits than bfloat16, so large values overflow to inf, which causes NaN loss and NaN grad norms during AMP training. This seems to be a...
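A minimal sketch of the fix described above, assuming a CUDA device with bfloat16 support (Ampere or newer); the model, optimizer, and loss here are placeholders, not taken from the original post:

    import torch
    import torch.nn.functional as F

    # Placeholder model and optimizer; any AMP training loop applies.
    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(8, 1024, device="cuda")
    target = torch.randn(8, 1024, device="cuda")

    optimizer.zero_grad()
    # bfloat16 keeps float32's 8 exponent bits, so activations and losses
    # that would overflow float16's ~65504 max stay finite instead of
    # becoming inf, which otherwise propagates as NaN loss / grad norms.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = F.mse_loss(model(x), target)
    loss.backward()  # no GradScaler needed: bfloat16 gradients rarely underflow
    optimizer.step()

With float16 autocast, the same loop would normally pair with torch.cuda.amp.GradScaler to keep small gradients from flushing to zero; bfloat16's float32-sized exponent range generally makes that step unnecessary.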