Rajeev Goel

Results: 1 issue by Rajeev Goel

Using torch.bfloat16 to prevent overflow. Float16 has three fewer exponent bits than bfloat16, so its much smaller dynamic range overflows during AMP training, producing NaN loss and NaN grad norms. This seems to be a...
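
A minimal sketch of switching AMP autocast from float16 to bfloat16, assuming a CUDA device; the model, shapes, and hyperparameters here are illustrative placeholders, not taken from the issue:

```python
import torch
import torch.nn as nn

# Toy model and data (placeholders for illustration only).
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

# bfloat16 keeps float32's 8 exponent bits, so values that overflow
# float16's ~6.5e4 range stay finite and the loss does not go NaN.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits = model(x)
    loss = loss_fn(logits, y)

# With bfloat16 autocast, gradient scaling (torch.cuda.amp.GradScaler)
# is typically unnecessary, since the exponent range matches float32.
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The trade-off is precision rather than range: bfloat16 carries fewer mantissa bits than float16, but for training stability the wider exponent range is usually what matters.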