FastConvMAE
NaN question
During pretraining on ImageNet data, I got a NaN error at epoch 186: [10:41:36.508034] Loss is nan, stopping training
How should I fix this error?
Switch to FP32 optimization by resuming from the nearest checkpoint before the NaN. To disable FP16, delete the `with torch.cuda.amp.autocast():` line that wraps the forward pass (and dedent the block it guards).
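A minimal sketch of the suggested fix, assuming the pretraining script wraps the forward pass in `torch.cuda.amp.autocast()` (the model and data here are stand-ins, not the actual FastConvMAE code). Rather than deleting the line outright, you can toggle a flag that swaps the autocast context for a no-op, which keeps the script structure intact:

```python
import contextlib
import torch

use_fp16 = False  # set to False to run the forward pass in full FP32 after hitting NaN

model = torch.nn.Linear(8, 1)  # hypothetical stand-in for the MAE model
x = torch.randn(4, 8)

# When FP16 is disabled, nullcontext() replaces autocast with a no-op,
# so the forward pass runs entirely in FP32.
amp_ctx = torch.cuda.amp.autocast() if use_fp16 else contextlib.nullcontext()
with amp_ctx:
    loss = model(x).pow(2).mean()

# The usual guard that produced the "Loss is nan, stopping training" message
assert torch.isfinite(loss), "Loss is nan, stopping training"
print(loss.dtype)  # FP32 when use_fp16 is False
```

Resuming from the checkpoint nearest to epoch 186 with `use_fp16 = False` lets training continue past the point where mixed precision overflowed.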