Question about fp16 training with apex amp
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0
Traceback (most recent call last):
  File "train.py", line 85, in
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 184, in unscale_with_stashed
    out_scale/stashed_have_scale)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 148, in unscale_with_stashed_python
    self.dynamic)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 22, in axpby_check_overflow_python
    cpu_sum = float(model_grad.float().sum())
RuntimeError: CUDA error: an illegal memory access was encountered
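The five "Gradient overflow" lines are dynamic loss scaling at work. The first reduction is to 32768, which suggests the scale started at 65536 (2**16). Each time the gradients contain inf or NaN, the step is skipped and the scale is halved. The traceback then ends inside the Python fallback of that inf/NaN check (`axpby_check_overflow_python`), on the line that sums a gradient and converts it to a host `float`. Because CUDA kernels run asynchronously, that host copy is a synchronization point, so the illegal memory access may have been caused by an earlier kernel and only reported there. Below is a minimal pure-Python sketch of the skip-and-halve logic. `grads_overflowed` and `DynamicLossScaler` are hypothetical names, not apex's code. The growth interval (`scale_window=2000`) and the `min_scale` floor are also assumptions.

```python
import math


def grads_overflowed(grad_values):
    """Hypothetical helper: True if the gradient sum is inf or NaN.

    Summing is a cheap test: any inf or NaN element makes the sum inf or NaN
    (inf + -inf also yields NaN).
    """
    total = sum(float(v) for v in grad_values)
    return math.isinf(total) or math.isnan(total)


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling (not apex's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, scale_window=2000, min_scale=1.0):
        self.scale = init_scale            # assumed starting scale: 65536
        self.scale_window = scale_window   # assumed: clean steps before growing
        self.min_scale = min_scale         # assumed floor so the scale never reaches 0
        self._clean_steps = 0

    def update(self, found_overflow):
        """Adjust the scale; return True if the optimizer step should run."""
        if found_overflow:
            # Skip this step and halve the scale, as in the log messages.
            self.scale = max(self.scale / 2.0, self.min_scale)
            self._clean_steps = 0
            return False
        self._clean_steps += 1
        if self._clean_steps % self.scale_window == 0:
            # After a run of overflow-free steps, try a larger scale again.
            self.scale *= 2.0
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler()
    bad_grads = [0.25, float("inf"), -1.0]
    for _ in range(5):
        if not scaler.update(grads_overflowed(bad_grads)):
            print(f"Gradient overflow. Skipping step, reducing loss scale to {scaler.scale}")
```

Run as a script, this prints the same sequence of scales as the log (32768.0 down to 2048.0). It only illustrates why those messages appear; it does not explain the illegal memory access itself.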
Same issue here.