pix2pixHD

question about fp16

Open najingligong1111 opened this issue 5 years ago • 1 comment

Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0
Traceback (most recent call last):
  File "train.py", line 85, in <module>
    with amp.scale_loss(loss_G, optimizer_G) as scaled_loss: scaled_loss.backward()
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 184, in unscale_with_stashed
    out_scale/stashed_have_scale)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 148, in unscale_with_stashed_python
    self.dynamic)
  File "/home/cvv/anaconda3/envs/pytorch15/lib/python3.7/site-packages/apex/amp/scaler.py", line 22, in axpby_check_overflow_python
    cpu_sum = float(model_grad.float().sum())
RuntimeError: CUDA error: an illegal memory access was encountered
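
For anyone reading the log above: the run of "Gradient overflow" messages looks like ordinary dynamic loss scaling, where the scale is halved each time a non-finite gradient shows up and that step is skipped. Below is a rough sketch of that behaviour in plain Python. It is not apex's code: `ToyLossScaler`, `fused_kernels_available`, and the starting scale of 65536 are my own assumptions (the starting value is only inferred from the first message reducing the scale to 32768.0). The helper just checks whether the `amp_C` module named in the warning can be found in the current environment.

```python
import importlib.util
import math


def fused_kernels_available() -> bool:
    """Return True if the compiled `amp_C` module named in the warning is importable.

    Hypothetical helper for diagnosis only; it does not touch apex itself.
    """
    return importlib.util.find_spec("amp_C") is not None


class ToyLossScaler:
    """Illustrative dynamic loss scaler (an assumption, not apex's class)."""

    def __init__(self, init_scale: float = 2.0 ** 16, min_scale: float = 1.0):
        # init_scale is assumed; it is inferred from the log, not read from apex.
        self.scale = init_scale
        self.min_scale = min_scale

    def update(self, grads) -> bool:
        """Return True if the optimizer step should run, False if it is skipped."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            # Halve the scale and skip this step, as the log messages describe.
            self.scale = max(self.scale / 2.0, self.min_scale)
            print(f"Gradient overflow. Skipping step, reducing loss scale to {self.scale}")
            return False
        return True


scaler = ToyLossScaler()
history = []
for _ in range(5):
    # Simulate five consecutive steps whose gradients contain an inf.
    scaler.update([float("inf"), 0.5])
    history.append(scaler.scale)

print("amp_C importable:", fused_kernels_available())
```

With five overflowing steps in a row, `history` follows the same halving sequence as the log. The sketch says nothing about the final `RuntimeError`, which is raised later, inside the Python fallback path shown in the traceback.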

najingligong1111, Feb 17 '21 01:02

same issue...

syfbme, Feb 25 '21 02:02