[BUG]: Parameter `model.norm.weight` failed at the gradient reduction.
🐛 Describe the bug
File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/tensor/colo_tensor.py", line 81, in torch_function File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/tensor/colo_tensor.py", line 81, in torch_function return backward_tensor.backward(**tensor_kwargs) File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward result = torch_func_method(public_api, types, args, kwargs) File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/tensor/colo_tensor.py", line 81, in torch_function return backward_tensor.backward(**tensor_kwargs)return backward_tensor.backward(**tensor_kwargs)
File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward return backward_tensor.backward(**tensor_kwargs) File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward torch.autograd.backward( File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward torch.autograd.backward(torch.autograd.backward(
File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/zero/gemini/gemini_ddp.py", line 344, in grad_handle torch.autograd.backward( File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward passVariable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/zero/gemini/gemini_ddp.py", line 344, in grad_handle
File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/zero/gemini/gemini_ddp.py", line 344, in grad_handle
raise RuntimeError(
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward passRuntimeError
: File "/usr/local/lib/miniconda3/envs/colo/lib/python3.10/site-packages/colossalai/zero/gemini/gemini_ddp.py", line 344, in grad_handle
Parameter model.norm.weight failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
raise RuntimeError(raise RuntimeError(
RuntimeErrorRuntimeError: : Parameter model.norm.weight failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.Parameter model.norm.weight failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.
The error occurs when I enable Gemini and `use_flash_attn` at the same time.
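For reference, a minimal sketch of the kind of setup that hits this for me. My real script is larger; `TinyAttnBlock`, the shapes, and the optimizer settings below are placeholders, not my actual model (which has its norm layer at `model.norm`):

```python
import torch
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from flash_attn import flash_attn_func  # flash_attn==2.0.5


class TinyAttnBlock(nn.Module):
    """Placeholder model: one flash-attention call feeding a norm layer,
    so a `norm.weight` parameter exists like in the real model."""

    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.norm = nn.LayerNorm(dim)  # stands in for model.norm

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16/bf16
        shape = (b, s, self.n_heads, d // self.n_heads)
        out = flash_attn_func(q.view(shape), k.view(shape), v.view(shape))
        return self.norm(out.reshape(b, s, d))


def main():
    colossalai.launch_from_torch(config={})
    model = TinyAttnBlock().cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Gemini plugin (ZeRO with chunk-based memory management); its default
    # precision is fp16, which flash-attn requires anyway.
    booster = Booster(plugin=GeminiPlugin())
    model, optimizer, *_ = booster.boost(model, optimizer)

    x = torch.randn(2, 128, 64, device="cuda", dtype=torch.float16)
    loss = model(x).float().mean()
    booster.backward(loss, optimizer)  # RuntimeError raised in grad_handle here
    optimizer.step()


if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --standalone --nproc_per_node=2 repro.py`. Without `flash_attn_func` (plain `scaled_dot_product_attention`), the same script trains fine for me.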
Environment
python=3.10.8, pytorch=2.0.0, cuda=11.8, flash_attn=2.0.5