[BUG] max_grad_norm has no effect
Describe the bug
In the DeepSpeed config, gradient clipping is set to `auto` and `max_grad_norm` is set to 1.0, but it has no effect. The DeepSpeed version is 0.14.5; after switching to 0.15.3 and 0.15.4, the same problem remains. I use Firefly SFT as the training repo. A sketch of the kind of configuration involved is shown after the reproduction steps below.
To Reproduce
Steps to reproduce the behavior:
- Go to '...'
- Click on '....'
- Scroll down to '....'
- See error
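For context, here is a minimal sketch of the kind of setup being described, assuming the HuggingFace Trainer integration (which is expected to fill the `auto` placeholder in `gradient_clipping` from `max_grad_norm`); the exact keys and values are illustrative, not copied from the actual Firefly config:

```python
# Hypothetical minimal setup, not the actual Firefly configuration.
from transformers import TrainingArguments

ds_config = {
    "gradient_clipping": "auto",   # expected to resolve to max_grad_norm (1.0 here)
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="out",
    max_grad_norm=1.0,             # the value the "auto" placeholder should pick up
    deepspeed=ds_config,           # a path to a ds_config.json also works
)
```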
Expected behavior
Gradient clipping should take effect, i.e. gradients should be clipped to the configured `max_grad_norm` of 1.0.
ds_report output
Please run ds_report to give us details about your setup.
Screenshots If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- Interconnects (if applicable) [e.g., two machines connected with 100 Gbps IB]
- Python version
- Any other relevant info about your setup
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Docker context Are you using a specific docker image that you can share?
Additional context Add any other context about the problem here.
It seems that gradient clipping is not applied at all.
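One way to narrow this down is to check what value the engine actually resolved for clipping. A minimal sketch, assuming the engine returned by `deepspeed.initialize` exposes the `gradient_clipping()` config accessor (availability may vary by version) and that the script is launched with the `deepspeed` launcher:

```python
import torch
import deepspeed

# Tiny stand-in model; the real run uses the Firefly SFT model.
model = torch.nn.Linear(8, 8)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_clipping": 1.0,  # set explicitly to rule out the "auto" mapping
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Should print 1.0; a value of 0.0 means clipping is effectively disabled.
print("resolved gradient_clipping:", engine.gradient_clipping())
```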
Could someone look into this? I'm facing the same issue.
Any progress? Same problem here.
Or is there any other way to clip gradients?
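As a possible workaround, plain PyTorch clipping can be applied between backward and step. This is only a sketch (`model`, `optimizer`, and `dataloader` are placeholders), and it is only meaningful when gradients are not ZeRO-partitioned across ranks; with ZeRO stage 2/3 the per-rank norm computed this way would be incomplete, so the engine's built-in clipping is preferable there:

```python
import torch

max_grad_norm = 1.0  # the norm max_grad_norm was meant to enforce

for batch in dataloader:
    loss = model(**batch).loss
    loss.backward()

    # Clip the global L2 norm of all gradients in place before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    optimizer.step()
    optimizer.zero_grad()
```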
Hello @yiyepiaoling0715 @chengmengli06, have you solved this?
Same problem.