Biały Wilk
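### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior 4 GPUs, 1 million samples in total. Parameter settings: per_device_train_batch_size=4 gradient_accumulation_steps=128. This gives an effective batch size of about 2048, so the 1 million samples can be processed in three to four days. In practice, however, the loss will not go down and keeps hovering around 230. Has anyone run into this? ### Expected Behavior _No response_...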
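The batch arithmetic in the report above can be checked with a short sketch. It assumes the "4 GPUs" are four data-parallel devices and that the last partial batch is kept; the function names are hypothetical helpers for illustration, not part of any library.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples consumed per optimizer update under plain data parallelism (assumed setup)."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(num_samples: int, effective_batch: int) -> int:
    """Optimizer updates in one pass over the data, keeping the last partial batch."""
    return math.ceil(num_samples / effective_batch)


# Values taken from the report: per_device_train_batch_size=4,
# gradient_accumulation_steps=128, 4 GPUs, 1 million samples.
batch = effective_batch_size(per_device_batch=4, grad_accum_steps=128, num_devices=4)
steps = optimizer_steps_per_epoch(num_samples=1_000_000, effective_batch=batch)

print(f"effective batch size: {batch}")
print(f"optimizer steps per epoch: {steps}")
```

Under these assumptions the effective batch size is exactly 2048, which means one pass over the 1 million samples performs only a few hundred optimizer updates. That may be worth keeping in mind when reading the loss curve.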
### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior Loading the model with nf4 quantization shows: You are loading your model in 8bit or 4bit but...
### System Info ubuntu22.04 one Nvidia A800 driver info: 470.141.10 cuda: 12.3 tensorrt: 9.2.0.5 ### Who can help? _No response_ ### Information - [X] The official example scripts - [...
### System Info GPU: A100 ### Who can help? _No response_ ### Information - [ ] The official example scripts - [X] My own modified scripts ### Tasks - [...