Habel_Qing

5 comments by Habel_Qing

Encountering NaN loss in stage 1. My command is: `torchrun --standalone --nproc_per_node=1 train_sft.py --pretrain "/home/qing/Yahui_Cai/remote_folder/pretrain/llama-7b" --model 'llama' --strategy naive --log_interval 10 --save_path /home/qing/Yahui_Cai/remote_folder/pretrain/Coati-7B ...`

> My experience:
>
> - `model.half()` + Adam (`eps=1e-8`): loss is NaN.
> - `model.half()` + SGD: loss is normal, but the model does not converge.
> - `model.half()` + Adam (`eps=1e-4`): loss is normal, but the model does not converge.
> - `model.half()` + fp16: loss is normal, but the model does not converge.
> - `model` (no `.half()`) + Adam (`eps=1e-8`): loss is normal and the model converges.
>
> Remove `.half()`...
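One plausible reading of the NaN with Adam and `eps=1e-8` is offered here as an assumption; the comment does not state it. float16 cannot represent 1e-8 because its smallest positive subnormal value is 2^-24 ≈ 5.96e-8, so the epsilon rounds to zero once the optimizer arithmetic runs in half precision. If a gradient has also underflowed to zero, the Adam update becomes 0/0. The function `adam_step_fp16` below is a hypothetical single-parameter toy built on NumPy scalars. It is not the training script's optimizer and only illustrates the arithmetic.

```python
import numpy as np


def adam_step_fp16(eps: float, grad: float = 0.0) -> np.float16:
    """One Adam update for a single parameter, with every quantity held in
    float16. Hypothetical toy setup, not PyTorch's implementation."""
    beta1, beta2 = 0.9, 0.999
    g = np.float16(grad)
    # First and second moment estimates after one step, starting from zero.
    m = np.float16((1 - beta1) * g)
    v = np.float16((1 - beta2) * g * g)
    # Bias-corrected moments for step t = 1.
    m_hat = m / np.float16(1 - beta1)
    v_hat = v / np.float16(1 - beta2)
    # 1e-8 rounds to 0.0 in float16; 1e-4 is still representable.
    eps16 = np.float16(eps)
    with np.errstate(invalid="ignore", divide="ignore"):
        # With grad = 0 and eps16 = 0 this is 0 / 0, which gives NaN.
        return m_hat / (np.sqrt(v_hat) + eps16)


print(np.float16(1e-8) == 0)   # eps underflows to zero
print(adam_step_fp16(1e-8))    # NaN update
print(adam_step_fp16(1e-4))    # finite (zero) update
```

With `eps=1e-4` the same step returns 0 instead of NaN. That is consistent with the commenter's observation that a larger epsilon avoids the NaN. It does not explain the non-convergence, which may be a separate float16 precision effect. One example would be small weight updates being rounded away. This is likewise an assumption.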

> This is a font issue. You can install the fonts-noto-cjk and fonts-anonymous-pro fonts.
>
> On Ubuntu, you can run:
>
> ```shell
> apt install fonts-noto-cjk fonts-anonymous-pro
> ```
>
> On macOS, you can use Homebrew:
>
> ```shell
> brew...
> ```