ruihan0495
It's under the training/utils folder, called ds_utils.py.
Makes sense :D @s-isaev Do you have any suggestions for inference parameters?
Hard to say. We changed the model to our pretrained GPT, and it kept crashing during training. I guess there are a lot of things to tune in ds_utils.py to make...
Our model is a GPT model.
All right. In addition to this, after we turned off enable-hybrid-engine, we encountered another error: "Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run." The...
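For context, that exception comes from FP16 dynamic loss scaling hitting its floor. A minimal sketch of the fp16 block in a DeepSpeed config dict (like the one ds_utils.py builds) that controls this behavior; the values below are illustrative assumptions, not the settings we actually used:

```python
# Sketch: fp16 / dynamic loss-scaling knobs in a DeepSpeed config dict.
# Values are illustrative; adjust to your own setup.
ds_config = {
    "train_batch_size": 32,           # hypothetical
    "fp16": {
        "enabled": True,
        "loss_scale": 0,              # 0 = dynamic loss scaling
        "initial_scale_power": 16,    # start at 2**16
        "loss_scale_window": 1000,    # steps between scale-up attempts
        "hysteresis": 2,
        "min_loss_scale": 1,          # the "minimum" the error message refers to
    },
}
```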
Dear @LuciusMos, we found that our problem was due to the tokenizer encoding process. Say our tokenizer has a maximum length of 6666, but sometimes it encodes some input strings...
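A minimal sketch of what we mean, assuming a Hugging Face tokenizer (the model name here is just a placeholder): without explicit truncation the encoded sequence can end up longer than the model's maximum length and crash training.

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint; substitute your own pretrained GPT.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
max_len = 6666  # the tokenizer's / model's maximum length

text = "some very long input string ..."

# Without truncation the encoding may exceed max_len.
ids = tokenizer(text)["input_ids"]

# Enforce the limit explicitly when encoding.
ids = tokenizer(text, max_length=max_len, truncation=True)["input_ids"]
assert len(ids) <= max_len
```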
You can do it in main.py when assembling the batch. If the loss is inf, there is probably a bug. If you have ruled out bugs, try tuning the lr and batch size, and you can also adjust things like the training precision in ds_config.
We tried using BF16 instead of FP16 during training; although the loss scale error goes away, training is very unstable. So the best way is still to stick to...
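In case it helps others, here is a rough sketch of how that precision toggle might look in the config dict ds_utils.py returns. The key names follow the DeepSpeed config schema; the helper name and surrounding structure are assumptions for illustration:

```python
def get_ds_precision_config(use_bf16: bool = False) -> dict:
    """Sketch: switch between FP16 (with dynamic loss scaling) and BF16
    in a DeepSpeed config dict. Illustrative only."""
    return {
        "fp16": {"enabled": not use_bf16, "loss_scale": 0},
        "bf16": {"enabled": use_bf16},
    }
```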
maybe see this issue https://github.com/microsoft/DeepSpeedExamples/issues/335#issuecomment-1521105300
> Has anyone successfully run transformers models such as ChatGLM or baichuan-7b? I ran into many bugs. I am about to try... have you fixed them already?