mynewstart

Results: 12 issues from mynewstart

During multi-task fine-tuning, performance on the pretraining data (about 100M of data) gets worse; what can be done about that? Also, if I want to make use of cls during pretraining, how should the corpus be organized? In the source file preprocess.py, the target parameter no longer has a cls option; what does target='mt' mean?

### 🐛 Describe the bug When I save the model, I get this error: ``` Traceback (most recent call last): File "train_sft.py", line 190, in train(args) File "train_sft.py", line 160, in train trainer.save_model(path=args.save_path,...

bug

Has anyone tried fine-tuning with the DeepSpeed Chat code using ZeRO-2 + FP16? I ran into underflow. After switching to BF16, I still find that some parameters in the embedding layer become 0, which makes the subsequent gradients NaN, so the parameters can't be updated. How can this be solved?
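A common first step when debugging this kind of failure is to scan each parameter (or gradient) tensor for NaN entries and for tensors that have collapsed to all zeros, which is the typical symptom of low-precision underflow. A minimal sketch of such an audit, written over plain float lists so it stays framework-agnostic (in a real run you would feed it e.g. `{n: p.detach().flatten().tolist() for n, p in model.named_parameters()}` — the function name and input shape here are illustrative, not part of DeepSpeed Chat):

```python
import math


def audit_params(named_weights):
    """Scan {name: list of floats} for trouble spots.

    Returns two lists of names: tensors containing NaN values,
    and tensors whose entries are all exactly zero (a symptom of
    FP16/BF16 underflow in layers such as embeddings).
    """
    nan_names, zero_names = [], []
    for name, values in named_weights.items():
        if any(math.isnan(v) for v in values):
            nan_names.append(name)
        if values and all(v == 0.0 for v in values):
            zero_names.append(name)
    return nan_names, zero_names


# Example: an underflowed embedding and a weight with a NaN gradient
nans, zeros = audit_params({
    "embed_tokens.weight": [0.0, 0.0, 0.0],
    "layers.0.mlp.weight": [1.0, float("nan")],
})
```

Running a check like this after each optimizer step narrows down whether the zeros appear already in the loaded checkpoint or only after the first updates.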

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) and believe this hasn't already been reported. (+1 or comment...

question


Using the code that does not integrate the huggingface Trainer, I trained a llama-30b model with deepspeed ZeRO-3 (max_length=512, bz=4, gradient_accumulation_steps=8), and it ran fine on an A100 80G card. But with the current new code, the same llama-30b model, max_length=512, and the same ZeRO-3 setup OOMs even at bz=1. What is the main cause of this?

### Training command for the old code:

```
OUTPUT="./output/actor-models/llama-30b-blend-data-test"
ZERO_STAGE=3
echo $OUTPUT
echo $ZERO_STAGE
data_output_path=./output/actor-models/data_files
deepspeed main.py \
  --sft_only_data_path xxx \
  --model_name_or_path decapoda-research/llama-30b-hf \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size...
```
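When a previously-fitting ZeRO-3 run starts to OOM after a code change, it is worth diffing the effective DeepSpeed config, since memory use depends heavily on the parameter/optimizer offload and live-parameter settings. A sketch of the main ZeRO-3 memory levers, expressed as a config dict using DeepSpeed's documented key names (the specific values here are illustrative, not the repo's actual defaults):

```python
# Hypothetical ZeRO-3 config fragment; key names follow DeepSpeed's
# JSON schema, values are examples to compare against your own run.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Offloading params/optimizer state to CPU trades speed for GPU memory.
        "offload_param": {"device": "cpu"},
        "offload_optimizer": {"device": "cpu"},
        # Caps how many gathered parameters stay resident on the GPU.
        "stage3_max_live_parameters": 1e9,
    },
    "gradient_accumulation_steps": 8,
    "train_micro_batch_size_per_gpu": 1,
}
```

If the old code enabled offloading (or activation checkpointing) and the new code does not, that alone can explain a 30B model no longer fitting at bz=1.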

Hi, thanks for providing this work. I found that I can't open the data download [link](https://cutt.ly/m3exam-data); could you tell me what I should do?

When using the evaluation data (Vicuna80), do we need to construct an expert agent first, or should the original questions be fed directly into the model?