mynewstart

Results: 12 issues from mynewstart

During multi-task fine-tuning, performance on the pretraining data (about 100M of data) gets worse; what can be done about that? Also, if I want to make use of cls during pretraining, how should the corpus be organized? In the source file preprocess.py, the target parameter no longer has a cls option; what does target='mt' mean?

### 🐛 Describe the bug When I save the model, I get this error: ``` Traceback (most recent call last): File "train_sft.py", line 190, in train(args) File "train_sft.py", line 160, in train trainer.save_model(path=args.save_path,...

bug

Has anyone tried fine-tuning with the DeepSpeed Chat code using ZeRO-2 + FP16? I ran into underflow. After switching to BF16, I still find that some parameters in the embedding layer become 0, which makes the subsequent gradients NaN, so the parameters can't be updated. How can this be solved?
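A common first step when debugging this kind of failure is to scan each parameter (or gradient) tensor for NaN entries and for tensors that have collapsed to all zeros, which is the typical symptom of low-precision underflow. A minimal sketch of such an audit, written over plain float lists so it stays framework-agnostic (in a real run you would feed it e.g. `{n: p.detach().flatten().tolist() for n, p in model.named_parameters()}` — the function name and input shape here are illustrative, not part of DeepSpeed Chat):

```python
import math


def audit_params(named_weights):
    """Scan {name: list of floats} for trouble spots.

    Returns two lists of names: tensors containing NaN values,
    and tensors whose entries are all exactly zero (a symptom of
    FP16/BF16 underflow in layers such as embeddings).
    """
    nan_names, zero_names = [], []
    for name, values in named_weights.items():
        if any(math.isnan(v) for v in values):
            nan_names.append(name)
        if values and all(v == 0.0 for v in values):
            zero_names.append(name)
    return nan_names, zero_names


# Example: an underflowed embedding and a weight with a NaN gradient
nans, zeros = audit_params({
    "embed_tokens.weight": [0.0, 0.0, 0.0],
    "layers.0.mlp.weight": [1.0, float("nan")],
})
```

Running a check like this after each optimizer step narrows down whether the zeros appear already in the loaded checkpoint or only after the first updates.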

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) and believe this hasn't already been reported. (+1 or comment...

question


Using the code that does not integrate the huggingface Trainer, I trained a llama-30b model with deepspeed ZeRO-3 (max_length=512, bz=4, gradient_accumulation_steps=8), and it ran fine on an A100 80G card. But with the current new code, the same llama-30b model, max_length=512, and the same ZeRO-3 setup OOMs even at bz=1. What is the main cause of this?

### Training command for the old code:

```
OUTPUT="./output/actor-models/llama-30b-blend-data-test"
ZERO_STAGE=3
echo $OUTPUT
echo $ZERO_STAGE
data_output_path=./output/actor-models/data_files
deepspeed main.py \
  --sft_only_data_path xxx \
  --model_name_or_path decapoda-research/llama-30b-hf \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size...
```
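When a previously-fitting ZeRO-3 run starts to OOM after a code change, it is worth diffing the effective DeepSpeed config, since memory use depends heavily on the parameter/optimizer offload and live-parameter settings. A sketch of the main ZeRO-3 memory levers, expressed as a config dict using DeepSpeed's documented key names (the specific values here are illustrative, not the repo's actual defaults):

```python
# Hypothetical ZeRO-3 config fragment; key names follow DeepSpeed's
# JSON schema, values are examples to compare against your own run.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Offloading params/optimizer state to CPU trades speed for GPU memory.
        "offload_param": {"device": "cpu"},
        "offload_optimizer": {"device": "cpu"},
        # Caps how many gathered parameters stay resident on the GPU.
        "stage3_max_live_parameters": 1e9,
    },
    "gradient_accumulation_steps": 8,
    "train_micro_batch_size_per_gpu": 1,
}
```

If the old code enabled offloading (or activation checkpointing) and the new code does not, that alone can explain a 30B model no longer fitting at bz=1.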

Hi, thanks for providing this work. I found that I can't open the data download [link](https://cutt.ly/m3exam-data); could you tell me what I should do?

When using the evaluation data (Vicuna80), do we need to construct an expert agent first, or should the original questions be fed directly into the model?