mynewstart

Results: 27 comments of mynewstart

> 1. A low batch_size has some effect on training, but not a large one. If GPU memory is insufficient, you can try a smaller model.
> 2. You have only trained for 1000 steps; keep training until acc_mlm reaches 70 or even above 80 before fine-tuning on downstream tasks, which should give better results. MLM is a harder task than NSP; on a large corpus, acc_mlm generally does not exceed 90.
> 3. It is better to train directly on one large shuffled corpus; splitting it into several small corpora is unnecessary and may hurt performance, because the model will be biased toward the corpora it saw last.

Does the pretraining not support WWM? If I want to pretrain RoBERTa-wwm, does using the current version as-is affect the results? And if the NSP task is dropped, can the pretraining data be formatted as one sample per line?
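
For context on the WWM and data-format question, here is a minimal sketch, assuming HuggingFace transformers rather than this repository's own API, of an MLM-only (no NSP) data pipeline with whole word masking over a one-sample-per-line corpus; the model name and file path are placeholders.

```python
# Minimal sketch, assuming HuggingFace transformers: MLM-only pretraining data
# pipeline (no NSP) with whole word masking. Model name and file path are
# placeholders, not taken from the issue above.
from transformers import (
    BertTokenizerFast,
    DataCollatorForWholeWordMask,
    LineByLineTextDataset,
)

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext")

# With NSP dropped, the corpus can simply be one sample per line.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="corpus.txt",  # hypothetical path
    block_size=128,
)

# Masks all sub-tokens of a word together instead of independent WordPiece pieces.
# For Chinese WWM, a word-segmentation reference (e.g. produced with LTP) is
# usually supplied as well; that step is omitted here.
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)
```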

Try using 'self_attn.o_proj' and 'mlp.down_proj'?
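
For what it's worth, a hedged sketch of how that suggestion could be expressed with PEFT's `LoraConfig` on a LLaMA/Mistral-style model; only the two module names come from the comment, and the model id and hyperparameters are illustrative.

```python
# Hedged sketch, assuming PEFT LoRA on a LLaMA/Mistral-style model; only the two
# suggested module names come from the comment above, everything else is illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # PEFT matches these names against module-name suffixes, so the suggested
    # projections can be listed alongside the usual attention projections.
    target_modules=["q_proj", "k_proj", "v_proj", "self_attn.o_proj", "mlp.down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```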

My solution is to save checkpoints myself; alternatively, you can use [zero_to_fp32](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/utils/zero_to_fp32.py).
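
As a hedged sketch of the zero_to_fp32 route (paths are placeholders): the helper consolidates a ZeRO-sharded checkpoint into a single fp32 state dict, and the same script can also be run from the command line inside the checkpoint directory, e.g. `python zero_to_fp32.py . pytorch_model.bin`.

```python
# Hedged sketch: consolidate a ZeRO-sharded DeepSpeed checkpoint into one fp32
# state dict. The checkpoint directory and output path are placeholders.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Directory written by DeepSpeed, containing the global_step*/ shards.
state_dict = get_fp32_state_dict_from_zero_checkpoint("output/checkpoint-1000")
torch.save(state_dict, "output/pytorch_model_fp32.bin")
```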

OK. Have you also had a similar experience, where the earlier DeepSpeed-Chat code could train larger models?

> > Changing that parameter fundamentally changes the model
>
> > same problem

It's similar to #4094

> > > 1. I modify the `num_experts_per_tok` to...

If I understand correctly, will the inference speed slow down, and will the model's performance deteriorate?

> > > > Changing that parameter fundamentally changes the model
> > > ...
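
For reference, a hedged sketch of what overriding `num_experts_per_tok` looks like for a Mixtral-style model in HuggingFace transformers; the model id and the override value are illustrative, not taken from the thread above.

```python
# Hedged sketch, assuming a Mixtral-style model in HuggingFace transformers;
# the model id and the override value are illustrative.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1  # default is 2; route each token through a single expert

# Fewer active experts per token means less compute per forward pass, but the
# router weights were trained for top-2 routing, so output quality usually drops.
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```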

I can fully fine-tune Mixtral 8x7B Instruct with DeepSpeed ZeRO-3 on 2 A100-80GB instances; the code doesn't hang and runs smoothly. I didn't change anything except disabling the evaluation part to...
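
A hedged sketch of the kind of ZeRO stage-3 configuration used for a full fine-tune like this; all values are illustrative, not the exact settings of the run described above. When passed through the HuggingFace Trainer's `deepspeed` argument, the `auto` fields are filled in from `TrainingArguments`.

```python
# Hedged sketch of a ZeRO stage-3 config for full fine-tuning across multiple
# GPUs/nodes; values are illustrative, not the exact settings of the run above.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        # Gather full 16-bit weights when saving, so the checkpoint is usable
        # without a separate zero_to_fp32 conversion step.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```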

> save_mp_checkpoint_path=

Hi @RezaYazdaniAminabadi, thanks for your contribution. I used this script and ran into the following issue. My environment is deepspeed=0.12.3, transformers=4.34.0, torch=2.0.1, and the instance is a p4de. Could you help know...
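
For context, a hedged sketch of how `save_mp_checkpoint_path` is typically passed to `deepspeed.init_inference`; the model id, parallelism degree, and output path are placeholders rather than the exact script referenced above.

```python
# Hedged sketch: DeepSpeed inference with save_mp_checkpoint_path, so the
# tensor-parallel-sharded checkpoint is written out once and can be reloaded
# later without resharding. Model id, tp_size, and path are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

ds_engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path="/tmp/sharded_checkpoint",
)
model = ds_engine.module  # injected model, ready for generation
```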