hector

Results 5 comments of hector

I want to know where to download the pre-trained model. Additionally, could you provide the Aristorobertav7 model you used? The model downloaded from HuggingFace has difficulty achieving the...

Thanks for your reply! I found that only the Multilingual-E5-base model is provided on HuggingFace. Has the Multilingual-E5-large version been open-sourced? If so, could you please provide me with the...

> Hi, have you solved the issue? I have met the exact same problem.

I was just trying this as well. I found that the error occurs whenever `ref_model` is loaded via `_prepare_deepspeed`; I have tried both zero2 and zero3 from the start : (

> It is probably an offload problem; using stage 3 with no offload works for me.
>
> ```
> deepspeed --master_port 25002 --include "localhost:4,5,6,7" src/train.py \
>     --model_name_or_path ${model_path} \
>     --stage 'dpo' \
>     --do_train \
>     --finetuning_type 'full' \
>     ...
> ```
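To illustrate the "stage 3, no offload" suggestion above: a minimal DeepSpeed config along those lines might look like the sketch below. This is an assumption about the setup, not the poster's actual file; the key point is that `zero_optimization.stage` is 3 and no `offload_optimizer`/`offload_param` sections are present.

```json
{
  "zero_optimization": {
    "stage": 3
  },
  "bf16": { "enabled": "auto" },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```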

You can read the code carefully. My understanding is that if `create_new_adapter` is True, the code first merges `adapter_name_or_path` with the base model and then attaches a new adapter for training; conversely, if `create_new_adapter` is False, training continues directly on top of `adapter_name_or_path`, and the optimized adapter is saved at the end. The net effect of the two methods is the same, but the first produces two sets of adapter parameters while the second produces only one. I am not sure whether my understanding is correct : )
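The equivalence described above can be sketched with a toy model. This is a deliberate simplification, not the project's actual code: it treats each LoRA adapter as a purely additive weight delta (ignoring the low-rank factorization) and uses integers so the comparison is exact.

```python
# Toy model of the two create_new_adapter paths (hypothetical simplification:
# an adapter is just an additive delta on a single scalar weight).

base = 10          # one base-model weight
adapter_v1 = 3     # delta learned in a previous training run
dpo_update = 1     # delta learned during the new training run

# create_new_adapter=True: merge adapter_v1 into the base first,
# then train a fresh adapter on top -> two adapter artifacts exist.
merged_base = base + adapter_v1
new_adapter = dpo_update
weights_two_adapters = merged_base + new_adapter

# create_new_adapter=False: keep training adapter_v1 in place,
# saving a single updated adapter at the end.
updated_adapter = adapter_v1 + dpo_update
weights_one_adapter = base + updated_adapter

# Under this additive approximation the final weights coincide.
assert weights_two_adapters == weights_one_adapter
print(weights_two_adapters)  # 14
```

The difference is therefore bookkeeping, not math: path one leaves you with two adapter checkpoints to load, path two with a single consolidated one.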