XuanRen4470 comments

Results: 3 comments by XuanRen4470

How exactly do you do DPO?

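> DPO is not meant for chasing accuracy on a dataset.

But I remember that DPO can be used to improve model capability? Also, what exactly is the DPO workflow? I added a step that merges the SFT LoRA, and accuracy seems to have improved. However, the DPO example in the README does not mention merging the LoRA. My inference and training pipelines are now completely different from the README's, yet accuracy seems somewhat higher.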
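For reference on what DPO optimizes, here is a minimal sketch of the standard DPO objective from the DPO paper (Rafailov et al., 2023): the loss is $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x)-\log\pi_{\text{ref}}(y_w|x))-(\log\pi_\theta(y_l|x)-\log\pi_{\text{ref}}(y_l|x))]\big)$, where $y_w$ is the preferred and $y_l$ the rejected response. This is not this repository's code. The function name `dpo_loss`, the default `beta=0.1`, and the per-pair scalar inputs are illustrative assumptions; real trainers compute these log-probabilities in batches from the model. In the DPO paper the reference policy is the SFT model. Merging an SFT LoRA into the base weights yields a standalone SFT checkpoint that could serve as that reference, but, as the post notes, the README's DPO example does not mention that step.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response given the
    prompt, under either the trainable policy or the frozen reference model.
    The default beta is an assumption, not a value taken from the README.
    """
    # Log-ratio of policy to reference for the preferred and rejected answers.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), written to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy still equals the reference (e.g. right after initializing
# it from the SFT model), the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen answer's probability relative to the reference lowers it.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

The loss only rewards the policy for widening the gap between chosen and rejected responses relative to the reference, which is why it does not directly target accuracy on any particular dataset.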

mat1 and mat2 must have the same dtype, but got Float and Half

Same error here. Any solution?
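This message comes from PyTorch when the two operands of a matrix multiply have different dtypes, for example float32 ("Float") activations meeting float16 ("Half") weights. That can happen when a model or adapter is loaded in half precision while the inputs or other modules stay in float32. The snippet below is a hypothetical minimal repro, not the actual cause in this issue; the layer sizes are arbitrary. It shows the mismatch and one possible fix: bring both operands to a common dtype before the matmul.

```python
import torch

# Hypothetical minimal repro (not taken from the original issue):
# a layer converted to float16 ("Half") receives a float32 ("Float") input.
layer = torch.nn.Linear(4, 2).half()
x = torch.randn(3, 4)  # float32 by default

raised = False
try:
    layer(x)
except RuntimeError as err:
    # The exact wording varies by PyTorch version and device.
    raised = True
    print("RuntimeError:", err)

# One possible fix: make both operands the same dtype before the matmul.
# Here the layer is cast back to float32 so the example also runs on CPU;
# on a GPU you could instead cast the input with x.to(layer.weight.dtype).
layer = layer.float()
out = layer(x)
print(out.dtype, tuple(out.shape))  # torch.float32 (3, 2)
```

Which side to cast depends on where the mismatch originates; upcasting to float32 is the simplest to verify, while keeping everything in float16 on a GPU saves memory.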

Support for multi-turn online RL training

Same for me. I am also working on multi-turn RL. My WeChat is x34ren. Could you please add me to the group?