Ozawa

Results: 5 comments by Ozawa

> Hi @iMountTai, the two models can be different as long as the actor is the same as the initial model (the one trained in the SFT stage), and the critic is...

I checked the Hugging Face dataset, and it appears to contain no data. I'm not sure whether that is related to this issue.

> Hi @Ozawa333, I have contacted the authors of the dataset and this issue has been resolved. Thank you so much!