Results: 5 comments of Ozawa
> Hi @iMountTai, the two models can be different as long as the actor is the same as the initial model (the one trained in the SFT stage), and the critic is...
I checked the Hugging Face dataset, and it appears to contain no data. I don't know whether that is related to this issue.
> Hi @Ozawa333, I have contacted the authors of the dataset and this issue has been resolved. Thank you so much!