gouqi_nju
Hello, have you solved it? I also hit this error, and my torch version is 2.1.0 @chaofanl
how strange, my accelerate version is 0.25.0
It works, thank you very much!!!
yes @TobyGE , it works for me
Hello, did you solve it? My average reward is still not increasing during training. > > According to readme, "We have found that it is very unstable to use different...
> I found there is a problem. The problem may be in the model.make_experenice. I printed the actor generate seq in training and found the actor sample was so bad....
Just listen to the example given in the model card and you'll see: the quality really is much worse, and there is also a lot less training data.
@CaishuiKing +1, the scores are very tightly clustered. Have you solved it?
@wulongjian, hello, it still doesn't work. pip list: transformers 4.49.0.dev0 transformers-stream-generator 0.0.5 triton 3.1.0 trl 0.15.1 typeguard 4.4.1 typer 0.15.1 typing_extensions 4.12.2 tyro 0.9.13 tzdata 2025.1 uc-micro-py 1.0.3 urllib3...
@SharonJin422 hello, may I ask whether you have solved this?