wanghanbin
Results: 2 issues of wanghanbin
With transformers 4.35.0, an error is raised: AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'. Switching to version 4.30.0 makes it run normally.
Hello! The policy loss computation includes an [entropy loss](https://github.com/volcengine/verl/blob/65cceb3c9dbe6e18230bc3dc045af6e3280c9752/verl/workers/actor/dp_actor.py#L294) term. Does this mean Maximum Entropy Reinforcement Learning is being used? The entropy loss curve is as follows: The val/test score is as follows: Adding the entropy loss does not...
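To illustrate what an entropy term in a policy loss typically does, here is a minimal sketch. It assumes the common PPO-style convention of subtracting a scaled mean token entropy from the policy-gradient loss. The helper names and `entropy_coeff` are hypothetical and are not verl's actual API. Note that this kind of term only regularizes the per-token loss. It does not change the value targets the way full maximum-entropy methods such as SAC do.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_from_logits(logits: List[float]) -> float:
    """Shannon entropy (in nats) of the categorical distribution softmax(logits)."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


def policy_loss_with_entropy_bonus(
    pg_loss: float, token_logits: List[List[float]], entropy_coeff: float
) -> float:
    """Hypothetical helper: pg_loss minus entropy_coeff times the mean token entropy.

    Because the entropy is subtracted, minimizing this loss pushes the policy
    toward higher entropy, so the term acts as an entropy bonus (regularizer).
    """
    mean_entropy = sum(entropy_from_logits(l) for l in token_logits) / len(token_logits)
    return pg_loss - entropy_coeff * mean_entropy


if __name__ == "__main__":
    # A uniform distribution over 4 tokens has entropy log(4).
    print(entropy_from_logits([0.0, 0.0, 0.0, 0.0]))
    print(policy_loss_with_entropy_bonus(0.5, [[0.0] * 4, [10.0, 0.0, 0.0, 0.0]], 0.01))
```

A larger `entropy_coeff` rewards flatter token distributions more strongly. This is why the entropy curve and the val/test score can move in different directions depending on the coefficient.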