Reward did not converge

Open • tju-xuan opened this issue 7 months ago • 1 comment

Hello, author. As shown in the figure, when I tried to reproduce your work, the reward did not converge, and the final metrics were quite different from those reported in the paper. Could you please let me know what might be causing this? Is the reward setting effective? And is the code in this repository fully correct?

Sep 19 '25 00:09 tju-xuan

The reward curve:

Sep 19 '25 01:09 tju-xuan