message histories in Agentic RL for reasoning model

Open GGGGxxxxxxxxr opened this issue 5 months ago • 0 comments

System Info

MULTI A100 * 8 Nodes

Information

[ ] The official example scripts
[x] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[x] My own task or dataset (give details below)

Reproduction

Hello!

I have been implementing the Agentic RL for a reasoning model these days.

And the ideal way for me to call a reasoning model for multi-turn interaction would be + <user_0> + <answer_only_0> + <user_1> + ... The previous reasoning tokens shall be omitted for the current step in order to maintain a healthy context management.

However, in the current "tool_agent_loop", I think the entire responses tokens (reasoning + answer) would be appended into the "prompt_ids", the same for various training-assisted masks.

I just would like to check whether I understand it correctly or not. Because I have been stuck at this issue for several days.

Expected behavior

As described above.

Nov 13 '25 00:11 GGGGxxxxxxxxr