Agentic RL Training example: KeyError: 'solve_equation'

Open PhenixZhang opened this issue 3 months ago • 1 comments

System Info

verl: 0.5.0 python: 3.12

(AgentLoopWorker pid=51457) Initialized tools: {'calc_gsm8k_reward': <verl.tools.gsm8k_tool.Gsm8kTool object at 0x7ecbd476e090>}
(AgentLoopWorker pid=51453) 2025-11-14 03:43:10 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51453) Traceback (most recent call last):
(AgentLoopWorker pid=51453)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51453)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51453)            ~~~~~~~~~~^^^^^^^^^^^
(AgentLoopWorker pid=51453) KeyError: 'solve_equation'
(AgentLoopWorker pid=51452) 2025-11-14 03:43:02 INFO [alembic.runtime.migration] Context impl SQLiteImpl. [repeated 14x across cluster]
(AgentLoopWorker pid=51452) 2025-11-14 03:43:02 INFO [alembic.runtime.migration] Will assume non-transactional DDL. [repeated 14x across cluster]
(AgentLoopWorker pid=51452) 2025/11/14 03:43:02 INFO mlflow.store.db.utils: Creating initial MLflow database tables... [repeated 6x across cluster]
(AgentLoopWorker pid=51452) 2025/11/14 03:43:02 INFO mlflow.store.db.utils: Updating database tables [repeated 6x across cluster]
(AgentLoopWorker pid=51454) 2025-11-14 03:43:10 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1)
(AgentLoopWorker pid=51457) 2025-11-14 03:43:16 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1) [repeated 8x across cluster]
(AgentLoopWorker pid=51458) 2025-11-14 03:43:19 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 76 (char 76)
(AgentLoopWorker pid=51458) 2025-11-14 03:43:19 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 76 (char 76)
(AgentLoopWorker pid=51458) 2025-11-14 03:43:20 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51458) Traceback (most recent call last):
(AgentLoopWorker pid=51458)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51458)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51458)            ~~~~~~~~~~^^^^^^^^^^^
(AgentLoopWorker pid=51458) KeyError: 'solve_equation'
(AgentLoopWorker pid=51451) 2025-11-14 03:43:20 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Extra data: line 3 column 1 (char 61)
(AgentLoopWorker pid=51452) 2025-11-14 03:43:21 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1) [repeated 56x across cluster]
(AgentLoopWorker pid=51457) 2025-11-14 03:43:22 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 132 (char 132) [repeated 4x across cluster]
(AgentLoopWorker pid=51452) 2025-11-14 03:43:21 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51452) Traceback (most recent call last):
(AgentLoopWorker pid=51452)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51452)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51452)            ~~~~~~~~~~^^^^^^^^^^^
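
The two error families in the log can be reproduced in isolation. The sketch below is not verl code: the `tools` dict only mirrors the "Initialized tools" line, `lookup` is a hypothetical helper standing in for `self.tools[tool_name]`, and the malformed payload is an assumed example of model output that is not valid JSON.

```python
import json

# Registry as reported by "Initialized tools": only one tool is registered.
# object() is a placeholder for the real Gsm8kTool instance.
tools = {"calc_gsm8k_reward": object()}


def lookup(tool_name):
    """Hypothetical helper mirroring `tool = self.tools[tool_name]`."""
    return tools[tool_name]


# 1) The model asks for a tool that was never registered.
try:
    lookup("solve_equation")
except KeyError as exc:
    key_error_message = str(exc)

# 2) The tool-call payload is not valid JSON (assumed example: a newline
#    followed by free text instead of a JSON object).
try:
    json.loads("\nsolve_equation(x)")
except json.JSONDecodeError as exc:
    decode_error_message = str(exc)

print(key_error_message)
print(decode_error_message)
```

Both messages come from what the model generated (an unregistered tool name, or text that is not JSON), not from a missing dependency or a broken install.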

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Install mlflow to view tool-call and LLM traces:

pip install mlflow

This will download and preprocess the GSM8K dataset into ~/data/gsm8k/ and add the "agent_name" field.

python examples/data_preprocess/gsm8k_tool_agent_loop.py

Start training with tool calls and mlflow-based tracing enabled, which helps debug rollout details:

bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_tool_agent_mlflow.sh

Expected behavior

The training run succeeds.

Nov 14 '25 03:11 PhenixZhang

Actually, there is a try/except that prevents training from failing: if the tool name does not exist, the reward will be 0.0.

In this case, the model can learn why the call failed from the Tool Response, so my fix, which validates whether the tool name is in the list and returns a Failed Tool Response, is not necessary. I will close my PR, wdyt? @wuxibin89

Nov 24 '25 04:11 JobQiu
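
The behavior described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the verl implementation: `EchoTool`, `TOOLS`, `call_tool`, and `call_tool_validated` are hypothetical names, and the `(response_text, reward)` return shape is assumed for the example. The first function catches the failed lookup and returns an error response with reward 0.0. The second adds the explicit name check that the comment judges unnecessary.

```python
import asyncio


class EchoTool:
    """Hypothetical stand-in for a registered tool."""

    async def execute(self, **kwargs):
        return f"ok: {kwargs}", 1.0


# Hypothetical registry containing only the tool seen in the log.
TOOLS = {"calc_gsm8k_reward": EchoTool()}


async def call_tool(tool_name, arguments):
    """Return (tool_response_text, reward) and never raise.

    An unknown tool name raises KeyError inside the try block; the
    except branch turns it into an error response with reward 0.0.
    """
    try:
        tool = TOOLS[tool_name]
        return await tool.execute(**arguments)
    except Exception as exc:
        return f"Error when executing tool: {exc}", 0.0


async def call_tool_validated(tool_name, arguments):
    """Variant with an explicit name check before the lookup."""
    if tool_name not in TOOLS:
        return f"Failed: unknown tool '{tool_name}'", 0.0
    return await call_tool(tool_name, arguments)


unknown = asyncio.run(call_tool("solve_equation", {}))
validated = asyncio.run(call_tool_validated("solve_equation", {}))
known = asyncio.run(call_tool("calc_gsm8k_reward", {"answer": "42"}))
print(unknown)
print(validated)
print(known)
```

In both variants the rollout continues and the failed call earns a reward of 0.0; the only difference is the wording of the response text the model sees.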