Agentic RL Training example: KeyError: 'solve_equation'
System Info
verl: 0.5.0
python: 3.12
(AgentLoopWorker pid=51457) Initialized tools: {'calc_gsm8k_reward': <verl.tools.gsm8k_tool.Gsm8kTool object at 0x7ecbd476e090>}
(AgentLoopWorker pid=51453) 2025-11-14 03:43:10 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51453) Traceback (most recent call last):
(AgentLoopWorker pid=51453)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51453)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51453)            ~~~~~~~~~~^^^^^^^^^^^
(AgentLoopWorker pid=51453) KeyError: 'solve_equation'
(AgentLoopWorker pid=51452) 2025-11-14 03:43:02 INFO [alembic.runtime.migration] Context impl SQLiteImpl. [repeated 14x across cluster]
(AgentLoopWorker pid=51452) 2025-11-14 03:43:02 INFO [alembic.runtime.migration] Will assume non-transactional DDL. [repeated 14x across cluster]
(AgentLoopWorker pid=51452) 2025/11/14 03:43:02 INFO mlflow.store.db.utils: Creating initial MLflow database tables... [repeated 6x across cluster]
(AgentLoopWorker pid=51452) 2025/11/14 03:43:02 INFO mlflow.store.db.utils: Updating database tables [repeated 6x across cluster]
(AgentLoopWorker pid=51454) 2025-11-14 03:43:10 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1)
(AgentLoopWorker pid=51457) 2025-11-14 03:43:16 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1) [repeated 8x across cluster]
(AgentLoopWorker pid=51458) 2025-11-14 03:43:19 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 76 (char 76)
(AgentLoopWorker pid=51458) 2025-11-14 03:43:19 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 76 (char 76)
(AgentLoopWorker pid=51458) 2025-11-14 03:43:20 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51458) Traceback (most recent call last):
(AgentLoopWorker pid=51458)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51458)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51458)            ~~~~~~~~~~^^^^^^^^^^^
(AgentLoopWorker pid=51458) KeyError: 'solve_equation'
(AgentLoopWorker pid=51451) 2025-11-14 03:43:20 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Extra data: line 3 column 1 (char 61)
(AgentLoopWorker pid=51452) 2025-11-14 03:43:21 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting value: line 2 column 1 (char 1) [repeated 56x across cluster]
(AgentLoopWorker pid=51457) 2025-11-14 03:43:22 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_parser.py] Failed to decode tool call: Expecting ',' delimiter: line 2 column 132 (char 132) [repeated 4x across cluster]
(AgentLoopWorker pid=51452) 2025-11-14 03:43:21 ERROR [.../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py] Error when executing tool: 'solve_equation'
(AgentLoopWorker pid=51452) Traceback (most recent call last):
(AgentLoopWorker pid=51452)   File ".../verl-0.5.0/verl/experimental/agent_loop/tool_agent_loop.py", line 143, in _call_tool
(AgentLoopWorker pid=51452)     tool = self.tools[tool_name]
(AgentLoopWorker pid=51452)            ~~~~~~~~~~^^^^^^^^^^^
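The `Failed to decode tool call` lines are standard messages from Python's `json` decoder. Below is a minimal sketch, assuming a Hermes-style format in which each call is wrapped in `<tool_call>...</tool_call>` tags and the body must be a JSON object. `extract_tool_calls` is a hypothetical helper for illustration, not verl's `tool_parser.py`. It shows one way to get `Expecting value: line 2 column 1 (char 1)`: a body that starts with a newline followed by non-JSON text, such as a Python-style call.

```python
import json
import re

# Assumption: Hermes-style tool calls, one JSON object per <tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> tuple[list[dict], list[str]]:
    """Hypothetical helper: return (decoded calls, decode error messages)."""
    calls: list[dict] = []
    errors: list[str] = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError as e:
            # str(e) has the form "<msg>: line L column C (char N)".
            errors.append(f"Failed to decode tool call: {e}")
    return calls, errors


# Non-JSON body after the leading newline -> "Expecting value" at char 1.
bad_calls, bad_errors = extract_tool_calls("<tool_call>\nsolve_equation(x=2)\n</tool_call>")
print(bad_errors)

# Well-formed body decodes cleanly.
good_calls, good_errors = extract_tool_calls(
    '<tool_call>\n{"name": "calc_gsm8k_reward", "arguments": {"answer": "72"}}\n</tool_call>'
)
print(good_calls)
```

Decoding successfully is only half the story: a well-formed call can still name a tool that was never registered (here only `calc_gsm8k_reward` was initialized), which is the path that ends in the `KeyError`.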
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Install mlflow to view the tool-call and LLM traces:
pip install mlflow
This will download and preprocess the GSM8K dataset into ~/data/gsm8k/ and add the "agent_name" field.
python examples/data_preprocess/gsm8k_tool_agent_loop.py
Start training with tool calls and MLflow-based tracing enabled, which helps debug the rollout details:
bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_tool_agent_mlflow.sh
Expected behavior
The training run completes successfully.
Actually, there is a try/except that prevents training from failing: if the tool name does not exist, the reward is 0.0.
In this case, the model can learn why the call failed from the tool response, so my fix (validating whether the tool name is in the registered list and returning a failed tool response) is not necessary. I will close my PR. WDYT? @wuxibin89
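A simplified sketch of the two options discussed above, assuming tools live in a plain dict keyed by name, as the `self.tools[tool_name]` line in the traceback suggests. `ToolResponse`, `call_tool_with_try`, `call_tool_validated`, and the toy reward tool are hypothetical names for illustration, not verl's API. It also shows why the log reads `Error when executing tool: 'solve_equation'`: `str(KeyError('solve_equation'))` is the quoted key.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResponse:
    """Hypothetical stand-in for a tool result: text shown to the model plus a reward."""
    text: str
    reward: float


Tools = dict[str, Callable[..., ToolResponse]]


def call_tool_with_try(tools: Tools, name: str, args: dict[str, Any]) -> ToolResponse:
    # Option 1: catch everything, so a missing tool never crashes the rollout.
    try:
        tool = tools[name]  # raises KeyError for an unregistered name
        return tool(**args)
    except Exception as e:
        # For KeyError, str(e) is the quoted key, e.g. "'solve_equation'".
        return ToolResponse(text=f"Error when executing tool: {e}", reward=0.0)


def call_tool_validated(tools: Tools, name: str, args: dict[str, Any]) -> ToolResponse:
    # Option 2: check the name up front and return a more descriptive failure.
    if name not in tools:
        return ToolResponse(
            text=f"Unknown tool {name!r}; available tools: {sorted(tools)}",
            reward=0.0,
        )
    return tools[name](**args)


# Toy registry mirroring the log: only calc_gsm8k_reward is registered.
tools: Tools = {
    "calc_gsm8k_reward": lambda answer: ToolResponse(text=f"answer={answer}", reward=1.0),
}

print(call_tool_with_try(tools, "solve_equation", {}))
print(call_tool_validated(tools, "solve_equation", {}))
```

In this sketch both paths keep training alive and assign a 0.0 reward to the bad call; they differ only in how much the error text tells the model about which tools exist.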