Implement Structured Tool
This PR introduces a structured Tool type, addressing #308.
The implementation wraps ToolSpec ([String: Any]) inside a more type-safe Tool struct. I’ve added a basic unit test and refactored the LLMEval project to use the new Tool API.
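To make the wrapping idea concrete, here is a minimal sketch of how a typed struct could sit on top of the untyped schema dictionary. Apart from `ToolSpec` being `[String: Any]`, everything here is an assumption for illustration: the generic parameters, the initializer arguments, the `handler` closure, and the `AddInput`/`AddOutput` types are hypothetical and may not match the PR's actual API.

```swift
import Foundation

// From the PR description: ToolSpec is the untyped schema dictionary.
typealias ToolSpec = [String: Any]

// Hypothetical typed wrapper; field and parameter names are illustrative only.
struct Tool<Input: Decodable, Output: Encodable> {
    let name: String
    let schema: ToolSpec                      // what would be handed to the tokenizer
    let handler: (Input) throws -> Output     // typed implementation of the tool

    init(
        name: String,
        description: String,
        parameters: [String: Any],
        required: [String],
        handler: @escaping (Input) throws -> Output
    ) {
        self.name = name
        // Build a function-style JSON schema from the typed arguments.
        self.schema = [
            "type": "function",
            "function": [
                "name": name,
                "description": description,
                "parameters": [
                    "type": "object",
                    "properties": parameters,
                    "required": required,
                ] as [String: Any],
            ] as [String: Any],
        ]
        self.handler = handler
    }
}

// Example usage with a hypothetical "add" tool.
struct AddInput: Decodable { let a: Int; let b: Int }
struct AddOutput: Encodable { let sum: Int }

let addTool = Tool<AddInput, AddOutput>(
    name: "add",
    description: "Add two integers",
    parameters: ["a": ["type": "integer"], "b": ["type": "integer"]],
    required: ["a", "b"]
) { input in
    AddOutput(sum: input.a + input.b)
}
```

Keeping the schema and the handler in one value means the caller only has to hold on to the `Tool` to both advertise it to the model and execute it later.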
I’m opening this PR early for discussion and feedback. Please feel free to test the changes and share thoughts.
Next steps for this PR:
- Add tool call handling
- Add tool message role
- Include an example app demonstrating multiple tools
- Expand unit test coverage
- Add documentation
This is an initial implementation and may change. Please feel free to share any feedback.
This looks really good. I am interested to see how it hooks back to actually call the tool. We pass the schemas into the tokenizer, but somehow we would have to hold on to the tools during the evaluation and make the calls (and, I presume, inject the output into the token stream?)
@davidkoski, thanks for the feedback!
I'm currently working on tool call parsing and handling. The current implementation pauses generation when it detects a <tool_call> tag, and attempts to parse it into a ToolCall struct. This is handled by a new ToolCallProcessor which monitors the stream for a < character. If found, it starts buffering chunks until it reaches a </tool_call> tag. If the collected toolCallBuffer forms a valid tool call, it decodes it into a ToolCall and yields it as Generation.toolCall.
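The buffering flow described above could be sketched roughly as follows. This is not the PR's code: the `ToolCall` fields, the string-only `arguments`, the `Generation.chunk` case, and the `process(_:)` method are assumptions made for illustration, and the sketch does not re-scan text that follows a closing tag.

```swift
import Foundation

// Hypothetical shape; the PR's ToolCall may differ (arguments simplified to strings).
struct ToolCall: Decodable, Equatable {
    let name: String
    let arguments: [String: String]
}

// Hypothetical event type: plain text or a decoded tool call.
enum Generation: Equatable {
    case chunk(String)
    case toolCall(ToolCall)
}

final class ToolCallProcessor {
    private let openTag = "<tool_call>"
    private let closeTag = "</tool_call>"
    private var toolCallBuffer = ""
    private var buffering = false

    /// Feeds one streamed chunk and returns the events it produces.
    func process(_ chunk: String) -> [Generation] {
        if buffering {
            toolCallBuffer += chunk
            return drain()
        }
        // Pass text through untouched until a "<" appears.
        guard let start = chunk.firstIndex(of: "<") else { return [.chunk(chunk)] }
        var events: [Generation] = []
        let prefix = String(chunk[..<start])
        if !prefix.isEmpty { events.append(.chunk(prefix)) }
        buffering = true
        toolCallBuffer = String(chunk[start...])
        return events + drain()
    }

    private func drain() -> [Generation] {
        // The "<" did not start a tool call: flush the buffer as ordinary text.
        if !toolCallBuffer.hasPrefix(openTag) && !openTag.hasPrefix(toolCallBuffer) {
            let text = toolCallBuffer
            reset()
            return [.chunk(text)]
        }
        // Keep buffering until the closing tag arrives.
        guard let end = toolCallBuffer.range(of: closeTag) else { return [] }
        let bodyStart = toolCallBuffer.index(toolCallBuffer.startIndex, offsetBy: openTag.count)
        let body = toolCallBuffer[bodyStart..<end.lowerBound]
        let raw = String(toolCallBuffer[..<end.upperBound])
        let trailing = String(toolCallBuffer[end.upperBound...])
        reset()
        var events: [Generation] = []
        if let call = try? JSONDecoder().decode(ToolCall.self, from: Data(body.utf8)) {
            events.append(.toolCall(call))
        } else {
            events.append(.chunk(raw))   // invalid JSON: surface the raw text instead
        }
        if !trailing.isEmpty { events.append(.chunk(trailing)) }
        return events
    }

    private func reset() {
        toolCallBuffer = ""
        buffering = false
    }
}
```

Feeding the chunks `"Sure. "`, `"<tool"`, `"_call>{\"name\":\"add\","`, and `"\"arguments\":{\"a\":\"2\"}}</tool_call>"` in order yields one text chunk, then nothing while the tag is incomplete, then a single `.toolCall` event. A stray `<` in ordinary text (for example `"a < b"`) is flushed back out as text once it can no longer match the opening tag.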
I've also introduced a new tool role to the Message model to represent tool call results.
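As a rough illustration of how a tool role fits into the conversation, the sketch below appends a tool result to the message history before generation resumes. The `Message` type shown here is a hypothetical minimal model, not the one in LLMEval, and the `asDictionary` helper is an assumption about how messages might be handed to a chat template.

```swift
import Foundation

// Hypothetical minimal model; the PR's Message may carry more fields.
struct Message: Equatable {
    enum Role: String { case system, user, assistant, tool }
    let role: Role
    let content: String

    /// Role/content dictionary, a shape commonly consumed by chat templates.
    var asDictionary: [String: String] {
        ["role": role.rawValue, "content": content]
    }
}

var history: [Message] = [
    Message(role: .user, content: "What is 2 + 3?"),
    Message(
        role: .assistant,
        content: "<tool_call>{\"name\": \"add\", \"arguments\": {\"a\": 2, \"b\": 3}}</tool_call>"
    ),
]

// After the app executes the tool, the result goes back in under the tool role.
history.append(Message(role: .tool, content: "{\"sum\": 5}"))
```

With the result recorded as its own role, the model can distinguish tool output from user input when the next turn is generated.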
For now, I’ve integrated a few basic tools into the LLMEval app to demonstrate the functionality: a weather tool, an add tool, and a time tool. You can try them out; they are all functional, although the weather tool simply returns random weather info. These are temporary. I'm planning to remove them and instead create a dedicated example app to showcase tool usage more clearly.
This PR is still a work in progress. The current implementation reflects my initial approach, and I’m open to rethinking parts of it based on feedback. I haven’t tested all edge cases yet, so there may still be issues to resolve. I’ll continue refining it over the week.
Let me know what you think!