mlx-swift-examples Implement Structured Tool

This PR introduces a structured Tool type, addressing #308.

The implementation wraps ToolSpec ([String: Any]) inside a more type-safe Tool struct. I’ve added a basic unit test and refactored the LLMEval project to use the new Tool API.
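To make the idea concrete, here is a minimal sketch of what wrapping a ToolSpec in a typed struct could look like. It is not the code from this PR: the generic parameters, the initializer, the synchronous handler, and the AddInput/AddOutput types are all assumptions made for illustration, and ToolSpec is redeclared locally so the snippet stands alone.

```swift
import Foundation

// Assumption: redeclared here so the sketch is self-contained.
typealias ToolSpec = [String: Any]

// Hypothetical shape; the Tool type in the PR may differ.
struct Tool<Input: Decodable, Output: Encodable> {
    let name: String
    /// The untyped schema that is ultimately handed to the chat template.
    let schema: ToolSpec
    /// Typed entry point. Synchronous here only to keep the sketch short.
    let handler: (Input) throws -> Output

    init(
        name: String,
        description: String,
        parameters: [String: Any],
        handler: @escaping (Input) throws -> Output
    ) {
        self.name = name
        self.handler = handler
        // Build the function-calling style dictionary once, in one place.
        self.schema = [
            "type": "function",
            "function": [
                "name": name,
                "description": description,
                "parameters": parameters,
            ] as [String: Any],
        ]
    }
}

// Illustrative input/output types for an "add" tool.
struct AddInput: Decodable { let a: Int; let b: Int }
struct AddOutput: Encodable { let sum: Int }

let addTool = Tool<AddInput, AddOutput>(
    name: "add",
    description: "Add two integers",
    parameters: [
        "type": "object",
        "properties": [
            "a": ["type": "integer"],
            "b": ["type": "integer"],
        ],
        "required": ["a", "b"],
    ]
) { input in
    AddOutput(sum: input.a + input.b)
}
```

The point of the wrapper is that callers work with typed inputs and outputs, while the [String: Any] dictionary is built in a single place and only surfaces where the tokenizer needs it.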

I’m opening this PR early for discussion and feedback. Please feel free to test the changes and share thoughts.

Next steps for this PR:

Add tool call handling
Add tool message role
Include an example app demonstrating multiple tools
Expand unit test coverage
Add documentation

This is an initial implementation and may change. Please feel free to share any feedback.

May 04 '25 10:05 ibrahimcetin

This looks really good. I am interested to see how it hooks back to actually call the tool. We pass the schemas in to the tokenizer, but we would somehow have to hold on to the tools during the evaluation and make the calls (and, I presume, inject the output into the token stream?).

May 12 '25 20:05 davidkoski

@davidkoski, thanks for the feedback!

I'm currently working on tool call parsing and handling. The current implementation pauses generation when it detects a <tool_call> tag and attempts to parse it into a ToolCall struct. This is handled by a new ToolCallProcessor, which monitors the stream for a < character. When it finds one, it buffers chunks until it reaches a </tool_call> tag. If the collected toolCallBuffer forms a valid tool call, it is decoded into a ToolCall and yielded as Generation.toolCall.
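The buffering approach described above can be sketched roughly as follows. This is a simplified illustration, not the PR's ToolCallProcessor: the ToolCall fields, the Generation cases, the array return type, and the rule for flushing a stray < back out as plain text are all assumptions.

```swift
import Foundation

// Hypothetical shapes; the real ToolCall and Generation may differ.
struct ToolCall: Decodable, Equatable {
    let name: String
    let arguments: [String: Int]  // simplified; real arguments would be arbitrary JSON
}

enum Generation: Equatable {
    case chunk(String)
    case toolCall(ToolCall)
}

struct ToolCallProcessor {
    private let openTag = "<tool_call>"
    private let closeTag = "</tool_call>"
    private var buffer = ""
    private var buffering = false

    /// Feeds one streamed chunk and returns whatever can be yielded now.
    mutating func process(_ chunk: String) -> [Generation] {
        var output: [Generation] = []
        var text = chunk
        if !buffering {
            // No "<" means this chunk cannot start a tool call.
            guard let start = text.firstIndex(of: "<") else { return [.chunk(text)] }
            if start > text.startIndex { output.append(.chunk(String(text[..<start]))) }
            text = String(text[start...])
            buffering = true
        }
        buffer += text

        if let end = buffer.range(of: closeTag) {
            let candidate = String(buffer[..<end.upperBound])
            let rest = String(buffer[end.upperBound...])
            buffer = ""
            buffering = false
            // A valid call is yielded as a tool call, anything else as plain text.
            if let call = decode(candidate) {
                output.append(.toolCall(call))
            } else {
                output.append(.chunk(candidate))
            }
            if !rest.isEmpty { output += process(rest) }
        } else if !openTag.hasPrefix(buffer) && !buffer.hasPrefix(openTag) {
            // Assumption: a "<" that cannot become "<tool_call>" is ordinary text.
            let rest = String(buffer.dropFirst())
            buffer = ""
            buffering = false
            output.append(.chunk("<"))
            if !rest.isEmpty { output += process(rest) }
        }
        return output
    }

    private func decode(_ candidate: String) -> ToolCall? {
        guard candidate.hasPrefix(openTag) else { return nil }
        let json = candidate.dropFirst(openTag.count).dropLast(closeTag.count)
        return try? JSONDecoder().decode(ToolCall.self, from: Data(json.utf8))
    }
}

// Usage: the tags arrive split across chunk boundaries.
var processor = ToolCallProcessor()
let chunks = [
    "Sure. ", "<tool", "_call>{\"name\": \"add\", ",
    "\"arguments\": {\"a\": 2, \"b\": 3}}</tool", "_call>", " done",
]
let events = chunks.flatMap { processor.process($0) }
```

Returning an array rather than a single value lets one chunk produce both the text before the < and, later, the decoded call. Text that starts with < but cannot become a tool call is released again, so ordinary output such as "a < b" is not held back until the end of generation.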

I've also introduced a new tool role to the Message model to represent tool call results.
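As a rough illustration of how a tool role could carry results back to the model, here is a small sketch. The Message shape, the tool(_:) helper, and the JSON payloads are hypothetical and only meant to show the round trip, not the PR's actual Message model.

```swift
import Foundation

// Hypothetical Message model; the one in the PR may have different fields.
struct Message {
    enum Role: String { case system, user, assistant, tool }
    let role: Role
    let content: String

    /// Convenience for wrapping a tool's output as a message.
    static func tool(_ content: String) -> Message {
        Message(role: .tool, content: content)
    }
}

// A conversation after one tool round trip.
let history: [Message] = [
    Message(role: .user, content: "What is 2 + 3?"),
    Message(
        role: .assistant,
        content: "<tool_call>{\"name\": \"add\", \"arguments\": {\"a\": 2, \"b\": 3}}</tool_call>"),
    .tool("{\"sum\": 5}"),
]

// Assumption: the chat template consumes role/content dictionaries.
let templateInput: [[String: String]] = history.map {
    ["role": $0.role.rawValue, "content": $0.content]
}
```

Under this assumption, the tool output reaches the model as one more message in the next prompt rather than being spliced into the running token stream.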

For now, I’ve integrated a few basic tools into the LLMEval app to demonstrate the functionality: a weather tool, an add tool, and a time tool. You can try them out; they are functional, although the weather tool simply returns random weather info. These are temporary: I'm planning to remove them and create a dedicated example app to showcase tool usage more clearly.

This PR is still a work in progress. The current implementation reflects my initial approach, and I’m open to rethinking parts of it based on feedback. I haven’t tested all edge cases yet, so there may still be issues to resolve. I’ll continue refining it over the week.

Let me know what you think!

May 12 '25 23:05 ibrahimcetin