The streaming response does not take effect when configuring ChatClient Tools
```java
package com.ustc.myy.mcpclientserverdemo.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.tool.ToolCallbackProvider;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    private final OllamaChatModel ollamaChatModel;

    private final ToolCallbackProvider tools;

    @Autowired
    public ChatClientConfig(OllamaChatModel ollamaChatModel, ToolCallbackProvider tools) {
        this.ollamaChatModel = ollamaChatModel;
        this.tools = tools;
    }

    @Bean
    public ChatClient ollamaChatClient() {
        return ChatClient.builder(ollamaChatModel)
                // System prompt: "You are a cute assistant named Xiao Nuomi"
                .defaultSystem("你是一个可爱的助手,名字叫小糯米")
                // Register the MCP tool callbacks as default tools
                .defaultTools(tools)
                .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()),
                        new SimpleLoggerAdvisor())
                .build();
    }
}
```
```java
package com.ustc.myy.mcpclientserverdemo.controller.ai;

import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/mcp-client-server-demo")
@RequiredArgsConstructor
public class ChatController {

    private final ChatClient chatClient;

    // @Autowired
    // public ChatController(ChatClient chatClient) {
    //     this.chatClient = chatClient;
    // }

    // Blocking call: returns the full completion in one response
    // (defaultValue: "Tell me a joke")
    @GetMapping("/ai/generate")
    public String generate(@RequestParam(value = "message", defaultValue = "给我讲一个笑话") String message) {
        return chatClient.prompt().user(message).call().content();
    }

    // Streaming call: should emit the completion incrementally as a Flux
    @GetMapping("/ai/generate-stream")
    public Flux<String> generateFlux(@RequestParam(value = "message", defaultValue = "给我讲一个笑话") String message) {
        return chatClient.prompt().user(message).stream().content();
    }
}
```
The `Flux<String>` streaming response here does not take effect.
What exactly does "does not take effect" mean?

Streaming output does not work; the full response is returned all at once.
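To make "returned all at once" concrete, here is a minimal check sketch (assuming the application from the snippet above runs on localhost:8080 and spring-webflux is on the classpath). It subscribes to the streaming endpoint and timestamps each element: with working streaming, the timestamps spread across the generation time; in the broken case, everything arrives in one burst at the end.

```java
import org.springframework.web.reactive.function.client.WebClient;

public class StreamCheck {

    public static void main(String[] args) {
        // Subscribe to the streaming endpoint and timestamp every element.
        // Working streaming: many elements, timestamps spread out over time.
        // Broken streaming: a single element (or burst) after the full generation.
        WebClient.create("http://localhost:8080")
                .get()
                .uri("/mcp-client-server-demo/ai/generate-stream")
                .retrieve()
                .bodyToFlux(String.class)
                .doOnNext(chunk -> System.out.println(System.currentTimeMillis() + " | " + chunk))
                .blockLast();
    }
}
```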
This seems to be an Ollama problem; see: https://github.com/ollama/ollama/issues/9946
Thanks. Will investigate. We do have a streaming test with Ollama in OllamaChatModelFunctionCallingIT, but it isn't going through ChatClient; it is using the OllamaChatModel directly.

What model did you use with Ollama to uncover this issue? qwen2.5:3b?
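For comparison, here is a minimal sketch of the direct OllamaChatModel streaming path that the IT test exercises (assuming an injected OllamaChatModel; note the output accessor is getText() in recent Spring AI versions and getContent() in older milestones):

```java
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import reactor.core.publisher.Flux;

class DirectModelStreamCheck {

    // Stream directly through the model, bypassing ChatClient entirely,
    // which is the code path OllamaChatModelFunctionCallingIT goes through.
    static void check(OllamaChatModel ollamaChatModel) {
        Flux<ChatResponse> responses =
                ollamaChatModel.stream(new Prompt(new UserMessage("给我讲一个笑话")));

        // With working streaming, many small chunks should print, not one big blob.
        responses
                .map(r -> r.getResult().getOutput().getText()) // getContent() on older versions
                .doOnNext(chunk -> System.out.println("chunk: " + chunk))
                .blockLast();
    }
}
```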
Hi, just wanted to share an observation that might help with the investigation.
I tested this behavior directly using Postman (without involving Spring AI), and noticed that when the request includes the tools field, streaming does not work. However, when I remove tools, streaming functions as expected. This seems to suggest that the issue may not be related to the framework itself, but possibly to how Ollama handles tool calls with streaming.
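For anyone who wants to reproduce this without Postman, here is a rough sketch of the same check in plain Java (assumptions: Ollama listening on localhost:11434, a qwen2.5:7b model pulled locally, and a made-up get_weather tool definition). With "stream": true and no tools, each NDJSON line arrives as its own chunk; adding the tools array makes everything arrive in one burst:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaToolsStreamRepro {

    public static void main(String[] args) throws Exception {
        // "stream": true plus a "tools" array; per the observation above,
        // it is the presence of "tools" that suppresses chunked output.
        String body = """
                {
                  "model": "qwen2.5:7b",
                  "stream": true,
                  "messages": [{"role": "user", "content": "Tell me a joke"}],
                  "tools": [{
                    "type": "function",
                    "function": {
                      "name": "get_weather",
                      "description": "Get the current weather for a city",
                      "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"]
                      }
                    }
                  }]
                }
                """;

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Print each NDJSON line with a timestamp: streaming yields many lines
        // over time, while the buggy case yields everything at once at the end.
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(line -> System.out.println(System.currentTimeMillis() + " " + line));
    }
}
```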
In addition, I tried running the OllamaChatModelFunctionCallingIT test that was referenced earlier. Based on my observation, the response doesn't appear to be streamed; instead, the full content is returned all at once in a single chunk.
I’ve included a screenshot to illustrate what I’m seeing.
> What model did you use with Ollama to uncover this issue? qwen2.5:3b?
I discovered this issue while using the qwen2.5:7b model.
Hello! I have the same issue at the moment!
As pointed out above, this issue is likely related to Ollama. I'm closing it because, IMO, there is nothing we can do on the Spring AI side. Feel free to re-open it if https://github.com/ollama/ollama/issues/7886 gets resolved but the issue is still present.
I have the same issue
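Until the upstream Ollama issue is fixed, one possible workaround sketch (an assumption based on the Postman observation above that streaming only breaks when a tools payload is present): register a second ChatClient bean without .defaultTools(...) and inject it (e.g. via @Qualifier) into the streaming endpoint, keeping the tool-enabled client for blocking calls. The obvious trade-off is that streamed answers cannot trigger MCP tool calls. Same imports as ChatClientConfig above:

```java
@Configuration
public class StreamingChatClientConfig {

    // Hypothetical second client: same setup as ollamaChatClient minus
    // .defaultTools(...), so streaming requests never send tools to Ollama.
    @Bean
    public ChatClient streamingOllamaChatClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel)
                .defaultSystem("你是一个可爱的助手,名字叫小糯米")
                .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()),
                        new SimpleLoggerAdvisor())
                .build();
    }
}
```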