The streaming response does not take effect when configuring ChatClient Tools
```java
package com.ustc.myy.mcpclientserverdemo.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.tool.ToolCallbackProvider;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    private final OllamaChatModel ollamaChatModel;

    private final ToolCallbackProvider tools;

    @Autowired
    public ChatClientConfig(OllamaChatModel ollamaChatModel, ToolCallbackProvider tools) {
        this.ollamaChatModel = ollamaChatModel;
        this.tools = tools;
    }

    @Bean
    public ChatClient ollamaChatClient() {
        return ChatClient.builder(ollamaChatModel)
                // System prompt: "You are a cute assistant named Xiao Nuomi"
                .defaultSystem("你是一个可爱的助手,名字叫小糯米")
                // Register the MCP tool callbacks as default tools
                .defaultTools(tools)
                .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()),
                        new SimpleLoggerAdvisor())
                .build();
    }
}
```
```java
package com.ustc.myy.mcpclientserverdemo.controller.ai;

import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/mcp-client-server-demo")
@RequiredArgsConstructor
public class ChatController {

    private final ChatClient chatClient;

    // @Autowired
    // public ChatController(ChatClient chatClient) {
    //     this.chatClient = chatClient;
    // }

    // Blocking call: returns the full completion in one response
    // (defaultValue: "Tell me a joke")
    @GetMapping("/ai/generate")
    public String generate(@RequestParam(value = "message", defaultValue = "给我讲一个笑话") String message) {
        return chatClient.prompt().user(message).call().content();
    }

    // Streaming call: should emit the completion incrementally as a Flux
    @GetMapping("/ai/generate-stream")
    public Flux<String> generateFlux(@RequestParam(value = "message", defaultValue = "给我讲一个笑话") String message) {
        return chatClient.prompt().user(message).stream().content();
    }
}
```
The `Flux<String>` streaming response here does not take effect.
What exactly does "does not take effect" mean?

Streaming output does not work; the full response is returned all at once.
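To make "returned all at once" concrete, here is a minimal check sketch (assuming the application from the snippet above runs on localhost:8080 and spring-webflux is on the classpath). It subscribes to the streaming endpoint and timestamps each element: with working streaming, the timestamps spread across the generation time; in the broken case, everything arrives in one burst at the end.

```java
import org.springframework.web.reactive.function.client.WebClient;

public class StreamCheck {

    public static void main(String[] args) {
        // Subscribe to the streaming endpoint and timestamp every element.
        // Working streaming: many elements, timestamps spread out over time.
        // Broken streaming: a single element (or burst) after the full generation.
        WebClient.create("http://localhost:8080")
                .get()
                .uri("/mcp-client-server-demo/ai/generate-stream")
                .retrieve()
                .bodyToFlux(String.class)
                .doOnNext(chunk -> System.out.println(System.currentTimeMillis() + " | " + chunk))
                .blockLast();
    }
}
```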
This seems to be an Ollama problem; see: https://github.com/ollama/ollama/issues/9946
Thanks. Will investigate. We do have a streaming test with Ollama in OllamaChatModelFunctionCallingIT, but it isn't going through ChatClient; it is using the OllamaChatModel directly.

What model did you use with Ollama to uncover this issue? qwen2.5:3b?
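For comparison, here is a minimal sketch of the direct OllamaChatModel streaming path that the IT test exercises (assuming an injected OllamaChatModel; note the output accessor is getText() in recent Spring AI versions and getContent() in older milestones):

```java
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import reactor.core.publisher.Flux;

class DirectModelStreamCheck {

    // Stream directly through the model, bypassing ChatClient entirely,
    // which is the code path OllamaChatModelFunctionCallingIT goes through.
    static void check(OllamaChatModel ollamaChatModel) {
        Flux<ChatResponse> responses =
                ollamaChatModel.stream(new Prompt(new UserMessage("给我讲一个笑话")));

        // With working streaming, many small chunks should print, not one big blob.
        responses
                .map(r -> r.getResult().getOutput().getText()) // getContent() on older versions
                .doOnNext(chunk -> System.out.println("chunk: " + chunk))
                .blockLast();
    }
}
```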
Hi, just wanted to share an observation that might help with the investigation.
I tested this behavior directly using Postman (without involving Spring AI), and noticed that when the request includes the tools field, streaming does not work. However, when I remove tools, streaming functions as expected. This seems to suggest that the issue may not be related to the framework itself, but possibly to how Ollama handles tool calls with streaming.
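For anyone who wants to reproduce this without Postman, here is a rough sketch of the same check in plain Java (assumptions: Ollama listening on localhost:11434, a qwen2.5:7b model pulled locally, and a made-up get_weather tool definition). With "stream": true and no tools, each NDJSON line arrives as its own chunk; adding the tools array makes everything arrive in one burst:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaToolsStreamRepro {

    public static void main(String[] args) throws Exception {
        // "stream": true plus a "tools" array; per the observation above,
        // it is the presence of "tools" that suppresses chunked output.
        String body = """
                {
                  "model": "qwen2.5:7b",
                  "stream": true,
                  "messages": [{"role": "user", "content": "Tell me a joke"}],
                  "tools": [{
                    "type": "function",
                    "function": {
                      "name": "get_weather",
                      "description": "Get the current weather for a city",
                      "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"]
                      }
                    }
                  }]
                }
                """;

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Print each NDJSON line with a timestamp: streaming yields many lines
        // over time, while the buggy case yields everything at once at the end.
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(line -> System.out.println(System.currentTimeMillis() + " " + line));
    }
}
```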
In addition, I tried running the OllamaChatModelFunctionCallingIT test that was referenced earlier. Based on my observation, the response doesn't appear to be streamed; instead, the full content is returned all at once in a single chunk.
I’ve included a screenshot to illustrate what I’m seeing.
> What model did you use with Ollama to uncover this issue? qwen2.5:3b?
I discovered this issue while using the qwen2.5:7b model.
Hello! I have the same issue at the moment!
As pointed out above, this issue is likely related to Ollama. I'm closing it because, IMO, there is nothing we can do on the Spring AI side. Feel free to re-open it if https://github.com/ollama/ollama/issues/7886 gets resolved but the issue is still present.
I have the same issue
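Until the upstream Ollama issue is fixed, one possible workaround sketch (an assumption based on the Postman observation above that streaming only breaks when a tools payload is present): register a second ChatClient bean without .defaultTools(...) and inject it (e.g. via @Qualifier) into the streaming endpoint, keeping the tool-enabled client for blocking calls. The obvious trade-off is that streamed answers cannot trigger MCP tool calls. Same imports as ChatClientConfig above:

```java
@Configuration
public class StreamingChatClientConfig {

    // Hypothetical second client: same setup as ollamaChatClient minus
    // .defaultTools(...), so streaming requests never send tools to Ollama.
    @Bean
    public ChatClient streamingOllamaChatClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel)
                .defaultSystem("你是一个可爱的助手,名字叫小糯米")
                .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()),
                        new SimpleLoggerAdvisor())
                .build();
    }
}
```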