
[Issue]: Documentation around streaming and websockets

Open vikigenius opened this issue 1 year ago • 3 comments

Describe the issue

The docs don't make it clear how streaming is tied to websockets. And I don't see any examples addressing this either.

Say I want to build a Chat UI application that makes use of autogen agents. I want to stream the final agent's response to the user through a UI.

When we stream from the final agent, are we streaming the LLM response directly? Or do we wait for the entire response to be generated and then stream it?

In essence, if you pass the stream=True parameter for OpenAI, for example, what is happening internally?
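
(For reference, independent of autogen: a minimal sketch of what the OpenAI Python SDK does when stream=True is passed; the model name below is a placeholder. The server returns the completion as a sequence of delta chunks rather than one finished message, so the caller can forward tokens as they arrive.)

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# With stream=True the SDK returns an iterator of chunks instead of a full response.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)  # a real UI would push each delta over a websocket or SSE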


vikigenius commented Sep 27 '24

Hello @vikigenius. I had this same issue too: I want to stream the final agent's response to the user through a UI. Unfortunately, as the code is currently written, the "silent=True" parameter works for printing, but not for streaming. I ended up having to write my own custom classes to disable streaming for the agents until the last message.
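
(Not the custom classes mentioned above, but for illustration: in autogen v0.2 the stream flag lives in each agent's llm_config, so one rough way to limit streaming to the final agent is to enable it only there. Note that the streamed chunks still go to autogen's IOStream, the console by default, rather than being returned to your code; the model and agent names below are placeholders.)

from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4o-mini"}]  # placeholder; API key comes from the environment

# Intermediate agent: streaming disabled, so its replies are not streamed anywhere.
researcher = AssistantAgent(
    name="researcher",
    llm_config={"config_list": config_list, "stream": False},
)

# Final agent: streaming enabled; its completion is requested with stream=True
# and the chunks are written to the active IOStream as they arrive.
writer = AssistantAgent(
    name="writer",
    llm_config={"config_list": config_list, "stream": True},
)

user = UserProxyAgent(name="user", human_input_mode="NEVER", code_execution_config=False)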

Tylersuard commented Sep 27 '24

I have successfully built a multi-agent chat application on FastAPI using the latest version of autogen. I took a different approach to implementing the multi-agent chat UI pattern, and it also lets me stream output generated internally by the agents, and even tool calls, to the UI front end. But maybe this method is not what the maintainers intend us to do. Would you like to take a look for reference?

SuMiaoALi commented Sep 29 '24

@SuMiaoALi If possible could you please share your implementation? I'm trying to build something similar. Thanks

EnGassa commented Oct 03 '24

Closing this as native UI integration support is not planned for v0.2.

We are working on v0.4 (preview) release, which may include documentation for UI integration. Please head to Discussion to discuss.

ekzhu commented Oct 13 '24

@SuMiaoALi If possible could you please share your implementation? I'm trying to build something similar. Thanks

I apologize for only seeing your message now. Do you still need assistance?

SuMiaoALi commented Oct 21 '24

Hi, no worries. If you could share your sample code, that would be great. Thank you!

EnGassa commented Oct 21 '24

My idea is to cache both the UserProxyAgent and the ConversableAgent locally, for example in a dict, and on every chat request look up the corresponding agents by user identifier, so the conversation history is kept in memory inside the agents. As for passing the input in and getting the LLM output out, since I use the group-chat pattern, below I will show how the model's reply is extracted from a group chat. For nested or sequential chats, extracting the reply from the ChatResult should be enough.
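
(Before the poster's code: a minimal sketch of the per-user agent cache idea described above, using plain autogen v0.2 objects; the agent names and configuration are illustrative, not the poster's actual code.)

from autogen import AssistantAgent, GroupChat, GroupChatManager, UserProxyAgent

# One (UserProxyAgent, GroupChatManager) pair per user token, kept in memory so the
# agents' message history persists across HTTP requests.
_agent_cache: dict[str, tuple[UserProxyAgent, GroupChatManager]] = {}

def get_cached_agents(user_token: str) -> tuple[UserProxyAgent, GroupChatManager]:
    """Return the cached agents for this user, creating them on first use."""
    if user_token not in _agent_cache:
        user = UserProxyAgent(name="user", human_input_mode="NEVER", code_execution_config=False)
        assistant = AssistantAgent(name="assistant", llm_config={"config_list": [{"model": "gpt-4o-mini"}]})
        groupchat = GroupChat(agents=[user, assistant], messages=[], max_round=4)
        manager = GroupChatManager(groupchat=groupchat, llm_config={"config_list": [{"model": "gpt-4o-mini"}]})
        _agent_cache[user_token] = (user, manager)
    return _agent_cache[user_token]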

import asyncio
import functools
from typing import Optional

from fastapi import Depends
from fastapi.responses import StreamingResponse

from autogen import GroupChatManager, UserProxyAgent
from autogen.agentchat.chat import ChatResult

# Project-specific pieces (ChatReq, TokenData, get_bearer_token, ContentIter, UserProps,
# cache_kunlun_token, analyze_user_intent, do_direct_stream, get_user_agent, logger)
# live elsewhere in the project and are not shown here.


async def chat(req: ChatReq, token_data: TokenData = Depends(get_bearer_token)):
    """
    Unified chat entry point.
    """
    token = token_data.token
    kunlun_token = token_data.kunlun_token
    _message = req.message
    _dept_id = req.dept_id
    logger.info(f'User query: token: {token}, dept_id: {_dept_id}, message: {_message}')
    # Cache the Kunlun token
    await cache_kunlun_token(token, kunlun_token, _dept_id)
    # Intent recognition
    user_intent = await analyze_user_intent(_dept_id, _message, kunlun_token, token)
    msg = await do_direct_stream(user_intent, token=token, query=_message)
    if isinstance(msg, ContentIter):
        return StreamingResponse(msg, media_type='text/event-stream')

    # Fetch the business agent
    ui_user_agent: UIUserProxyAgent = await get_user_agent(token, kunlun_token, _dept_id, user_intent)
    msg: ContentIter = await ui_user_agent.a_do_initiate_chat(_message)
    return StreamingResponse(msg, media_type='text/event-stream')


class UIUserProxyAgent(UserProxyAgent):
    """
    Business proxy agent.
    """
    # Wraps the chat message for this user
    user_props: UserProps
    # Chat recipient; designed as a group chat here
    recipient: GroupChatManager
    # Whether to stream
    stream: bool

    def groupchat_messages(self) -> list[dict]:
        return self.recipient.groupchat.messages

    async def a_do_initiate_chat(
            self,
            message: str,
            max_turns: Optional[int] = 1
    ) -> ContentIter:
        """
        Initiate the chat and cache this round of chat history.
        :param max_turns:
        :param message: the user's message
        :return:
        """
        # await self.a_do_resume(message)

        prompt_placeholder = self.user_props.prompt_placeholder()
        q_chat_message = {'content': self.user_props.format_message(message), 'role': 'user', 'name': self.name}

        # Run the (synchronous) chat in a worker thread so the event loop is not blocked
        loop = asyncio.get_running_loop()
        result: ChatResult = await loop.run_in_executor(None,
                                                        functools.partial(self.do_initiate_chat,
                                                                          **{'message': q_chat_message,
                                                                             'max_turns': max_turns}))
I have different business modules, each managed by its own GroupChatManager, which runs the internal group chat and implements module capabilities such as tool calls. For every incoming user HTTP request, I first analyze the user's intent, look up the cached UserAgent and GroupChat for that intent, and then call the agent's initiate_chat with max_turns=1. That is how it is handled at the moment. I did not use IOStream for this, because I felt that approach was more complicated. Instead, a single chat turn is used to obtain the LLM's final output, which can then be returned streamed or non-streamed. This works the same way as autogen's Console chat: after one complete User/Assistant exchange, it blocks and waits for the user's next input.

Since my agents mainly perform tool calls, I can also stream the tool-call results to the front end as a streaming response. For a direct agent reply, however, there is currently no way to stream it, because autogen's agent chat cannot stream its result; but since that part is fast, the impact is actually not very big.
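
(A rough sketch of that last step, not the poster's actual ContentIter: after the max_turns=1 chat finishes, the completed reply can be chunked into a server-sent-events stream. Using chat_result.summary is one common way to get the final reply in autogen v0.2; the helper name and chunk size are made up for illustration.)

import asyncio
from typing import AsyncIterator

from autogen.agentchat.chat import ChatResult


async def reply_as_sse(chat_result: ChatResult) -> AsyncIterator[str]:
    """Yield the final reply of a finished single-turn chat as SSE 'data:' lines."""
    final_reply = chat_result.summary or ""       # the completed text of the last reply
    for i in range(0, len(final_reply), 64):      # pseudo-streaming: chunk the already-finished text
        yield f"data: {final_reply[i:i + 64]}\n\n"
        await asyncio.sleep(0)                    # give the event loop a chance to flush each chunk

# Usage inside the endpoint above (hypothetical):
#   return StreamingResponse(reply_as_sse(result), media_type='text/event-stream')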

That is my implementation approach. It has already been deployed to production at my company and has been verified to meet the needs of an enterprise production application. I hope it helps you.

SuMiaoALi commented Oct 22 '24