
Added streaming output support for local search & global search

Open 6ixGODD opened this issue 1 year ago • 7 comments

Description

This PR introduces changes to enable streaming output in the LLM's generate method, as well as in the search methods of both the LocalSearch and GlobalSearch classes. The goal is to facilitate web application development and any other scenario where streaming, real-time display of generated content is required.

Related Issues

none

Proposed Changes

  1. Updated the return type of the generate method in BaseLLM and ChatOpenAI to Generator[str, None, str], and implemented it accordingly.
  2. Updated the return type of the search method in LocalSearch, GlobalSearch, and BaseSearch to Generator[str, None, SearchResult], and implemented it accordingly (a rough sketch of these shapes follows this list).
  3. Added a --streaming flag to the CLI, letting users enable streaming output for local and global search.
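
A rough sketch of the proposed shapes; the parameter lists here are simplified and illustrative, not the actual graphrag definitions:

```python
# Illustrative shapes only; names and parameters are simplified,
# not the actual graphrag definitions.
from typing import Any, Generator


class BaseLLM:
    def generate(self, messages: list[Any], **kwargs: Any) -> Generator[str, None, str]:
        """Yield response chunks; the generator's return value is the full text."""
        ...


class BaseSearch:
    def search(self, query: str, **kwargs: Any) -> Generator[str, None, "SearchResult"]:
        """Yield answer chunks; the generator's return value is the SearchResult."""
        ...
```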

Checklist

  • [x] I have tested these changes locally.
  • [x] I have reviewed the code changes.
  • [x] I have updated the documentation (if necessary).
  • [x] I have added appropriate unit tests (if applicable).

Additional Notes

A caveat with these changes: when a method contains a yield statement, Python treats it as a generator function, so it always returns a generator. If streaming is not needed, retrieving the final result becomes less straightforward, because the caller must drain the generator and catch the StopIteration exception to extract the return value from it. This can lead to less elegant code (see the sketch below).
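
A minimal, self-contained illustration of the caveat (a toy function, not the actual graphrag code):

```python
# Toy example of the caveat: any function containing `yield` returns a
# generator, so non-streaming callers must drain it and catch
# StopIteration to get the return value.
from typing import Generator


def generate() -> Generator[str, None, str]:
    full = ""
    for chunk in ("Hello", ", ", "world"):
        full += chunk
        yield chunk
    return full  # surfaces as StopIteration.value


gen = generate()
try:
    while True:
        next(gen)  # discard chunks when streaming is not needed
except StopIteration as stop:
    result = stop.value
print(result)  # -> Hello, world
```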

If necessary, the streaming functionality could be refactored into separate methods to maintain clarity and separation of concerns.

6ixGODD avatar Aug 09 '24 12:08 6ixGODD

@6ixGODD please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"


@microsoft-github-policy-service agree

6ixGODD avatar Aug 09 '24 12:08 6ixGODD

@6ixGODD This is great work. You caught us in the middle of defining an API layer between the CLI and library code. We just merged a PR that more explicitly defines the query API.

I agree with the separation of concerns issue. Can you update/refactor to use this existing query API? If separate functions are needed to maintain clarity, we're open to adding new streaming functions to the API.

jgbradley1 avatar Aug 13 '24 03:08 jgbradley1

> @6ixGODD This is great work. You caught us in the middle of defining an API layer between the CLI and library code. We just merged a PR that more explicitly defines the query API.
>
> I agree with the separation of concerns issue. Can you update/refactor to use this existing query API? If separate functions are needed to maintain clarity, we're open to adding new streaming functions to the API.

I've made the corresponding changes:

  • I kept the --streaming command-line argument, which controls whether the output is streamed.
  • I modified the two APIs in api.py by adding conditional logic to decide whether to use streaming output, and included a progress indicator (feel free to remove it if it looks out of place).
  • I added stream_generate and astream_generate methods to the BaseLLM abstract class for synchronous and asynchronous streaming response generation, respectively, and implemented them in the ChatOpenAI class (I also added a few private methods, following the original code style).
  • I added an astream_search method to the BaseSearch abstract class for streaming search, and implemented it in both GlobalSearch and LocalSearch classes.
  • I made the necessary updates in cli.py to support the --streaming option.

With these changes, apart from adding the streaming parameter and a few additional methods, I have made minimal modifications to the original code.
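
For illustration, here is a self-contained sketch of the astream_search pattern described above; ToySearch is a hypothetical stand-in, not the actual LocalSearch/GlobalSearch implementation:

```python
# Self-contained sketch of the astream_search pattern; ToySearch is a
# hypothetical stand-in for LocalSearch/GlobalSearch.
import asyncio
from typing import AsyncGenerator


class ToySearch:
    async def astream_search(self, query: str) -> AsyncGenerator[str, None]:
        for chunk in ("Answering ", f"{query!r} ", "chunk by chunk."):
            await asyncio.sleep(0)  # stand-in for awaiting the LLM
            yield chunk


async def main() -> None:
    async for chunk in ToySearch().astream_search("main themes?"):
        print(chunk, end="", flush=True)
    print()


asyncio.run(main())
```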

6ixGODD avatar Aug 13 '24 15:08 6ixGODD

There's an on_llm_new_token callback; why don't you use this method?

KylinMountain avatar Aug 14 '24 10:08 KylinMountain

> There's an on_llm_new_token callback; why don't you use this method?

@KylinMountain I missed the on_llm_new_token callback 😂 I checked the code and you're right. But from an application development perspective, whether you're using WebSockets, SSE, or something else, there typically needs to be a way to yield chunks of data. I figured, why not just handle it directly in the class instead of decoupling it, right?
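
To illustrate the point: a callback such as on_llm_new_token can be bridged to a consumable stream, but it takes extra plumbing. A hypothetical sketch of such a bridge (not graphrag code):

```python
# Hypothetical sketch: bridging a token callback to an async generator
# via a queue, so a web handler (WebSocket/SSE) can iterate over tokens.
import asyncio
from typing import AsyncGenerator


class TokenQueueBridge:
    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def on_llm_new_token(self, token: str) -> None:
        """Callback invoked once per new token."""
        self._queue.put_nowait(token)

    def close(self) -> None:
        self._queue.put_nowait(self._DONE)

    async def tokens(self) -> AsyncGenerator[str, None]:
        """Yield tokens until close() is called."""
        while True:
            item = await self._queue.get()
            if item is self._DONE:
                return
            yield item
```

Yielding the stream directly from the search class avoids this extra layer.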

6ixGODD avatar Aug 14 '24 11:08 6ixGODD

@6ixGODD is right. The callback approach does work, but it's quite messy for an external system to use and can be prone to errors you might not anticipate (e.g. dropped messages or tokens), depending on the use case you're targeting.

For example, in the solution accelerator, we had to implement some custom data structures and logic in order to create a proper streaming web API that behaves the way most programmatic clients expect (i.e. using the requests library to iterate over response chunks).

Adding a generator function to the API is much cleaner for everyone to use (in my opinion). I copied this fork and made a new PR (under review) that cleaned up several other things needed to expose a proper streaming API as a generator function.
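
For example, a client consuming such a streaming endpoint might look like this (the URL and payload below are hypothetical):

```python
# Hypothetical client consuming a streaming query endpoint with the
# requests library; the URL and payload are illustrative only.
import requests

with requests.post(
    "http://localhost:8000/query/streaming",  # hypothetical endpoint
    json={"query": "What are the main themes?"},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # skip keep-alive blank lines
            print(line)
```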

jgbradley1 avatar Aug 19 '24 18:08 jgbradley1

> @6ixGODD is right. The callback approach does work, but it's quite messy for an external system to use and can be prone to errors you might not anticipate (e.g. dropped messages or tokens), depending on the use case you're targeting.
>
> For example, in the solution accelerator, we had to implement some custom data structures and logic in order to create a proper streaming web API that behaves the way most programmatic clients expect (i.e. using the requests library to iterate over response chunks).
>
> Adding a generator function to the API is much cleaner for everyone to use (in my opinion). I copied this fork and made a new PR (under review) that cleaned up several other things needed to expose a proper streaming API as a generator function.

@jgbradley1 Completely agree 😄 I saw your new PR, and I'm honored to have the opportunity to contribute to this repository.

6ixGODD avatar Aug 20 '24 01:08 6ixGODD