feat(actions): enable streaming in custom actions
🚨 Updates in the discussions below
Fixes #646
Problem description
First of all, thanks for NeMo-Guardrails!
Consider two consecutive actions: the first is a custom RAG action, and the second analyzes the answer and renders a disclaimer in case the answer is not grounded in the knowledge base. It is similar to fact-checking, but with streaming enabled. The bot should stream its answer and finish with something like: "I learn something new every day, so my answers may not always be perfect."
Using streaming currently leads to two errors:
- The streaming handler finishes after the first action, so the disclaimer is not streamed. This is because the variable `streaming_finished_event` is set, which in turn is caused by an empty chunk (`""`) that is passed to `on_llm_new_token`. The existing if statement checks for empty chunks, but only when they occur at the beginning; in our case the empty chunk arrives at the end. I extended the check so that `""` is never processed.
- For downstream usage, the first action returns the final answer, which has also been streamed. When the action finishes, the accumulated result is emitted again, which is why you end up with duplicate sentences in the result:

  > Question: Hi
  > Answer: I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking! I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking! I learn something new every day, so my answers may not always be perfect.

  This one I solved in the `_process` function by adding an early return in case `chunk == self.completion`.
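For reference, here is a rough sketch of the two guards described above. The class is only a stand-in, not the library's actual streaming handler code; the attribute names simply follow the wording of this description:

```python
import asyncio


class SketchStreamingHandler:
    """Illustrative stand-in for the streaming handler; only the two
    guards in _process mirror the fixes described in this PR."""

    def __init__(self):
        self.completion = ""  # text accumulated while streaming
        self.streaming_finished_event = asyncio.Event()
        self.queue: asyncio.Queue = asyncio.Queue()

    async def _process(self, chunk):
        # Guard 1: never process stream-stopping chunks ("" or None),
        # no matter where they occur, so the handler does not finish
        # (and set streaming_finished_event) after the first action.
        if chunk is None or chunk == "":
            return

        # Guard 2: if the action re-emits the full answer that was
        # already streamed, skip it to avoid duplicated sentences.
        if chunk == self.completion:
            return

        self.completion += chunk
        await self.queue.put(chunk)
```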
How to test
I've added an example under examples/configs/rag/custom_rag_streaming which you can test like so:
$ export OPENAI_API_KEY='sk-xxx'
$ python -m nemoguardrails.__main__ chat --config /<path_to>/examples/configs/rag/custom_rag_streaming --streaming
Please also follow the README.md I've included.
I'm happy to hear your feedback!
@drazvan @mikeolubode
Hello @drazvan @mikeolubode, I found a neat solution without altering the library. So, I'm just requesting my example be pulled. What do you think of the idea of using a local streaming handler that filters out and handles stream-stopping chunks (`""` and `None`) while keeping the main streaming handler open?
Update: the duplicate chunks that result from streaming within an action and then returning its result still need to be handled.
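To make the idea concrete, a minimal sketch of such a local filtering handler might look like this; the `push_chunk` call on the main handler is an assumption about its interface, not necessarily the library's exact method:

```python
class LocalFilteringHandler:
    """Hypothetical local handler used inside a custom action: it drops
    the stream-stopping chunks ("" and None) and forwards everything
    else, so the main streaming handler stays open across actions."""

    def __init__(self, main_handler):
        self.main_handler = main_handler

    async def on_llm_new_token(self, token, **kwargs):
        if token is None or token == "":
            # Swallow stream-stopping chunks instead of passing them on.
            return
        # Forward real tokens to the main handler (interface assumed).
        await self.main_handler.push_chunk(token)
```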
Thanks for digging into this @niels-garve!
Let me think some more about this. I think we need a cleaner way to signal that a message returned from an action has already been streamed, and somehow have support for this in `bot say`. Something along the lines of:
(using Colang 2.0 syntax, as it might not be easily possible with Colang 1)
flow answer report question
  user said something
  $answer = await RagAction()
  bot say text=$answer streamed=True
  $disclaimer = await DisclaimerAction()
  bot say text=$disclaimer streamed=True
Thanks for your prompt reply, @drazvan! I pushed another approach: what if we leverage the fact that `ActionResult` is defined with an optional return value, return `None`, and ensure inter-action communication via the context? Returning `None` signals that the text has already been streamed.
I had to alter the library code, though, by removing the fallback "I'm not sure what to say." But I also see an opportunity to rework this, as a hard-coded English default reply blocks multi-language support. What do you think?
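As a rough illustration of that idea (not the exact code from this PR; the action and context key names are made up for the example), a custom action could stream the answer itself, return `None`, and hand the text to the next action via context updates:

```python
from nemoguardrails.actions import action
from nemoguardrails.actions.actions import ActionResult


@action()
async def rag(context: dict):
    # ... run retrieval + generation here, pushing chunks to the
    # streaming handler as they are produced ...
    answer = "..."  # placeholder for the already-streamed answer

    return ActionResult(
        # Returning None signals: "this text was already streamed,
        # don't have the bot say it again."
        return_value=None,
        # The follow-up action (e.g. the disclaimer check) reads the
        # answer from the context instead of from the return value.
        context_updates={"relevant_answer": answer},
    )
```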
I like your Colang 2.0 approach, too. Could the "fact-checking" approach work for Colang 1.0?
define flow answer report question
  user said something
  $do_streaming = True
  $answer = execute rag
  bot $answer
(I'll gladly squash the commits in the end; just wanted to keep history while discussing)