feat(actions): enable streaming in custom actions

Open niels-garve opened this issue 1 year ago • 4 comments

🚨 Updates in the discussions below

Fixes #646

Problem description

First of all, thanks for NeMo-Guardrails!

Consider two consecutive actions. The first is a custom RAG action, and the second analyzes the answer to render a disclaimer if the answer is not grounded in the knowledge base. It is like fact-checking, but with streaming enabled. The bot should answer and then finish with something like: "I learn something new every day, so my answers may not always be perfect."

Using streaming currently leads to two errors:

  1. The streaming handler finishes after the first action, so the disclaimer is not streamed. This happens because streaming_finished_event is set when an empty chunk ("") is passed to on_llm_new_token. The existing if statement checks for empty chunks, but only at the beginning of the stream; in our case, the empty chunk arrives at the end. I extended the check so that "" is never processed.
  2. For downstream usage, the first action returns the final answer, which has also been streamed. When the action finishes, the accumulated result is emitted, which is why you end up with duplicate sentences in the result:
    Question: Hi
    
    Answer:
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!
    I'm here to help with any questions you may have about the Employment Situation for April. Thanks for asking!I learn 
    something new every day, so my answers may not always be perfect.
    
    I solved this one in the _process function by adding an early return when chunk == self.completion.
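
The two fixes above can be sketched roughly as follows. This is a simplified stand-in, not the NeMo-Guardrails streaming handler itself: the class `SketchStreamingHandler`, the `emitted` list, and the use of `None` as the explicit end-of-stream marker are assumptions made for illustration. Only the names `on_llm_new_token`, `_process`, `completion`, and `streaming_finished_event` come from the description.

```python
import asyncio
from typing import List, Optional


class SketchStreamingHandler:
    """Simplified stand-in for a streaming handler (not the library's class)."""

    def __init__(self) -> None:
        self.completion = ""  # text accumulated so far
        self.emitted: List[str] = []  # chunks delivered to the consumer
        self.streaming_finished_event = asyncio.Event()

    async def on_llm_new_token(self, token: Optional[str]) -> None:
        # Fix 1: ignore empty chunks wherever they occur, not only at the
        # start, so a trailing "" from the first action cannot end the stream.
        if token == "":
            return
        await self._process(token)

    async def _process(self, chunk: Optional[str]) -> None:
        if chunk is None:
            # Assumption: None is the explicit end-of-stream marker.
            self.streaming_finished_event.set()
            return
        # Fix 2: the action's return value equals what was already streamed,
        # so skip it instead of emitting the same sentences twice.
        if chunk == self.completion:
            return
        self.completion += chunk
        self.emitted.append(chunk)


async def demo() -> List[str]:
    handler = SketchStreamingHandler()
    # First action streams its answer and ends with an empty chunk.
    for token in ["Hello", " there.", ""]:
        await handler.on_llm_new_token(token)
    # The first action's accumulated result arrives once more.
    await handler.on_llm_new_token("Hello there.")
    # Second action streams the disclaimer.
    for token in [" I learn", " something new."]:
        await handler.on_llm_new_token(token)
    await handler.on_llm_new_token(None)
    return handler.emitted


emitted = asyncio.run(demo())
print("".join(emitted))
```

The equality check only catches a return value that matches the whole accumulated text, which is the case described here; a partial overlap would still be emitted.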

How to test

I've added an example under examples/configs/rag/custom_rag_streaming which you can test like so:

$ export OPENAI_API_KEY='sk-xxx'
$ python -m nemoguardrails.__main__ chat --config /<path_to>/examples/configs/rag/custom_rag_streaming --streaming

Please also follow the README.md I've included.

I'm happy to hear your feedback!

@drazvan @mikeolubode

niels-garve avatar Sep 07 '24 17:09 niels-garve

Hello @drazvan @mikeolubode, I found a neat solution without altering the library. So, I'm just requesting my example be pulled. What do you think of the idea of using a local streaming handler that filters out and handles stream-stopping chunks ("" and None) while keeping the main streaming handler open?
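
A minimal sketch of that local-handler idea, under stated assumptions: `FakeMainHandler`, `LocalStreamingHandler`, `fake_llm_stream`, and `run_two_actions` are hypothetical names, and the main handler here is a stand-in that simply treats "" or None as the end of the stream, as described in the problem statement. It is not the example shipped in this PR.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class FakeMainHandler:
    """Stand-in for the main streaming handler (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: List[str] = []
        self.finished = False

    async def push_chunk(self, chunk: Optional[str]) -> None:
        # Mimics the described behavior: "" or None ends the stream.
        if chunk is None or chunk == "":
            self.finished = True
            return
        if not self.finished:
            self.chunks.append(chunk)


class LocalStreamingHandler:
    """Forwards text chunks to the main handler but swallows stop chunks."""

    def __init__(self, main_handler: FakeMainHandler) -> None:
        self.main_handler = main_handler
        self.completion = ""  # this action's own accumulated text

    async def push_chunk(self, chunk: Optional[str]) -> None:
        if chunk is None or chunk == "":
            return  # handled locally; the main stream stays open
        self.completion += chunk
        await self.main_handler.push_chunk(chunk)


async def fake_llm_stream(parts: List[str]) -> AsyncIterator[Optional[str]]:
    for part in parts:
        yield part
    yield ""  # simulates the trailing empty chunk from the description


async def run_two_actions() -> FakeMainHandler:
    main = FakeMainHandler()
    for parts in (["Answer", " text."], [" Disclaimer", " text."]):
        local = LocalStreamingHandler(main)  # one local handler per action
        async for chunk in fake_llm_stream(parts):
            await local.push_chunk(chunk)
    await main.push_chunk(None)  # close the main stream once, at the very end
    return main


main = asyncio.run(run_two_actions())
print(main.chunks)
```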

niels-garve avatar Sep 09 '24 07:09 niels-garve

Update: the duplicate chunks that result from streaming within an action and then returning its result must still be handled.

niels-garve avatar Sep 09 '24 16:09 niels-garve

Thanks for digging into this @niels-garve! Let me think some more about this. I think we need a cleaner way to signal that a message returned from an action has already been streamed, and somehow have support for this in bot say. Something along these lines:

(using Colang 2.0 syntax, as it might not be possible easily with Colang 1)

flow answer report question
  user said something
  $answer = await RagAction()
  bot say text=$answer streamed=True
  $disclaimer = await DisclaimerAction()
  bot say text=$disclaimer streamed=True

drazvan avatar Sep 10 '24 09:09 drazvan

Thanks for your prompt reply, @drazvan! I pushed another approach: what if we use the fact that ActionResult is defined with an optional return value, return None from the action, and handle inter-action communication via the context? Returning None signals that the answer has already been streamed.

I had to alter the library code, though, to remove the fallback reply “I'm not sure what to say.” I also see a chance to rework this more generally, since an English default reply blocks multi-language support. What do you think?
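
To illustrate the None-return idea, here is a rough sketch. `SketchActionResult` is a stand-in dataclass, not the library's ActionResult, and its field names, the `rag_answer` context key, the `KNOWLEDGE_BASE` set, and the tiny `run_flow` dispatcher are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

DISCLAIMER = "I learn something new every day, so my answers may not always be perfect."
KNOWLEDGE_BASE = {"grounded answer"}  # hypothetical stand-in for a real check


@dataclass
class SketchActionResult:
    """Stand-in for an action result with an optional return value."""

    return_value: Optional[Any] = None
    context_updates: Dict[str, Any] = field(default_factory=dict)


def rag_action(context: Dict[str, Any], stream: List[str]) -> SketchActionResult:
    answer = "I'm here to help with any questions you may have."
    stream.append(answer)  # pretend the answer was streamed chunk by chunk
    # None signals "already streamed"; the answer travels via the context.
    return SketchActionResult(return_value=None, context_updates={"rag_answer": answer})


def disclaimer_action(context: Dict[str, Any], stream: List[str]) -> SketchActionResult:
    answer = context.get("rag_answer", "")
    if answer not in KNOWLEDGE_BASE:  # placeholder grounding check
        stream.append(DISCLAIMER)  # streamed as well, so return None again
    return SketchActionResult(return_value=None)


def run_flow() -> List[str]:
    context: Dict[str, Any] = {}
    stream: List[str] = []
    for action in (rag_action, disclaimer_action):
        result = action(context, stream)
        context.update(result.context_updates)
        # Only results that were not streamed are emitted here; a None
        # return value produces no bot message and no fallback reply.
        if result.return_value is not None:
            stream.append(str(result.return_value))
    return stream


output = run_flow()
print(output)
```

With this shape, each sentence reaches the stream exactly once, because the dispatcher never re-emits a result the action has already streamed.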

I like your Colang 2.0 approach, too. Could the "fact-checking" approach work for Colang 1.0?

flow answer report question
  user said something
  $do_streaming = True
  $answer = execute rag
  bot $answer

(I'll gladly squash the commits in the end; just wanted to keep history while discussing)

niels-garve avatar Sep 12 '24 04:09 niels-garve