agents

examples: added trigger-phrase agent example

Open s-hamdananwar opened this issue 1 year ago • 5 comments

s-hamdananwar, Sep 26 '24 22:09

⚠️ No Changeset found

Latest commit: e09049f7a9a38f9284bf5c0d050d89752f5b337d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


changeset-bot[bot], Sep 26 '24 22:09

  • it's a bit slow. worth looking into

I think it is mainly due to the 0.5 sec timeout set for the VAD, and maybe partly due to the computation that needs to happen on every END_OF_SPEECH event. I am not sure of the best way to address them, though. Since the primary goal of this example is to show users a way to use transcribed words to trigger the LLM, I didn't go down the path of ensuring the minimum possible latency like VoiceAssistant does.
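A minimal sketch of the trigger-phrase idea, assuming the example receives final transcripts as plain strings. `TRIGGER_PHRASE` and `extract_prompt` are hypothetical names, not taken from the PR, whose actual trigger phrase and matching logic are not shown in this thread. Whatever text follows the phrase is what would be forwarded to the LLM.

```python
import re
from typing import Optional

# Hypothetical trigger phrase; the PR's real phrase isn't shown in this thread.
TRIGGER_PHRASE = "hey assistant"


def _normalize(text: str) -> str:
    """Lowercase and strip punctuation so STT formatting doesn't break matching."""
    return re.sub(r"[^\w\s]", "", text.lower())


def extract_prompt(transcript: str, trigger: str = TRIGGER_PHRASE) -> Optional[str]:
    """Return the (normalized) words after the trigger phrase.

    Returns None if the trigger was not said, and "" if it was said with
    nothing after it. Matching is on whole words, so "hey assistants"
    does not count as the trigger.
    """
    words = _normalize(transcript).split()
    trigger_words = _normalize(trigger).split()
    n = len(trigger_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == trigger_words:
            return " ".join(words[i + n:])
    return None
```

Matching on normalized word lists rather than raw substrings keeps the check robust to the capitalization and punctuation that STT output often adds; the tradeoff is that the returned prompt is lowercased and unpunctuated.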

  • semantically this should probably be inside the voice_assistant examples directory

Even though this is technically a voice assistant, we are not using the VoiceAssistant class, so I feel it would be confusing and counterintuitive to users if we placed it in that directory; hence I went with a standalone example directory. What do you think?

s-hamdananwar, Oct 04 '24 00:10

> I think it is mainly due to the 0.5 sec timeout set for the VAD, and maybe partly due to the computation that needs to happen on every END_OF_SPEECH event.

in my testing I encountered closer to three, sometimes four, seconds of silence before the response started playing. this doesn't need to be fully optimized as an example, but at this point it is hurting the effectiveness of the demo.
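To illustrate why a silence timeout puts a floor under response latency: a VAD-style detector can only decide that speech has ended after it has seen `min_silence` seconds without speech. This is a simplified stand-in, not the livekit-agents VAD implementation; it assumes frames arrive as hypothetical `(timestamp, is_speech)` pairs, and `min_silence=0.5` mirrors the 0.5 sec timeout mentioned above.

```python
from typing import Iterable, Optional, Tuple


def end_of_speech_time(
    frames: Iterable[Tuple[float, bool]],
    min_silence: float = 0.5,  # mirrors the 0.5 s VAD timeout from this thread
) -> Optional[float]:
    """Return the timestamp at which END_OF_SPEECH would fire, or None.

    frames: (timestamp_seconds, is_speech) pairs in chronological order.
    The detector can only conclude that speech ended after it has observed
    `min_silence` seconds with no speech, so the event never fires earlier
    than last_speech + min_silence.
    """
    last_speech: Optional[float] = None
    for ts, is_speech in frames:
        if is_speech:
            last_speech = ts  # still talking: restart the silence clock
        elif last_speech is not None and ts - last_speech >= min_silence:
            return ts  # enough trailing silence: speech is considered over
    return None  # speech never started, or the silence window never elapsed
```

With frames every 250 ms and the last speech frame at 1.0 s, this fires at 1.5 s. STT finalization, the LLM call, and TTS all start only after that point, so the timeout is one contributor to the gap reported above, not the whole of it.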

re: directory, disregard; did not notice this doesn't actually use VoicePipelineAgent.

nbsp, Oct 04 '24 05:10

  • STT transcriptions are now added ✅
  • VAD is removed, both because of issues with adding a StreamAdapter to Deepgram and, hopefully, to reduce latency
  • first_participant constraint removed

s-hamdananwar, Oct 10 '24 04:10

@s-hamdananwar, this is how I was able to manage "multiple" participants in a single raise-hand queue. Check out the PR and let me know if it helps resolve the issue of the agent still only listening to the first participant who joins the room.

PR
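One way to picture a raise-hand queue for multiple participants, as a plain-Python sketch: this is not the code from the linked PR (which isn't reproduced here), and `RaiseHandQueue`, `raise_hand`, `lower_hand`, and `next_speaker` are hypothetical names. Wiring it to actual room events (a participant joining, leaving, or signalling "raise hand") is left out, since the thread doesn't show which events the PR uses.

```python
from collections import deque
from typing import Deque, Optional, Set


class RaiseHandQueue:
    """FIFO queue of participant identities waiting for the agent's attention.

    Hypothetical helper: each participant holds at most one spot, and the
    agent serves one speaker at a time instead of only the first joiner.
    """

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self._waiting: Set[str] = set()  # fast membership check for the queue
        self.current: Optional[str] = None  # participant currently being served

    def raise_hand(self, identity: str) -> bool:
        """Queue a participant; return False if already queued or being served."""
        if identity in self._waiting or identity == self.current:
            return False
        self._queue.append(identity)
        self._waiting.add(identity)
        return True

    def lower_hand(self, identity: str) -> None:
        """Drop a queued participant (e.g. they left the room)."""
        if identity in self._waiting:
            self._queue.remove(identity)
            self._waiting.discard(identity)

    def next_speaker(self) -> Optional[str]:
        """Advance to the next queued participant; None when nobody is waiting."""
        if self._queue:
            self.current = self._queue.popleft()
            self._waiting.discard(self.current)
        else:
            self.current = None
        return self.current
```

Under this design, the agent would only pass transcripts from `current` to its trigger and LLM logic, which is one way to avoid hard-coding the first participant to join.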

dsgolman, Oct 22 '24 19:10