Chenghao Mou
This ensures we have silence pauses in the recording by:
- tracking pause and resume wall times
- inserting silences once playback is finished

Image: two false interruptions and pauses
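The pause-tracking idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `PauseSpan`, `insert_silences`, and the use of a plain list of integer samples are all hypothetical names and simplifications. It assumes each pause is recorded as a pair of wall times plus the playback position at which the pause happened, and that silence is written as zero-valued samples after playback has finished.

```python
from dataclasses import dataclass


@dataclass
class PauseSpan:
    """Hypothetical record of one pause (not the PR's actual type)."""

    paused_at: float   # wall time in seconds when playback paused
    resumed_at: float  # wall time in seconds when playback resumed
    position: float    # seconds of audio already played when the pause began


def insert_silences(
    samples: list[int], spans: list[PauseSpan], sample_rate: int
) -> list[int]:
    """Return a copy of `samples` with zero-valued silence spliced in at each pause."""
    out: list[int] = []
    cursor = 0  # index of the next unread sample in `samples`
    for span in sorted(spans, key=lambda s: s.position):
        # Convert the pause position to a sample index, clamped to valid bounds.
        cut = min(len(samples), round(span.position * sample_rate))
        cut = max(cut, cursor)
        out.extend(samples[cursor:cut])
        # The silence length is the wall-clock time spent paused.
        gap = max(0.0, span.resumed_at - span.paused_at)
        out.extend([0] * round(gap * sample_rate))
        cursor = cut
    out.extend(samples[cursor:])
    return out
```

Deferring the splice until playback is finished keeps the recording path untouched while audio is streaming; only the pause and resume timestamps need to be collected live.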
- New barge-in detector under inference
- Two stream implementations:
  - HTTP endpoints for working with the hosted model
  - WS for working with the gateway proxy

Detailed spec can be found...
This should close #4407

Tested with the following setup.

```python
}
session = AgentSession(
    ...
    llm=google.LLM(
        model="gemini-2.5-flash",
        vertexai=True,
        location="global",
        # retrieval_config=types.RetrievalConfig(
        #     lat_lng=types.LatLng(latitude=53.350140, longitude=-6.266155)
        # ),
        # or
        retrieval_config={
            "lat_lng":...
```
Tested the ambiguous ones: "gemini-2.0-flash-exp" and "gemini-live-2.5-flash-preview-native-audio". Both still work with the right setting, but are not mentioned in any official docs or change logs.
- Added batch recognition flag in STT capabilities
- Added manual workflow to test a PR/branch/revision
- Updated tests to support all STT vendors except two of them:

```python
#...
```
This should close #4413

What happened:
- VAD received audio frames, changing the user state to speaking;
- Uninterruptible speech was created, discarding audio frames for both STT and VAD. User state...
- Update the user speech start time to include the VAD speech duration;
- Audio capture in output is now synced with the audio source capture, based on @longcw's PR...
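The first item can be illustrated with a small sketch. It assumes, as is common for VAD pipelines, that the start-of-speech event only fires after some amount of speech has already been heard, so the true start lies earlier than the event time. The function name and parameters below are hypothetical and are not taken from the PR.

```python
def speech_start_time(event_time: float, speech_duration: float) -> float:
    """Back-date the speech start by the speech the VAD had already heard.

    event_time: wall time in seconds when the VAD reported start of speech.
    speech_duration: seconds of speech accumulated before the event fired.
    """
    # Negative durations are treated as zero so the start never moves forward.
    return event_time - max(0.0, speech_duration)
```

For example, an event reported at 12.5 s with 0.5 s of accumulated speech gives a start time of 12.0 s.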
Things not compatible with 3.14:
- [onnxruntime](https://github.com/microsoft/onnxruntime/issues/26547), probably Jan next year ([roadmap](https://onnxruntime.ai/roadmap))

This closes #3618