Prevent the agent from speaking while the user is in the speaking state
Bug Description
The issue occurs after the user transitions speaking -> listening and the agent transitions listening -> thinking; the user then transitions listening -> speaking again. While the user is still speaking, the agent transitions thinking -> speaking, utters a ~100 ms blip of audio such as "No" or "I", and immediately falls back into listening. How do I prevent the agent from speaking when the user is already speaking?
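For reference, this is a minimal sketch of how the transitions above can be observed, assuming the 1.x `AgentSession` event API (`user_state_changed` / `agent_state_changed`); the handler names are just for illustration:

```python
from livekit.agents import AgentSession, AgentStateChangedEvent, UserStateChangedEvent

def log_state_transitions(session: AgentSession) -> None:
    """Attach handlers that print every user/agent state transition."""

    @session.on("user_state_changed")
    def _log_user_state(ev: UserStateChangedEvent) -> None:
        # e.g. speaking -> listening, then listening -> speaking again
        print(f"user: {ev.old_state} -> {ev.new_state}")

    @session.on("agent_state_changed")
    def _log_agent_state(ev: AgentStateChangedEvent) -> None:
        # the buggy sequence shows thinking -> speaking -> listening within ~100 ms
        print(f"agent: {ev.old_state} -> {ev.new_state}")
```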
Expected Behavior
The agent should not speak while the user is already speaking, even for 100 ms.
Reproduction Steps
Current session configuration (a rough code equivalent follows this list):
User Interaction:
- allow_interruptions: True
- min_interruption_duration: 0.5s
- min_interruption_words: 2
- interruption_backoff_seconds: 1.0s
Turn Detection:
- turn_detection: DISABLED (enabled: false)
- min_endpointing_delay: 0.5s
- max_endpointing_delay: 1.8s
- preemptive_generation: False
- preemptive_synthesis: False
Other configs, for reference:
- user_away_timeout: 5s
Provider Configurations:
VAD (Silero):
- min_speech_duration: 0.05s
- min_silence_duration: 0.55s
- activation_threshold: 0.5
EOU (livekit_dm):
- min_speech_duration: 0.6s
- min_silence_duration: 0.6s
- interruption_threshold: 0.5
- interruption_min_duration: 0.3s
STT (Deepgram):
- endpointing_ms: 25
- language: hi (Hindi)
- no_delay: true
TTS (ElevenLabs):
- streaming_latency: 0
- chunk_length_schedule: [80, 120, 200, 260]
LLM: gpt-4.1-mini (Azure EA), temp: 0.7
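For context, here is a rough, untested sketch of how this configuration maps onto `AgentSession` and the plugins in livekit-agents 1.x. Parameter values are taken from the dump above; a few keys (e.g. `interruption_backoff_seconds` and the EOU `livekit_dm` block) look framework-specific and are omitted, and the Azure LLM wiring is an assumption:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, elevenlabs, openai, silero

session = AgentSession(
    vad=silero.VAD.load(
        min_speech_duration=0.05,
        min_silence_duration=0.55,
        activation_threshold=0.5,
    ),
    stt=deepgram.STT(language="hi", endpointing_ms=25, no_delay=True),
    llm=openai.LLM.with_azure(model="gpt-4.1-mini", temperature=0.7),  # assumed Azure wiring
    tts=elevenlabs.TTS(streaming_latency=0, chunk_length_schedule=[80, 120, 200, 260]),
    turn_detection="vad",  # the turn-detector model is disabled in this setup
    allow_interruptions=True,
    min_interruption_duration=0.5,
    min_interruption_words=2,
    min_endpointing_delay=0.5,
    max_endpointing_delay=1.8,
    preemptive_generation=False,
    user_away_timeout=5.0,
)
```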
Operating System
Mac
Models Used
STT: Deepgram; VAD: Silero; EOU: livekit_dm; TTS: ElevenLabs; LLM: gpt-4.1-mini (Azure EA)
Package Versions
livekit-agents = {version = "1.2.17", extras = ["baseten", "cartesia", "deepgram", "openai", "silero", "elevenlabs", "azure", "google", "turn-detector", "aws", "mcp", "assemblyai", "hume", "gladia", "groq", "cerebras", "sarvam"]}
livekit = "^1.0.1"
livekit-api = "^1.0.1"
livekit-plugins-noise-cancellation = "^0.2.5"
Session/Room/Call IDs
No response
Proposed Solution
Additional Context
No response
Screenshots and Recordings
No response
Yeah, I think we can try waiting on the VAD state before starting agent playout if it detects that the user is speaking.
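One possible shape for that, sketched under the assumption that `Agent.tts_node` can be overridden as in the node-override examples and that `session.user_state` reflects the VAD speaking state (untested; exact hook names may differ by version): hold synthesized frames back until the user stops speaking.

```python
import asyncio

from livekit.agents import Agent, ModelSettings

class GatedAgent(Agent):
    async def tts_node(self, text, model_settings: ModelSettings):
        # Gate playout on the user's state: don't release any synthesized audio
        # while VAD still reports the user as speaking, so the agent never gets
        # a chance to emit a 100 ms blip like "No" / "I" over the user.
        async for frame in Agent.default.tts_node(self, text, model_settings):
            while self.session.user_state == "speaking":
                await asyncio.sleep(0.05)
            yield frame
```

Alternatively (less precise), you could listen for `agent_state_changed` and call `session.interrupt()` whenever the agent enters speaking while `session.user_state` is still "speaking", which at least cuts the blip short rather than preventing it.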