
Prevent agent from speaking when the user is in the speaking state

Open rathodsid opened this issue 2 months ago • 1 comments

Bug Description

The problem occurs in the following sequence: the user transitions from speaking to listening, the agent then transitions from listening to thinking, and the user transitions from listening back to speaking. While the user is still speaking, the agent transitions from thinking to speaking, utters a blip of audio such as "No" or "I", and within about 100 ms goes back to listening. How do I prevent the agent from starting to speak when the user is already speaking?

Expected Behavior

The agent should not start speaking while the user is already speaking, not even for 100 ms.

Reproduction Steps

Current session configuration:

  User Interaction:
  - allow_interruptions: True
  - min_interruption_duration: 0.5s
  - min_interruption_words: 2
  - interruption_backoff_seconds: 1.0s

  Turn Detection:
  - turn_detection: DISABLED (enabled: false)
  - min_endpointing_delay: 0.5s
  - max_endpointing_delay: 1.8s
  - preemptive_generation: False
  - preemptive_synthesis: False


Other extra configs for your reference

  - preemptive_generation: False
  - user_away_timeout: 5s
  Provider Configurations:

  VAD (Silero):
  - min_speech_duration: 0.05s
  - min_silence_duration: 0.55s
  - activation_threshold: 0.5

  EOU (livekit_dm):
  - min_speech_duration: 0.6s
  - min_silence_duration: 0.6s
  - interruption_threshold: 0.5
  - interruption_min_duration: 0.3s

  STT (Deepgram):
  - endpointing_ms: 25
  - language: hi (Hindi)
  - no_delay: true

  TTS (ElevenLabs):
  - streaming_latency: 0
  - chunk_length_schedule: [80, 120, 200, 260]

  LLM: gpt-4.1-mini (Azure EA), temp: 0.7

Operating System

Mac

Models Used

STT (Deepgram), VAD (Silero), EOU (livekit_dm), TTS (ElevenLabs), LLM: gpt-4.1-mini (Azure EA)

Package Versions

livekit-agents = {version = "1.2.17", extras = ["baseten", "cartesia", "deepgram", "openai", "silero", "elevenlabs", "azure", "google", "turn-detector", "aws", "mcp", "assemblyai", "hume", "gladia", "groq", "cerebras", "sarvam"]}
livekit = "^1.0.1"
livekit-api = "^1.0.1"
livekit-plugins-noise-cancellation = "^0.2.5"

Session/Room/Call IDs

No response

Proposed Solution


Additional Context

No response

Screenshots and Recordings

No response

rathodsid, Nov 21 '25 10:11

yeah, I think we can try to wait on the VAD state before starting agent playout if it detects that the user is speaking.

longcw, Nov 24 '25 08:11
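
For illustration, the idea of holding playout until the user stops speaking could look roughly like the sketch below. This is not livekit-agents code: `PlayoutGate`, `on_user_state`, and `wait_until_clear` are hypothetical names, and the sketch assumes something (for example a VAD callback) reports the user state as a string such as "speaking" or "listening".

```python
import asyncio
from typing import List, Optional


class PlayoutGate:
    """Hypothetical helper (not a LiveKit API): holds agent playout while the user speaks."""

    def __init__(self) -> None:
        self._user_silent = asyncio.Event()
        self._user_silent.set()  # assumption: the user starts out silent

    def on_user_state(self, new_state: str) -> None:
        # Call this from whatever reports the user state (e.g. a VAD callback).
        if new_state == "speaking":
            self._user_silent.clear()
        else:
            self._user_silent.set()

    async def wait_until_clear(self, timeout: Optional[float] = None) -> bool:
        """Return True once the user is silent, False if the wait timed out."""
        try:
            await asyncio.wait_for(self._user_silent.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> List[str]:
    gate = PlayoutGate()
    log: List[str] = []

    async def agent_reply() -> None:
        log.append("reply ready")
        await gate.wait_until_clear()  # block here instead of emitting a blip
        log.append("playout started")

    gate.on_user_state("speaking")  # user barges in while the agent is thinking
    task = asyncio.create_task(agent_reply())
    await asyncio.sleep(0.05)  # the reply is ready but the user is still talking
    log.append("user stopped")
    gate.on_user_state("listening")
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the reply is held rather than discarded. Whether a held reply should still be played after the user finishes, or dropped in favor of a new turn, is a separate design choice that the gate does not decide; passing a `timeout` lets the caller give up on a stale reply.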