ElevenLabs TTS stream doesn't return the text of the audio events being generated

Open andyprevalsky opened this issue 2 years ago • 0 comments

Please add this code to fix it:

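    text = ''
    try:
        # Join the per-character alignment data into the text for this audio chunk.
        text = ''.join(msg['normalizedAlignment']['chars'])
    except (KeyError, TypeError):
        # Alignment is missing or null for this message; leave text empty.
        pass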

It goes in this section of env/lib/python3.10/site-packages/livekit/plugins/elevenlabs/tts.py, starting at line 286:

msg = json.loads(msg.data)
if msg.get("audio"):
    data = base64.b64decode(msg["audio"])
    audio_frame = rtc.AudioFrame(
        data=data,
        sample_rate=self._config.sample_rate,
        num_channels=1,
        samples_per_channel=len(data) // 2,
    )
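    text = ''
    try:
        # Join the per-character alignment data into the text for this audio chunk.
        text = ''.join(msg['normalizedAlignment']['chars'])
    except (KeyError, TypeError):
        # Alignment is missing or null for this message; leave text empty.
        pass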
    self._event_queue.put_nowait(
        tts.SynthesisEvent(
            type=tts.SynthesisEventType.AUDIO,
            audio=tts.SynthesizedAudio(text=text, data=audio_frame),
        )
    )
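
For anyone who wants to check the extraction logic without running the plugin, here is a minimal standalone sketch. It assumes the message is a dict with the same "audio" and "normalizedAlignment" -> "chars" fields used above; the helper name alignment_text and the sample payloads are made up for illustration and are not part of the plugin.

```python
import base64


def alignment_text(msg: dict) -> str:
    """Return the characters aligned to one audio chunk, or '' if absent.

    Hypothetical helper: it mirrors the try/except above, but uses .get()
    so a missing or null alignment falls through to an empty string.
    """
    alignment = msg.get("normalizedAlignment") or {}
    return "".join(alignment.get("chars") or [])


# Made-up payloads shaped like the fields the snippet above reads.
with_alignment = {
    "audio": base64.b64encode(b"\x00\x00" * 4).decode("ascii"),
    "normalizedAlignment": {"chars": ["H", "i", "!"]},
}
without_alignment = {"audio": "AAAA", "normalizedAlignment": None}

print(repr(alignment_text(with_alignment)))
print(repr(alignment_text(without_alignment)))
```

Using .get() instead of a try/except keeps unrelated errors from being swallowed silently, which may make problems in the stream easier to spot.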

Apr 17 '24 22:04 andyprevalsky