TEN-Agent

Support for OpenAI's TTS and STT APIs

Open zhanghx0905 opened this issue 1 year ago • 4 comments

I'm wondering if the project currently supports OpenAI's TTS and STT APIs, or if there are any plans to integrate them.

zhanghx0905 avatar Oct 11 '24 07:10 zhanghx0905

Do you mean the ones the Realtime API uses, or the separate ones?

plutoless avatar Oct 16 '24 21:10 plutoless

Do you mean the ones the Realtime API uses, or the separate ones?

The separate TTS and STT APIs:

  • TTS: https://platform.openai.com/docs/guides/text-to-speech/overview
  • STT: https://platform.openai.com/docs/guides/speech-to-text

zhanghx0905 avatar Oct 26 '24 13:10 zhanghx0905

@zhanghx0905 OpenAI's STT/TTS APIs are not stream-based; they can only process files, so they are not ideal for realtime use cases.

plutoless avatar Nov 11 '24 02:11 plutoless

@zhanghx0905 OpenAI's STT/TTS APIs are not stream-based; they can only process files, so they are not ideal for realtime use cases.

You may want to take a look at the livekit-agent GitHub repository. I tried their OpenAI plugin and adapted it to Chinese, and found that it behaves much like a streaming service.
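To illustrate how a file-only STT API can still feel close to streaming, here is a minimal sketch of one common approach: cut the microphone audio into short utterances at pauses and package each one as a small in-memory WAV file that could then be uploaded. This is an assumption about the general technique, not a description of how the livekit plugin is actually implemented; `split_utterances`, `pcm_to_wav_bytes`, and the `is_speech` callback are hypothetical names, and a real system would use a proper voice activity detector.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def split_utterances(frames, is_speech, max_silence=3):
    """Group audio frames into utterances.

    An utterance is closed once `max_silence` consecutive non-speech
    frames follow it. `is_speech` is a hypothetical callback standing in
    for a real voice activity detector.
    """
    current, silent = [], 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silent = 0
        elif current:
            silent += 1
            if silent >= max_silence:
                yield b"".join(current)
                current, silent = [], 0
    if current:
        # Flush whatever speech is left when the input ends.
        yield b"".join(current)
```

Each chunk yielded by `split_utterances` can be passed through `pcm_to_wav_bytes` and sent as the `file` field of a transcription request, so the latency is roughly one utterance rather than one whole recording.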

By the way, I have locally deployed TTS (Text-to-Speech) and STT (Speech-to-Text) services, and I wrapped them in the OpenAI API format so that they can be used by applications compatible with the OpenAI API. I hope you will consider supporting these APIs as well.
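As a rough sketch of why the OpenAI format is convenient here: the TTS endpoint is a plain `POST` to `/audio/speech` with a JSON body, so a client only needs a configurable base URL to switch between OpenAI and a compatible local service. The helper name `build_speech_request`, the placeholder API key, and the `http://localhost:8000/v1` address are assumptions for illustration only; the code builds the request without sending it.

```python
import json
import urllib.request


def build_speech_request(
    text,
    base_url="https://api.openai.com/v1",
    api_key="sk-placeholder",  # placeholder, not a real key
    model="tts-1",
    voice="alloy",
):
    """Build (but do not send) a text-to-speech request in the OpenAI format."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same call shape, different backend: only the base URL changes.
openai_req = build_speech_request("Hello")
local_req = build_speech_request("Hello", base_url="http://localhost:8000/v1")
```

Passing either request to `urllib.request.urlopen` would return the audio bytes, assuming the server at that address actually implements the endpoint.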

zhanghx0905 avatar Nov 13 '24 15:11 zhanghx0905