.Net: Python: Realtime API

Open markwallace-microsoft opened this issue 1 year ago • 1 comments

Milestones

  • [x] ADR reviewed and decision agreed: #10355
  • [x] Preview implementation for Python: #10127
  • [x] Getting started samples
  • [x] Learn site documentation updated
  • [x] Blog post

Description

Integrate a real-time audio API into Semantic Kernel to allow seamless interaction with OpenAI GPT, Gemini, and Anthropic models. This API will enable applications to send live audio streams for processing and receive responses in real time, supporting conversational agents and multimedia applications, with a particular focus on contact center scenarios.

Scenarios

  • As a developer, I can integrate the real-time audio API into my SK application and interact with it through voice commands, including plugin and filter use.
  • As a developer, I can configure the audio input and output settings for optimal real-time performance.

Requirements:

API Development:

  • Design API endpoints capable of receiving live audio streams and connecting them to realtime audio endpoints.
  • Enable the API to support multiple audio formats (e.g., PCM, WAV, MP3).
  • Allow customization of kernel parameters (e.g., temperature, response length).
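
To illustrate the audio-format requirement above, here is a minimal sketch of normalizing WAV input into small chunks of raw 16-bit PCM. It assumes the target realtime endpoint expects base64-encoded PCM16 audio; the helper names `wav_to_pcm16_chunks` and `make_silent_wav` are hypothetical and are not part of Semantic Kernel or any provider SDK.

```python
import base64
import io
import wave


def wav_to_pcm16_chunks(wav_bytes: bytes, chunk_ms: int = 100) -> list[str]:
    """Split a 16-bit PCM WAV payload into base64-encoded chunks of raw PCM.

    Hypothetical helper: the chunk size and base64 encoding are assumptions
    about what a realtime audio endpoint would accept.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        # Number of audio frames that make up one chunk of chunk_ms milliseconds.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        chunks = []
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            chunks.append(base64.b64encode(frames).decode("ascii"))
        return chunks


def make_silent_wav(seconds: float = 0.25, rate: int = 24000) -> bytes:
    """Build an in-memory mono WAV of silence, used only to exercise the helper."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    chunks = wav_to_pcm16_chunks(make_silent_wav())
    print(f"{len(chunks)} chunks ready to send")
```

Compressed formats such as MP3 would need a decoding step before this point, which the standard library does not provide.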

Integration Support:

  • Provide seamless integration capabilities with the Semantic Kernel's existing features like plugins and filters.
  • Ensure compatibility with major programming languages and frameworks (e.g., Python, C#, Java).

Documentation and Samples:

  • Provide comprehensive API documentation, including usage guidelines, parameter descriptions, and example use cases.
  • Create sample projects demonstrating integration with OpenAI, Gemini, and Anthropic models.

markwallace-microsoft avatar Jan 06 '25 14:01 markwallace-microsoft

Would love to see the .NET Realtime API available for use!

Chryogenic avatar Jul 01 '25 12:07 Chryogenic

@eavanvalkenburg and @markwallace-microsoft: Are you planning to support the new real-time API and model which have just been announced?

Links:

  • https://platform.openai.com/docs/guides/realtime-websocket
  • https://openai.com/index/introducing-gpt-realtime/

There have been some changes which must be adopted by SK:

  • New API endpoint
  • New websocket session events
  • New objects in body
  • New model name

marvinbuss avatar Sep 02 '25 07:09 marvinbuss

I am waiting for this one as well!

r4hulp avatar Sep 02 '25 07:09 r4hulp

+1 to the new gpt-realtime model. It would also be desirable to have compatibility or extensibility for the new Azure Voice Live API, whose event interface is similar to that of the OpenAI Realtime API, as per https://github.com/microsoft/semantic-kernel/issues/12291

jjgriff93 avatar Sep 03 '25 11:09 jjgriff93