.Net: Python: Realtime API
Milestones
- [x] ADR reviewed and decision agreed: #10355
- [x] Preview implementation for Python: #10127
- [x] Getting started samples
- [x] Learn site documentation updated
- [x] Blog post
Description
Integrate a real-time audio API into Semantic Kernel to allow seamless interaction with OpenAI GPT, Gemini, and Anthropic models. This API will let applications send live audio streams for processing and receive responses in real time, enabling richer conversational agents and multimedia applications, especially in contact center scenarios.
Scenarios
- As a developer, I can integrate the real-time audio API into my SK application to interact with models through voice commands, including plugin and filter use.
- As a developer, I can configure the audio input and output settings for optimal real-time performance.
Requirements:
API Development:
- Design API endpoints that can receive live audio streams and connect them to real-time audio endpoints.
- Enable the API to support multiple audio formats (e.g., PCM, WAV, MP3).
- Allow customization of kernel parameters (e.g., temperature, response length).
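To make "receiving live audio streams" concrete, here is a minimal sketch of the wire-level step such an API would need for OpenAI's Realtime endpoint: slicing a PCM stream into short frames and wrapping each one in an `input_audio_buffer.append` client event, with the audio carried as a base64 string. It assumes 24 kHz, mono, 16-bit little-endian PCM input and 20 ms frames; the helper names `split_into_frames` and `to_append_event` are hypothetical and not part of Semantic Kernel, and the sketch does not open a websocket or use any SK API.

```python
import base64
import json
import struct

# Assumption: 24 kHz, mono, 16-bit little-endian PCM input.
SAMPLE_RATE_HZ = 24_000


def split_into_frames(samples: list[int], frame_ms: int = 20) -> list[list[int]]:
    """Split a mono sample stream into fixed-duration frames (the last may be shorter)."""
    frame_len = SAMPLE_RATE_HZ * frame_ms // 1000
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def to_append_event(frame: list[int]) -> str:
    """Encode one frame as an `input_audio_buffer.append` client event (JSON string)."""
    # Clamp to the int16 range so struct.pack cannot raise struct.error.
    clamped = [max(-32768, min(32767, s)) for s in frame]
    pcm = struct.pack(f"<{len(clamped)}h", *clamped)
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Raw PCM bytes travel as a base64 string inside the JSON event.
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


if __name__ == "__main__":
    # One second of silence -> 50 frames of 20 ms, each ready to send over a websocket.
    events = [to_append_event(f) for f in split_into_frames([0] * SAMPLE_RATE_HZ)]
    print(len(events))
```

Supporting other formats (e.g. WAV or MP3) would mean decoding them to this PCM layout before framing; in an SK connector, the resulting events would be sent by whatever realtime client the connector wraps.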
Integration Support:
- Provide seamless integration capabilities with the Semantic Kernel's existing features like plugins and filters.
- Ensure compatibility with major programming languages and frameworks (e.g., Python, C#, Java).
Documentation and Samples:
- Provide comprehensive API documentation, including usage guidelines, parameter descriptions, and example use cases.
- Create sample projects demonstrating integration with OpenAI, Gemini, and Anthropic models.
Would love to see the .NET Realtime API available for use!
@eavanvalkenburg and @markwallace-microsoft: Are you planning to support the new real-time API and model which have just been announced?
Links:
- https://platform.openai.com/docs/guides/realtime-websocket
- https://openai.com/index/introducing-gpt-realtime/
There have been some changes which must be adopted by SK:
- New API endpoint
- New websocket session events
- New objects in body
- New model name
I am waiting for this one as well!
+1 for the new gpt-realtime model. Compatibility with (or extensibility toward) the new Azure Voice Live API, whose event interface is similar to the OpenAI Realtime API, would also be desirable, as per https://github.com/microsoft/semantic-kernel/issues/12291