.Net: Bring Support for Azure OpenAI gpt-4o audio responses
Describe the bug
When using GPT-4o audio-preview with Semantic Kernel, the audio response isn't returned. The code runs without errors, but no audio comes back in the AI response.
To Reproduce
Steps to reproduce the behavior:
- Create a new .NET 9 project
- Install Microsoft.SemanticKernel v1.47.0
- Configure SDK with Azure OpenAI and the gpt-4o-audio-preview model
- Set up audio input from microphone using NAudio
- Request both text and audio responses with ChatResponseModalities.Text | ChatResponseModalities.Audio
- Observe that only a text response is returned, with no audio (a minimal sketch of this setup follows the list)
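For context, the request is shaped roughly like the sketch below (a minimal sketch, not the full project; the endpoint, key, and deployment name are placeholders):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using OpenAI.Chat;

#pragma warning disable SKEXP0010 // Audio and Modalities are experimental settings

// Placeholders: substitute your own Azure OpenAI resource values.
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o-audio-preview",
        endpoint: "https://<resource>.openai.azure.com/",
        apiKey: "<api-key>")
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();

var settings = new AzureOpenAIPromptExecutionSettings
{
    Audio = new ChatAudioOptions(ChatOutputAudioVoice.Shimmer, ChatOutputAudioFormat.Wav),
    Modalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
};

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("Say hello out loud.");

// The text part of the reply arrives; the expected audio item does not.
var result = await chat.GetChatMessageContentAsync(chatHistory, settings, kernel);
#pragma warning restore SKEXP0010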
Expected behavior
The AI should respond with both text (which works) and audio (which never arrives).
Screenshots
Not applicable.
Platform
- Language: C#
- Source: NuGet package Microsoft.SemanticKernel version 1.47.0
- AI model: Azure OpenAI gpt-4o-audio-preview
- IDE: Visual Studio
- OS: Windows
Additional context
I followed the example from https://devblogs.microsoft.com/semantic-kernel/using-openais-audio-preview-model-with-semantic-kernel/ and implemented microphone input using NAudio. The text response works correctly.
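For completeness, the microphone capture side looks roughly like this (a sketch only, not my exact code; the 16 kHz mono format and the fixed five-second recording window are assumptions):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using NAudio.Wave;

#pragma warning disable SKEXP0001 // AudioContent may be experimental depending on the SK version

// Capture a short clip from the default microphone into an in-memory WAV stream.
using var waveIn = new WaveInEvent { WaveFormat = new WaveFormat(16000, 16, 1) };
var buffer = new MemoryStream();
using var writer = new WaveFileWriter(buffer, waveIn.WaveFormat);

waveIn.DataAvailable += (_, e) => writer.Write(e.Buffer, 0, e.BytesRecorded);
waveIn.StartRecording();
await Task.Delay(TimeSpan.FromSeconds(5)); // record for five seconds
waveIn.StopRecording();
writer.Flush(); // updates the WAV header so the buffered bytes are playable

// Attach the recording to the chat history as audio input.
var chatHistory = new ChatHistory();
chatHistory.AddUserMessage(new ChatMessageContentItemCollection
{
    new TextContent("Please answer the question in this recording."),
    new AudioContent(buffer.ToArray(), "audio/wav"),
});
#pragma warning restore SKEXP0001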
I just wanted to add that it works fine with the OpenAI API.
I can confirm this is still an issue. I'm using the same tutorial, and I also tried gpt-4o-mini-audio-preview.
@RogerBarreto I've created PR #12523 to support audio responses, similar to the OpenAI connector.
This still doesn't work for me. No audio is being returned.
Could you please share your code? I've tested it and it's working fine.
Make sure you set these values.
AzureOpenAIPromptExecutionSettings settings = new AzureOpenAIPromptExecutionSettings();
#pragma warning disable SKEXP0010 // Audio and Modalities are experimental settings
settings.Audio = new ChatAudioOptions(ChatOutputAudioVoice.Shimmer, ChatOutputAudioFormat.Wav);
settings.Modalities = ChatResponseModalities.Text | ChatResponseModalities.Audio;
#pragma warning restore SKEXP0010
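After the call returns, the synthesized audio arrives as an AudioContent item on the resulting ChatMessageContent, so you can pull it out roughly like this (a sketch, where result is the value returned by GetChatMessageContentAsync):

#pragma warning disable SKEXP0001 // AudioContent may be experimental depending on the SK version
var audio = result.Items.OfType<AudioContent>().FirstOrDefault();
if (audio is { Data: not null })
{
    // Save the WAV bytes; they can also be handed to NAudio for playback.
    await File.WriteAllBytesAsync("reply.wav", audio.Data.Value.ToArray());
}
#pragma warning restore SKEXP0001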
// Build a kernel pointed at the audio-capable deployment.
var builderAudio = Kernel.CreateBuilder().AddAzureOpenAIChatCompletion(
    configuration["AzureOpenAI:DeploymentNameAudio"],
    configuration["AzureOpenAI:Endpoint"],
    configuration["AzureOpenAI:ApiKey"]);
builderAudio.Services.AddLogging(services => services.AddConsole().SetMinimumLevel(LogLevel.Trace));
var kernelAudio = builderAudio.Build();

var chatCompletionServiceAudio = kernelAudio.GetRequiredService<IChatCompletionService>();

var openAiPromptExecutionSettingsAudio = new AzureOpenAIPromptExecutionSettings
{
    Audio = new ChatAudioOptions(
        ChatOutputAudioVoice.Shimmer, // Choose from available voices
        ChatOutputAudioFormat.Wav     // Choose output format
    ),
    Modalities = ChatResponseModalities.Audio | ChatResponseModalities.Text
};

var result = await chatCompletionServiceAudio.GetChatMessageContentAsync(
    chatHistory,
    executionSettings: openAiPromptExecutionSettingsAudio,
    kernel: kernelAudio);
GetChatMessageContentAsync throws the following exception:
An error occurred: HTTP 400 (invalid_request_error: invalid_value) Parameter: model
This model requires that either input content or output modality contain audio.
at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request)
at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.GetChatMessageContentsAsync(String targetModel, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
at Microsoft.SemanticKernel.ChatCompletion.ChatCompletionServiceExtensions.GetChatMessageContentAsync(IChatCompletionService chatCompletionService, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
at TestProjectApi.Services.OpenAiService.GenerateAnswerAsync(String query) in /Users/samarmand/Development/TestProject/TestProjectApi/Services/OpenAIService.cs:line 114
Which AI model are you using?
Never mind! Brain fart. It works great.