
.Net: Bring Support for Azure OpenAI gpt-4o audio responses

Open Cobra86 opened this issue 9 months ago • 1 comments

Describe the bug When using gpt-4o-audio-preview with Semantic Kernel through Azure OpenAI, the audio response isn't returned. The code runs without errors, but the AI response contains no audio.

To Reproduce Steps to reproduce the behavior:

  1. Create a new .NET 9 project
  2. Install Microsoft.SemanticKernel v1.47.0
  3. Configure SDK with Azure OpenAI and the gpt-4o-audio-preview model
  4. Set up audio input from microphone using NAudio
  5. Request both text and audio responses with ChatResponseModalities.Text | ChatResponseModalities.Audio
  6. Observe that only text is returned; there is no audio response.
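
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the report: the endpoint, the environment variable name, and the `input.wav` file (standing in for the NAudio microphone capture) are placeholders, and it assumes a default .NET project with implicit usings. The `AudioContent` handling follows the pattern from the blog post linked under Additional context.

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using OpenAI.Chat;

// Placeholders (assumptions): replace with your own deployment, endpoint, and key.
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o-audio-preview",
        endpoint: "https://YOUR-RESOURCE.openai.azure.com/",
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!)
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();

// Stand-in for the microphone capture: any WAV bytes will do for the repro.
byte[] audioBytes = await File.ReadAllBytesAsync("input.wav");

var history = new ChatHistory();
history.AddUserMessage([new AudioContent(audioBytes, "audio/wav")]);

// Audio and Modalities are experimental settings; depending on the SDK version,
// further experimental-diagnostic suppressions may be required.
#pragma warning disable SKEXP0010
var settings = new AzureOpenAIPromptExecutionSettings
{
    Audio = new ChatAudioOptions(ChatOutputAudioVoice.Shimmer, ChatOutputAudioFormat.Wav),
    Modalities = ChatResponseModalities.Text | ChatResponseModalities.Audio
};
#pragma warning restore SKEXP0010

var result = await chat.GetChatMessageContentAsync(history, settings, kernel);

// Reported behavior: text is present, but no AudioContent item comes back.
Console.WriteLine($"Text: {result.Content}");
Console.WriteLine($"Has audio: {result.Items.OfType<AudioContent>().Any()}");
```

Checking `result.Items` for an `AudioContent` item makes the symptom explicit instead of relying on playback to notice the missing audio.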

Expected behavior The model should respond with both text (which works) and audio (which is not returned).

Screenshots not applicable.

Platform

  • Language: C#
  • Source: NuGet package Microsoft.SemanticKernel version 1.47.0
  • AI model: Azure OpenAI gpt-4o-audio-preview
  • IDE: Visual Studio
  • OS: Windows

Additional context I followed the example from https://devblogs.microsoft.com/semantic-kernel/using-openais-audio-preview-model-with-semantic-kernel/ and implemented microphone input using NAudio. The text response works correctly.

Cobra86 avatar Apr 24 '25 19:04 Cobra86

I just wanted to add that it works fine with the OpenAI API.

Cobra86 avatar Apr 24 '25 20:04 Cobra86

I can confirm this is still a problem. I'm using the same tutorial, and I also tried gpt-4o-mini-audio-preview.

SamArmand avatar Jun 11 '25 01:06 SamArmand

@RogerBarreto I've created a PR to support audio, similar to OpenAI: #12523

Cobra86 avatar Jun 18 '25 11:06 Cobra86

This still doesn't work for me. No audio is being returned.

SamArmand avatar Jul 01 '25 04:07 SamArmand

This still doesn't work for me. No audio is being returned.

Could you please share your code? I've tested it and it's working fine.

Make sure you set these values.

AzureOpenAIPromptExecutionSettings settings = new AzureOpenAIPromptExecutionSettings();
#pragma warning disable SKEXP0010
settings.Audio = new ChatAudioOptions(ChatOutputAudioVoice.Shimmer, ChatOutputAudioFormat.Wav);
settings.Modalities = ChatResponseModalities.Text | ChatResponseModalities.Audio;
#pragma warning restore SKEXP0010
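
With those values set, one way to confirm that audio actually came back is to look for an `AudioContent` item in the result and write it to disk. This is a minimal sketch, assuming the Azure connector exposes the audio the same way as the OpenAI example in the linked blog post. `TrySaveAudioAsync` and the output path are hypothetical names, not part of Semantic Kernel.

```csharp
using Microsoft.SemanticKernel;

// Hypothetical helper (not part of Semantic Kernel): saves the first audio item
// in a chat result to `path`, and returns false when the result has no audio.
static async Task<bool> TrySaveAudioAsync(ChatMessageContent result, string path)
{
    var audio = result.Items.OfType<AudioContent>().FirstOrDefault();
    if (audio?.Data is null)
    {
        return false; // no audio item, or an audio item without inline bytes
    }

    await File.WriteAllBytesAsync(path, audio.Data.Value.ToArray());
    return true;
}
```

For example, `await TrySaveAudioAsync(result, "reply.wav")` after the chat call returns `false` when only text was returned.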

Cobra86 avatar Jul 02 '25 18:07 Cobra86

var builderAudio = Kernel.CreateBuilder().AddAzureOpenAIChatCompletion(
                configuration["AzureOpenAI:DeploymentNameAudio"], 
                configuration["AzureOpenAI:Endpoint"], 
                configuration["AzureOpenAI:ApiKey"]);

builderAudio.Services.AddLogging(services => services.AddConsole().SetMinimumLevel(LogLevel.Trace));

var kernelAudio = builderAudio.Build();
var chatCompletionServiceAudio = kernelAudio.GetRequiredService<IChatCompletionService>();
          
var openAiPromptExecutionSettingsAudio = new AzureOpenAIPromptExecutionSettings
{
    Audio = new ChatAudioOptions(
        ChatOutputAudioVoice.Shimmer, // Choose from available voices
        ChatOutputAudioFormat.Wav     // Choose output format
    ),
    Modalities = ChatResponseModalities.Audio | ChatResponseModalities.Text
};

var result = await chatCompletionServiceAudio.GetChatMessageContentAsync(
    chatHistory,
    executionSettings: openAiPromptExecutionSettingsAudio,
    kernel: kernelAudio);

GetChatMessageContentAsync throws the following exception:

An error occurred: HTTP 400 (invalid_request_error: invalid_value) Parameter: model

This model requires that either input content or output modality contain audio.
at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request) 
at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.GetChatMessageContentsAsync(String targetModel, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken) 
at Microsoft.SemanticKernel.ChatCompletion.ChatCompletionServiceExtensions.GetChatMessageContentAsync(IChatCompletionService chatCompletionService, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken) 
at TestProjectApi.Services.OpenAiService.GenerateAnswerAsync(String query) in /Users/samarmand/Development/TestProject/TestProjectApi/Services/OpenAIService.cs:line 114

SamArmand avatar Jul 11 '25 13:07 SamArmand

Which AI model are you using?

Cobra86 avatar Jul 12 '25 00:07 Cobra86

Which AI model are you using?

Never mind! Brain fart. It works great.

SamArmand avatar Jul 12 '25 14:07 SamArmand