Python: Azure AI service model invoke fails with no_model_name error code in SK Python
Code snippet
import os
import asyncio
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.connectors.ai.prompt_execution_settings import PromptExecutionSettings
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatPromptExecutionSettings
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from dotenv import load_dotenv

load_dotenv("../.env")

llm = AzureAIInferenceChatCompletion(
    endpoint=os.getenv("AZUREAI_INFERENCE_ENDPOINT"),  # "https://{myresource}.services.ai.azure.com/models"
    api_key=os.getenv("AZUREAI_ENDPOINT_KEY"),
    ai_model_id="phi-4",
    service_id="phi-4",
)

# execution_settings = AzureAIInferenceChatPromptExecutionSettings(
#     max_tokens=100,
#     temperature=0.5,
#     top_p=0.9,
#     service_id="phi-4",
#     model_id="phi-4",
#     # extra_parameters={...},  # model-specific parameters
# )
execution_settings = PromptExecutionSettings(model="phi-4")


async def main():
    chat_history = ChatHistory(messages=[{"role": "user", "content": "Hello"}])
    response = await llm.get_chat_message_content(
        chat_history,
        execution_settings,
        headers={"x-ms-model-mesh-model-name": "phi-4"},
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
Error details
Traceback (most recent call last):
  File "/afh/projects/aiproj01-ea1be05d-2ca9-4751-94bd-64549ebf820f/shared/Users/pupanda/ai-foundation-models/ai-inference/phi/phi4-sk.py", line 41, in <module>
    asyncio.run(main())
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/afh/projects/aiproj01-ea1be05d-2ca9-4751-94bd-64549ebf820f/shared/Users/pupanda/ai-foundation-models/ai-inference/phi/phi4-sk.py", line 34, in main
    response = await llm.get_chat_message_content(chat_history,
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 197, in get_chat_message_content
    results = await self.get_chat_message_contents(chat_history=chat_history, settings=settings, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 142, in get_chat_message_contents
    return await self._inner_get_chat_message_contents(chat_history, settings)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/semantic_kernel/connectors/ai/azure_ai_inference/services/azure_ai_inference_chat_completion.py", line 127, in _inner_get_chat_message_contents
    response: ChatCompletions = await self.client.complete(
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/ai/inference/aio/_patch.py", line 670, in complete
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (no_model_name) No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Code: no_model_name
Message: No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb32ae6c850>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x7fb322838640>, 2558.922774265)])']
connector: <aiohttp.connector.TCPConnector object at 0x7fb32ae6c9a0>
This no_model_name error comes from azure-ai-inference when it is called through SK: the SK wrapper does not pass the model value through to the ChatCompletionsClient, so when client.complete() is invoked the service reports that the model name is missing.
It looks like a bug. Does anyone have any insights on whether this can be addressed from the client side?
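(The only client-side workaround I can think of - unverified - is to pre-build the async azure.ai.inference client with the routing header set as a default and hand it to the connector via its client argument, roughly like the sketch below. I have not confirmed that the connector keeps a pre-built client untouched, so treat this as an assumption.)

# Unverified workaround sketch: pre-build the async inference client with a
# default x-ms-model-mesh-model-name header and pass it to the SK connector.
import os

from azure.ai.inference.aio import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

client = ChatCompletionsClient(
    endpoint=os.getenv("AZUREAI_INFERENCE_ENDPOINT"),
    credential=AzureKeyCredential(os.getenv("AZUREAI_ENDPOINT_KEY")),
    headers={"x-ms-model-mesh-model-name": "phi-4"},  # assumption: azure-core applies these headers to every request
)
llm = AzureAIInferenceChatCompletion(ai_model_id="phi-4", client=client)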
@TaoChenOSU any thoughts on this?
Hi @PurnaChandraPanda,
The ai_model_id doesn't point the connector to the specified model. In the case of AzureAIInference, ai_model_id is a parameter users can use to identify which model the connector was created for.
For Azure AI Inference, your endpoint should point directly to your model. It should look something like this: https://Phi-4-xxxxx.southcentralus.models.ai.azure.com if you're using serverless compute.
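A minimal sketch of what that serverless setup would look like (placeholder endpoint and key; here ai_model_id is only a label and is not used for routing):

# Sketch for a serverless (model-as-a-service) deployment: the endpoint itself
# serves exactly one model, so no model name needs to travel in the request.
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

llm = AzureAIInferenceChatCompletion(
    ai_model_id="phi-4",  # identification only for this deployment type
    api_key="<serverless-endpoint-key>",  # placeholder
    endpoint="https://Phi-4-xxxxx.southcentralus.models.ai.azure.com",  # placeholder
)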
Hello @TaoChenOSU, @moonbox3
Thank you for the response.
I am using the "Azure AI model inference endpoint" in AI Foundry - https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/how-to/inference?tabs=python#using-the-routing-capability-in-the-azure-ai-model-inference-endpoint. According to that guide, the base_url looks like https://{myresource}.services.ai.azure.com/models.
Could you please try the same on your end? You should see the same error I do.
For ai_model_id, I just passed the model name, which is phi-4 in my case. Should I pass a different value instead?
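For reference, the linked doc exercises the same routing endpoint with the plain azure-ai-inference client by passing the model name per request; a rough sketch of that call (not the SK path):

# Rough sketch of calling the Azure AI model inference (routing) endpoint
# directly with azure-ai-inference, passing the model name per request.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.getenv("AZUREAI_INFERENCE_ENDPOINT"),  # https://{myresource}.services.ai.azure.com/models
    credential=AzureKeyCredential(os.getenv("AZUREAI_ENDPOINT_KEY")),
)
response = client.complete(
    messages=[UserMessage(content="Hello")],
    model="phi-4",  # this is what routes the request to the phi-4 deployment
)
print(response.choices[0].message.content)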
Hello,
I'm experiencing the same issue as @PurnaChandraPanda. I am following the instructions in the URL mentioned above; in my case I am using DeepSeek-R1.
Has there been any update on this?
thanks!
Hi, I'm also having this issue with the AI Inference SDK.
I am pulling the model endpoint directly from Azure AI Foundry using "Get Endpoint". Ideally this should be the endpoint I need for the connector, rather than a different URL.
I tried an older version of the Azure Inference SDK (azure-ai-inference==1.0.0b5) without any luck.
The inference endpoint structure that @TaoChenOSU suggested looks different now, as @PurnaChandraPanda pointed out.
When using AI Foundry - https://
I can't seem to find where the "https://model-name.region.models.ai.azure.com/" endpoint comes from.
When I try to use that structure I get: Content: {"status": "Serverless endpoint not found"}.
Hi,
There are multiple model deployment methods on AI Foundry.
I was referring to the model-as-a-service deployment type (a.k.a. serverless endpoint). You can read more about it here. This deployment type doesn't require a model id in the request because the endpoint itself serves only one model.
What @PurnaChandraPanda is using is a relatively new deployment type. In fact, it's still a preview feature on AI Foundry. It allows developers to link an Azure AI Service resource to an AI Foundry project. Once linked, future model deployments will be accessible from the AI service resource via an endpoint and a model id: https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/how-to/inference?tabs=python#using-the-routing-capability-in-the-azure-ai-model-inference-endpoint.
This PR addresses the issue: https://github.com/microsoft/semantic-kernel/pull/10427
When using a serverless endpoint, the ai_model_id is not used. When using an AI service endpoint, the ai_model_id will be used to route the request to the specified model.
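So after that change, a sketch of the intended setup against the AI service (routing) endpoint would be something like this (same env vars as the original snippet):

# Sketch (semantic-kernel >= 1.21): against the Azure AI Services routing
# endpoint, ai_model_id is what selects the deployed model.
import os

from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

llm = AzureAIInferenceChatCompletion(
    ai_model_id="phi-4",  # routes the request to the phi-4 deployment
    api_key=os.getenv("AZUREAI_ENDPOINT_KEY"),
    endpoint=os.getenv("AZUREAI_INFERENCE_ENDPOINT"),  # https://{myresource}.services.ai.azure.com/models
    service_id="phi-4",
)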
@TaoChenOSU - this was supposed to be fixed with the 1.21 release? Any updates? This is still broken.
Yes, it has been fixed with the 1.21 release. Are you still seeing issues?
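If it helps, a quick way to confirm which version is actually installed in your environment (the fix needs 1.21 or later):

# Check the installed semantic-kernel version; the routing fix requires >= 1.21.
from importlib.metadata import version

print(version("semantic-kernel"))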
hey @TaoChenOSU,
I'm encountering the same issue. Let me share some insights from my side; I hope you can suggest a solution.
# Imports inferred from the snippet; SKSimpleManagerAgent is my own wrapper (not shown here).
from typing import NoReturn
from uuid import uuid4

from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

MODEL_NAME = "gpt-4o"
OAI_API_KEY = "<redacted>"
OAI_ENDPOINT = "<redacted>"
QUERY_FOR_SUMMARIZER = "Extract the main action items from the meeting minutes"
QUERY_FOR_CODE_OPTIMIZER = """
Review this recursive Fibonacci implementation and suggest an iterative alternative
"""


def build_manager() -> SKSimpleManagerAgent:
    kernel = Kernel()
    kernel.add_service(
        service=AzureAIInferenceChatCompletion(
            ai_model_id=MODEL_NAME,
            api_key=OAI_API_KEY,
            endpoint=OAI_ENDPOINT,
        )
    )
    agents = [
        ChatCompletionAgent(
            description="Condensed summarizer",
            id=uuid4().hex,
            instructions="Return key facts only.",
            kernel=kernel,
            name="summarizer-agent",
        ),
        ChatCompletionAgent(
            description="Python optimizer",
            id=uuid4().hex,
            instructions="Optimize code for performance/readability.",
            kernel=kernel,
            name="code-optimizer-agent",
        ),
    ]
    # This is a wrapper for creating a `GroupChatAgent` as follows:
    # self._manager = AgentGroupChat(
    #     agents=self._subordinate_agents,
    #     selection_strategy=self._selection_strategy,
    #     termination_strategy=self._termination_strategy,
    # )
    return SKSimpleManagerAgent(
        model_name=MODEL_NAME,
        api_key=OAI_API_KEY,
        endpoint=OAI_ENDPOINT,
        subordinate_agents=agents,
        enable_langfuse=False,
        enable_telemetry=False,
    )


async def main() -> NoReturn:
    """
    async def select_agent(self, message: str) -> str:
        # Query the question
        await self._manager.add_chat_message(
            message=ChatMessageContent(role=AuthorRole.USER, content=message)
        )
        # Use a selection strategy to pick the agent
        selected = await self._selection_strategy.select_agent(
            agents=self._subordinate_agents, history=self._manager.history
        )
        return selected.name
    """
    mgr = build_manager()
    for q in (QUERY_FOR_SUMMARIZER, QUERY_FOR_CODE_OPTIMIZER):
        await mgr.select_agent(message=q)
Observed:
INFO  Telemetry inactive. Using standard AzureAIInferenceChatCompletion.
INFO  Initialized SKManagerAgentBase with 2 agents.
DEBUG Selected Agent: summarizer-agent
DEBUG Selected Agent: code-optimizer-agent
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x...>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x...>, ...)])']
connector: <aiohttp.connector.TCPConnector object at 0x...>
Environment:
- python 3.12
- azure-ai-inference 1.0.0b9
- semantic-kernel 1.23.0
I'm facing a similar issue with a much simpler, bare-bones implementation. I just wanted to hook up AzureOpenAI with my Azure deployment. The code is fairly simple.
import os

from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI  # imports inferred from the snippet

load_dotenv()

azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
azure_api_version = os.getenv("AZURE_OPENAI_API_VERSION")
print(azure_endpoint, azure_key, azure_deployment_name, azure_api_version)

if not all([azure_endpoint, azure_key, azure_deployment_name, azure_api_version]):
    raise EnvironmentError("Env vars not set correctly.")

llm = AzureChatOpenAI(
    temperature=0.2,
    azure_deployment=azure_deployment_name,
    api_version=azure_api_version,
)
Upon execution, I am getting the following error -
Error processing user prompt: Error code: 400 - {'error': {'code': 'no_model_name', 'message': 'No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.', 'details': 'No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.'}}
This is completely unexpected, as the LangChain docs don't even mention a model_name parameter.
What's even more baffling is that when I pass the model_name parameter, like this:
llm = AzureChatOpenAI(
    temperature=0.2,
    azure_deployment=azure_deployment_name,
    api_version=azure_api_version,
    model_name="gpt-4o",
)
The Python LSP/linter tells me this is not a recognized parameter, but apparently AzureChatOpenAI still receives it and spits out the following error:
Error processing user prompt: Error code: 400 - {'error': {'code': 'unknown_model', 'message': 'Unknown model: gpt-4o', 'details': 'Unknown model: gpt-4o'}}
This is weird because in my Azure deployment I can clearly see that the model is gpt-4o.
Any ideas on how to move forward? I do not wish to use other providers like Groq, etc.
My system:
- Python 3.12
- LangChain 0.3.25
- langchain_openai 0.3.17
Any help or guidance would be appreciated.
Thanks
@mehulambastha
Use "model" key instead of "model_name". It will work.
llm_config = {
    "deployment_name": self.settings.azure_openai_chat_deployment,
    "api_key": self.settings.azure_openai_api_key,
    "azure_endpoint": self.settings.azure_openai_endpoint,
    "api_version": self.settings.azure_openai_api_version,
    "temperature": 0.1,
    "max_tokens": 4000,
    "model": self.settings.azure_openai_chat_deployment,
}
print(llm_config)
return AzureChatOpenAI(**llm_config)
Hi @mehulambastha, @yashness -- this is an issue in the Semantic Kernel repo, so I'm wondering why we've brought in issues related to LangChain. Can you please file the issue in the appropriate repo and keep this one scoped to Semantic Kernel? Thank you.
Hi @anu43!
Besides the unclosed session warning, are there other errors that you observed?
No, I haven't seen any other errors, @TaoChenOSU. I only encounter this with AzureAIInferenceChatCompletion.
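For what it's worth, the warning usually just means the underlying aiohttp session was never closed before the event loop shut down. One possible way to avoid it - an unverified sketch, assuming the connector accepts a pre-built client via its client argument - is to own the async inference client yourself and close it when you're done:

# Unverified sketch: own the async inference client so it can be closed
# explicitly, which should avoid the "Unclosed client session" warning.
import asyncio
import os

from azure.ai.inference.aio import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion


async def main():
    client = ChatCompletionsClient(
        endpoint=os.getenv("AZUREAI_INFERENCE_ENDPOINT"),
        credential=AzureKeyCredential(os.getenv("AZUREAI_ENDPOINT_KEY")),
    )
    try:
        llm = AzureAIInferenceChatCompletion(ai_model_id="gpt-4o", client=client)
        # ... build agents / run the group chat as in the snippet above ...
    finally:
        await client.close()  # closes the aiohttp session before the loop exits


asyncio.run(main())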