
ModuleNotFoundError when deploying Azure ML model to online endpoint

Open haraldger opened this issue 2 years ago • 8 comments

Describe the bug

I am trying to deploy my Azure ML model to an online endpoint hosted on Azure, as described in the documentation: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-online-endpoints?view=azureml-api-2&tabs=azure-cli. However, the deployment fails to import one of my own files, which is part of the code directory that contains the scoring script. The inference logic is more involved than in the linked documentation, so additional files must be included beyond just the scoring script.

Related command

az ml online-deployment create -f azure/deployment.yml --resource-group <omitted> --workspace-name <omitted>

I have deployed the exact same scoring script, with everything else unchanged, locally, and it works perfectly fine. The issue only appears when deploying to my Azure endpoint.
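For reference, the working local deployment mentioned here (and again below) presumably uses the same command with the --local flag added, along the lines of:

az ml online-deployment create -f azure/deployment.yml --resource-group <omitted> --workspace-name <omitted> --local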

Errors

The deployment fails with ResourceNotReady. Upon inspection of the logs in Azure Studio, the error is:

2023-09-29 19:00:24,282 E [65] azmlinfsrv - Traceback (most recent call last):
  File "/azureml-envs/azureml_de0eccae1d7809be16f4cee8f4e60c52/lib/python3.10/site-packages/azureml_inference_server_http/server/user_script.py", line 77, in load_script
    main_module_spec.loader.exec_module(user_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/var/azureml-app/rebel-core-deployment/scoring_script.py", line 8, in <module>
    from utils.post_processing import post_process
ModuleNotFoundError: No module named 'utils.post_processing'; 'utils' is not a package
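For context, the "'utils' is not a package" wording is what Python prints when the name utils has already been resolved to a plain module (one without a __path__) rather than to the utils/ directory, typically because a stray utils.py shadows the package earlier on sys.path. A minimal, hypothetical reproduction of just that message (not this repository's layout):

# shadow_demo.py -- hypothetical layout used only to reproduce the message:
#   shadow_demo.py
#   utils.py                        <- stray module that shadows the package
#   extra/utils/__init__.py
#   extra/utils/post_processing.py  <- defines post_process
import sys

sys.path.append("extra")  # the real package is reachable, but later on sys.path

# The script's own directory is searched first, so "utils" resolves to utils.py
# (a module, not a package) and the submodule import then fails with:
#   ModuleNotFoundError: No module named 'utils.post_processing';
#   'utils' is not a package
from utils.post_processing import post_process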

The directory structure is as follows:

root:

  • __init__.py
  • scoring_script.py
  • utils/
    • __init__.py
    • post_processing.py
  • models/
    • __init__.py
    • StrengthNet.py
    • RateNet.py
  • azure/
    • deployment.yml

Issue script & Debug output

My scoring script is as follows:

import json
import pandas as pd
import torch

from models.RateNet import RateNet
from models.StrengthNet import StrengthNet
from utils.post_processing import post_process

strength_net = None
rate_net = None


def init():
    global config
    global strength_net
    global rate_net
    
    # Initialization
    # Omitted for business sensitivity
    pass


def predict(data, metadata):
    # Omitted ...

    return predictions


def run(request):
    request = json.loads(request)
    data = torch.tensor(request["data"])
    metadata = torch.tensor(request["metadata"])

    # Omitted

    predictions = predict(data, metadata)

    return predictions.tolist()

Note that the first two imports, which are also my own local files, work fine. The program only crashes on the utils package.
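One way to narrow this down inside the deployment itself is to log where Python actually resolves utils from, before the failing import runs (a diagnostic sketch, not part of the original script; it only prints information):

# Hypothetical diagnostic lines for the top of scoring_script.py, placed
# before the failing "from utils.post_processing import post_process".
import importlib.util
import sys

print("sys.path:", sys.path)

spec = importlib.util.find_spec("utils")
# A real package prints a spec with submodule_search_locations set;
# a shadowing utils.py prints a spec whose origin is that single file.
print("utils spec:", spec)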

Deployment YAML file:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: deployment-name-omitted
endpoint_name: endpoint-name-omitted
model: azureml:ModelNameOmitted:2
code_configuration:
  code: ../
  scoring_script: scoring_script.py
environment: azureml:env-name-omitted:14
instance_type: Standard_DS4_v2
instance_count: 1
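When a deployment ends in ResourceNotReady, the container logs shown in Studio can also be pulled with the CLI, which can be quicker to iterate with (a sketch; the deployment and endpoint names are placeholders):

az ml online-deployment get-logs --name <deployment-name> --endpoint-name <endpoint-name> --resource-group <omitted> --workspace-name <omitted> --lines 200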

Expected behavior

The expected behaviour is that the deployment succeeds and that utils.post_processing is imported successfully. When deployed locally, with the --local flag, things work fine. Additionally, the deployment successfully imports from files in the models directory, so there should logically be no difference with the utils directory.

Environment Summary

azure-cli 2

Additional context

No response

haraldger avatar Sep 29 '23 20:09 haraldger

Thank you for opening this issue, we will look into it.

yonzhan avatar Sep 29 '23 20:09 yonzhan

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

Thanks for the feedback.

Could you confirm that the dependency missing at run time is indeed included in the environment specified in the deployment?

from utils.post_processing import post_process

hugoaponte avatar Sep 29 '23 20:09 hugoaponte

Hi @hugoaponte , thanks for your response.

So maybe this is where I've done something wrong or misunderstood. The utils.post_processing dependency is not part of the environment specification; it is one of my own files, located in a subdirectory of the directory that contains the scoring script (I've outlined the directory structure in the OP). I assumed that the deployment YAML, which specifies the code location as well as the scoring script's path within it, is what tells the deployment where to find the necessary code. (I also posted my YAML specification above, where this code_configuration is shown, and it reflects the directory structure.)

Maybe this is what I've got wrong. However, I don't see how the environment, which is registered in Azure ML Studio, could include this directory and file as a dependency when the code is local to this specific deployment. They're not pip packages or anything like that; they're just my own Python files.

haraldger avatar Sep 29 '23 20:09 haraldger

Got it. Thanks for clarifying.

If the package is not published, then it can't be added to the environment; you are right.

You can certainly keep that code in the folder specified under code_configuration; it should work.

Could you double-check the script "post_processing.py"?

If the issue were the folder structure, your scoring script would fail as soon as it tried to import from "models".

hugoaponte avatar Sep 29 '23 20:09 hugoaponte

@hugoaponte

After double-checking, I can't find anything that seems wrong with it. I've confirmed its name is indeed post_processing, that the function name is post_process, and that the directory is called utils, so the import line should work. And besides, as I mentioned above, the script works when I deploy locally.

So the import itself works when I deploy locally but not when deploying to Azure, which suggests there's nothing wrong with the referencing or naming. Again, this makes me think directory structure, but as you mentioned, importing from the models directory works, so perhaps it's something specific to the utils directory for whatever reason. I have double-checked that there is an __init__.py file in there, so that shouldn't be the issue. I'm at a loss here: I can't find anything wrong on my end, and it works when making a local deployment. I suspect the utils directory is not getting transferred/uploaded to Azure correctly.

haraldger avatar Oct 02 '23 13:10 haraldger

As an addition to the above, I found the uploaded code for this deployment in blob storage, and what's strange is that the code, including the utils package, does seem to be there. So now I'm left even more confused as to why it can't recognize utils as a package. The first screenshot shows the root folder; the second shows the contents of the utils folder.

[Screenshot 1: root folder of the uploaded code in blob storage]

[Screenshot 2: contents of the utils folder]
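A way to see what the container itself has on disk, as opposed to what blob storage shows, would be to log the deployed files from inside the scoring script (a diagnostic sketch, not part of the original deployment):

# Hypothetical helper to call from init() in scoring_script.py: it walks the
# directory that contains the scoring script and prints every deployed file,
# so a missing utils/ directory would show up directly in the container logs.
import os

def _log_deployed_files():
    code_root = os.path.dirname(os.path.abspath(__file__))
    for dirpath, _dirnames, filenames in os.walk(code_root):
        for name in filenames:
            print(os.path.join(dirpath, name))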

haraldger avatar Oct 02 '23 13:10 haraldger

Hi, I came across a similar issue and this is what I was able to figure out.

Given your file structure (the layout shown in the original post):

Your "source_directory" is called "root", so your inference config should look like this:

inference_config = InferenceConfig(runtime="python", source_directory="root", entry_script="score.py", conda_file="deploymentEnv.yml")

Now all your file imports need to use root as the parent directory and reference from there, so your import would have to look like:

from root.utils.post_processing import post_process

And the import should work fine.
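Applied to the rest of the scoring script in this issue, that convention would make every local import package-qualified (a sketch of what the suggestion implies, not a verified fix; the models lines are extrapolated from the utils example above):

# scoring_script.py imports under the convention suggested above (hypothetical)
from root.models.RateNet import RateNet
from root.models.StrengthNet import StrengthNet
from root.utils.post_processing import post_process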

shrey-c avatar Dec 19 '23 07:12 shrey-c