
[Azure ML SDK v2] Issue while registering new data asset and then getting it back right after registration

Open glebrh opened this issue 3 years ago • 1 comments

  • Package Name: azure-ai-ml
  • Package Version: 1.0.0
  • Operating System: Windows Server 2022 Standard
  • Python Version: 3.9.13

Describe the bug

When I register a new data asset with MLClient.data.create_or_update() and then immediately try to fetch the newly registered entity by name with MLClient.data.get(), the get call raises an error like:

ValidationException                       Traceback (most recent call last)
---> 12 registered_data_asset = ml_client.data.get(name='new_dataset_name', label='latest')

File ...\site-packages\azure\ai\ml\operations\_data_operations.py:135, in DataOperations.get(self, name, version, label)
    126     raise ValidationException(
    127         message=msg,
    128         target=ErrorTarget.DATA,
   (...)
    131         error_type=ValidationErrorType.INVALID_VALUE,
    132     )
    134 if label:
--> 135     return _resolve_label_to_asset(self, name, label)
    137 if not version:
    138     msg = "Must provide either version or label."

File ...\site-packages\azure\ai\ml\_utils\_asset_utils.py:797, in _resolve_label_to_asset(assetOperations, name, label)
    790     msg = "Asset {} with version label {} does not exist in workspace."
    791     raise ValidationException(
    792         message=msg.format(name, label),
    793         no_personal_data_message=msg.format("[name]", "[label]"),
    794         target=ErrorTarget.ASSET,
...
    700         error_type=ValidationErrorType.RESOURCE_NOT_FOUND,
    701     )
    702 return latest

ValidationException: Asset new_dataset_name does not exist in workspace workspace_name.

However, after a couple of seconds the get method works. It seems that data asset creation is asynchronous, so there is a short lag between creating the data asset and being able to retrieve it from the workspace.

The CLI version works fine, but az ml data create... takes significantly more time than the SDK version.

To Reproduce

Steps to reproduce the behavior:

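from azure.ai.ml.entities import Data

# ml_client is an already authenticated MLClient for the workspace
dataset = Data(
    path='azureml://datastores/...',
    type='uri_folder',
    description='Test',
    name='new_dataset_name',
)
dataset = ml_client.data.create_or_update(dataset)

# Raises ValidationException when called right after registration
registered_data_asset = ml_client.data.get(name='new_dataset_name', label='latest')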

Expected behavior

The last command in the sequence should finish successfully and return the registered dataset details.

glebrh avatar Nov 10 '22 17:11 glebrh

@azureml-github

xiangyan99 avatar Nov 14 '22 16:11 xiangyan99

@glebrh thanks for reporting this. We're working on a fix now; in the meantime, you can add a short (1s) sleep between the create and get calls.

derekehyatt avatar Nov 29 '22 22:11 derekehyatt
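
Editor's note: the sleep workaround suggested above could also be written as a small retry loop, so the code waits only as long as it needs to. The following is a minimal sketch, not part of the SDK: get_with_retry and all of its parameters are hypothetical names introduced for illustration, and the attempt count and delays are arbitrary assumptions.

```python
import time


def get_with_retry(fetch, attempts=5, initial_delay=1.0, backoff=2.0,
                   retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper, not an azure-ai-ml API. `fetch` is any
    zero-argument callable; `retry_on` lists the exception types that
    are treated as "not visible yet" and trigger another attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

With the reproduction above it might be used as get_with_retry(lambda: ml_client.data.get(name='new_dataset_name', label='latest'), retry_on=(ValidationException,)), where ValidationException is imported from azure.ai.ml.exceptions. Narrowing retry_on this way keeps unrelated failures, such as authentication errors, from being retried.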

Hi @glebrh. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

ghost avatar Dec 01 '22 20:12 ghost

Hi @glebrh, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.

ghost avatar Dec 08 '22 22:12 ghost