OutputDatasetConfig.register_on_complete registers the dataset even if the step finishes with an error

Open javitovarv opened this issue 4 years ago • 1 comments

While I was running a pipeline, a step finished with the error: AzureMLCompute job failed. DiskFullError: Disk full while running job. Reduce amount of data accessed, or upgrade VM Sku.

For the output of this step I had defined an OutputDatasetConfig with "as_upload" and "register_on_complete". Because the step finished with an error, its output is not valid, so I expected the dataset to be neither uploaded nor registered. Instead, the dataset was uploaded and registered, which means that a tagged version of the dataset is now corrupted.

I recommend not registering a dataset if the step finishes with an error, which is what I would expect from the documentation.
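
Until that is the case, one possible mitigation on the step-script side is to stage files in a temporary directory and move them into the output directory only after the work has succeeded, so a failed step never leaves partial files there. The sketch below uses only the Python standard library; `write_output_atomically` and `produce` are hypothetical names, not part of the AzureML SDK. It is an assumption that an empty output directory avoids a corrupted version, since this issue does not confirm whether registration would still happen. Staging also uses local disk, so it does not address the DiskFullError itself.

```python
import shutil
import tempfile
from pathlib import Path


def write_output_atomically(output_dir, produce):
    """Run produce(staging_dir) and publish its files to output_dir only on success.

    Hypothetical helper: if produce raises, output_dir is never created or
    modified, and the exception propagates so the step still fails.
    """
    output_dir = Path(output_dir)
    staging = Path(tempfile.mkdtemp(prefix="staging_"))
    try:
        produce(staging)  # may raise; output_dir has not been touched yet
        output_dir.mkdir(parents=True, exist_ok=True)
        for item in staging.iterdir():
            shutil.move(str(item), str(output_dir / item.name))
    finally:
        # Remove the staging directory whether or not the work succeeded.
        shutil.rmtree(staging, ignore_errors=True)
```

In a step script, `output_dir` would be the path the pipeline passes in for the output, and `produce` would be the function that writes the result files.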

Regards

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 02631223-bb1d-f9de-2536-23d753c98508
Version Independent ID: f524ca56-5419-b233-b67a-a1b3d10408e7
Content: azureml.data.output_dataset_config.OutputDatasetConfig class - Azure Machine Learning Python
Content Source: AzureML-Docset/stable/docs-ref-autogen/azureml-core/azureml.data.output_dataset_config.OutputDatasetConfig.yml
Service: machine-learning
Sub-service: core
GitHub Login: @DebFro
Microsoft Alias: debfro

Jan 19 '22 13:01 javitovarv

Uploading data should depend on the successful completion of the step (and not of the pipeline :)

It would be great if the docs explained how registration via "register_on_complete" depends on the step/pipeline status, similar to this example:

def as_upload(self, overwrite=False, source_globs=None):
    """Set the mode of the output to upload.

    **For upload mode, files written to the output directory will be uploaded at the end of the job. If the job
    fails or gets canceled, then the output directory will not be uploaded.**

Feb 02 '23 11:02 WiktorHawrylik