MachineLearningNotebooks

OutputDatasetConfig.register_on_complete registers the dataset even if the step finishes with an error

Open · javitovarv opened this issue 4 years ago • 1 comment

While I was running a pipeline, a step failed with the following error: `AzureMLCompute job failed. DiskFullError: Disk full while running job. Reduce amount of data accessed, or upgrade VM Sku.`

The output of this step was defined as an `OutputDatasetConfig` configured with `as_upload` and `register_on_complete`. Because the step finished with an error, its output is incomplete, so I expected the dataset to be neither uploaded nor registered. Instead, the dataset was uploaded and registered, which means a tagged version of the dataset is now corrupted.

I suggest not registering a dataset when the step finishes with an error, which is what I would expect from the documentation.
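The expected behavior can be illustrated with a minimal plain-Python sketch. It does not use the Azure ML SDK: `StepOutput`, `Store`, and `finalize_output` are hypothetical names for illustration only, and the status strings are assumed to mirror Azure ML run statuses such as "Completed", "Failed", and "Canceled". The point is that both the upload and the registration are skipped unless the step itself completed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed terminal status of a successful step (mirrors Azure ML's "Completed").
SUCCESS_STATUS = "Completed"


@dataclass
class StepOutput:
    """Hypothetical stand-in for an output configured with upload + register-on-complete."""
    name: str
    files: List[str] = field(default_factory=list)
    register_name: Optional[str] = None


@dataclass
class Store:
    """Hypothetical in-memory datastore and dataset registry."""
    uploaded: Dict[str, List[str]] = field(default_factory=dict)
    registry: Dict[str, List[List[str]]] = field(default_factory=dict)


def finalize_output(output: StepOutput, step_status: str, store: Store) -> bool:
    """Upload and register `output` only if its step completed successfully.

    Returns True if the output was published, False if it was discarded.
    """
    if step_status != SUCCESS_STATUS:
        # Failed or canceled step: skip both the upload and the registration,
        # so no new, possibly corrupted, dataset version is created.
        return False
    store.uploaded[output.name] = list(output.files)
    if output.register_name is not None:
        # Each successful run appends a new version of the registered dataset.
        store.registry.setdefault(output.register_name, []).append(list(output.files))
    return True


store = Store()
partial = StepOutput("prepared", ["part-0.parquet"], register_name="prepared_data")
finalize_output(partial, "Failed", store)      # e.g. DiskFullError mid-write
print(store.registry)                          # {}

complete = StepOutput("prepared", ["part-0.parquet", "part-1.parquet"], register_name="prepared_data")
finalize_output(complete, "Completed", store)
print(len(store.registry["prepared_data"]))    # 1
```

With this rule, a failed run leaves the registry untouched, so no corrupted version of the dataset gets tagged.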

Regards

javitovarv · Jan 19 '22 13:01

+1

Data should be uploaded based on the successful completion of the step (not of the pipeline :)

It would be great to have this in the docs, to make clear how registration via `register_on_complete` depends on the step/pipeline status, similar to, e.g.:

```python
def as_upload(self, overwrite=False, source_globs=None):
    """Set the mode of the output to upload.

    **For upload mode, files written to the output directory will be uploaded at the end of the job. If the job
    fails or gets canceled, then the output directory will not be uploaded.**
    """
    ...  # remaining docstring and implementation omitted in this excerpt
```
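The distinction drawn in this comment, gating each output on its own step's result rather than on the whole pipeline, can be sketched in plain Python. The step names, statuses, and the helpers `outputs_to_register` and `pipeline_status` are hypothetical; nothing here is Azure ML SDK code, and the roll-up rule for the pipeline status is an assumption made for illustration.

```python
from typing import Dict, List


def outputs_to_register(step_statuses: Dict[str, str]) -> List[str]:
    """Return the steps whose outputs are safe to register.

    Each step is judged on its own terminal status, so the output of a step
    that completed is kept even if another step (and therefore the pipeline
    as a whole) failed.
    """
    return [step for step, status in step_statuses.items() if status == "Completed"]


def pipeline_status(step_statuses: Dict[str, str]) -> str:
    """Hypothetical roll-up: the pipeline fails if any step did not complete."""
    if all(status == "Completed" for status in step_statuses.values()):
        return "Completed"
    return "Failed"


statuses = {"ingest": "Completed", "prepare": "Failed", "train": "Canceled"}
print(pipeline_status(statuses))      # Failed
print(outputs_to_register(statuses))  # ['ingest']
```

Gating on the pipeline status would discard the valid `ingest` output as well; gating on each step's status keeps it while still dropping the outputs of the failed and canceled steps.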

WiktorHawrylik · Feb 02 '23 11:02