OutputDatasetConfig.register_on_complete registers the dataset even if the step finishes with an error
While I was running a pipeline, a step finished with this error: AzureMLCompute job failed. DiskFullError: Disk full while running job. Reduce amount of data accessed, or upgrade VM Sku.
The output of this step was defined as an OutputDatasetConfig using "as_upload" and "register_on_complete". I expected the dataset to be neither uploaded nor registered, because the step finished with an error and its output is therefore not valid. Instead, the dataset was uploaded and registered, which means that a tagged version of the dataset is now corrupted.
I recommend not registering a dataset if the step finishes with an error, which is what I would expect from the documentation.
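Until the behaviour is changed or documented, one possible workaround is to drop "register_on_complete" and register the dataset explicitly, only after checking the status of the step run. The sketch below is a minimal illustration and not SDK code: `register_if_succeeded` is a hypothetical helper, and it assumes the step status is the string returned by `get_status()` on the run, where "Completed" means success.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumption: "Completed" is the status string reported for a successful run;
# "Failed" and "Canceled" are treated as unsuccessful.
SUCCESS_STATUS = "Completed"


def register_if_succeeded(status: str, register: Callable[[], T]) -> Optional[T]:
    """Hypothetical helper: call `register` only when the step succeeded."""
    if status == SUCCESS_STATUS:
        return register()
    # Any other status: skip registration so no corrupted version is created.
    return None


# Stand-in for the real registration call, so the sketch runs without the SDK.
registered = []
register_if_succeeded("Failed", lambda: registered.append("v1"))
register_if_succeeded("Completed", lambda: registered.append("v2"))
print(registered)  # ['v2']
```

In a real pipeline, the callback would wrap the actual registration (for example, building a file dataset from the output path and calling `register` on it), and `status` would come from the step run after the pipeline finishes. Those objects are left out here because they depend on the workspace setup.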
Regards
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: 02631223-bb1d-f9de-2536-23d753c98508
- Version Independent ID: f524ca56-5419-b233-b67a-a1b3d10408e7
- Content: azureml.data.output_dataset_config.OutputDatasetConfig class - Azure Machine Learning Python
- Content Source: AzureML-Docset/stable/docs-ref-autogen/azureml-core/azureml.data.output_dataset_config.OutputDatasetConfig.yml
- Service: machine-learning
- Sub-service: core
- GitHub Login: @DebFro
- Microsoft Alias: debfro
+1
To upload data based on successful completion of the step (and not of the pipeline :)
It would be great to have this in the docs, to understand how "register_on_complete" registration depends on the step/pipeline status. Something similar to, e.g.:
def as_upload(self, overwrite=False, source_globs=None):
"""Set the mode of the output to upload.
**For upload mode, files written to the output directory will be uploaded at the end of the job. If the job
fails or gets canceled, then the output directory will not be uploaded.**