Feature: Allow adding additional information to metadata on model upload

Open sourcehawk opened this issue 3 years ago • 3 comments

To my knowledge, there is no straightforward way to retrieve additional data sent on model upload other than downloading the entire artifact and knowing the exact name of the file it was stored in. It would be nice to be able to add extra information to the model metadata when uploading a new model, so that anything important for further processing of models is directly accessible.

This could be an optional parameter to the upload method, providing an easy way to add something to the metadata. It could accept a Python dictionary, which would then be placed in the metadata under a specific key such as "extra".

Use case


# Custom information that a user wants to have available as metadata when calling `get_model_info`
important_info = {
    'required_columns': ["yay", "nay"],
    'data_transforms': ["std", "mean"],
    'training_data_marker': {
        'index_column': 'some_id',
        'index_value': 'some_value',
    },
    'replication_storage_information': {
        "actual_creation_date": "2021-11-23T10:10:23",
        "archived_date": "2022-1-14T12:14:23",
    }
}

metadata = model_store.upload(
    domain="my-domain",
    state_name="archived",
    model=lr_model,
    extra_metadata=important_info,
)

print(metadata)
>> 
{
    'model': {
        'domain': {...}, 
        'data': {...}, 
        'storage': {...},
        'code': {...}, 
        'git': {...}, 
        'extra': {
            'required_columns': ["yay", "nay"],
            'data_transforms': ["std", "mean"],
            'training_data_marker': {
                'index_column': 'some_id',
                'index_value': 'some_value',
            },
            'replication_storage_information': {
                "actual_creation_date": "2021-11-23T10:10:23",
                "archived_date": "2022-1-14T12:14:23",
            }
        }
    }
}
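
With metadata shaped like the output above, a consumer could read the custom fields without downloading the artifact. The following is only a minimal sketch: `get_extra` is a hypothetical helper, not part of modelstore, and it assumes the metadata is a plain dictionary with the proposed "extra" key under "model". It falls back to an empty dict so that metadata written before the feature existed still works.

```python
from typing import Any, Dict


def get_extra(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return the 'extra' block of a metadata dict, or {} if it is absent.

    Hypothetical helper for illustration; not part of modelstore.
    """
    return metadata.get("model", {}).get("extra") or {}


# Metadata in the proposed shape (trimmed to the relevant keys)
new_metadata = {
    "model": {
        "domain": "my-domain",
        "extra": {"required_columns": ["yay", "nay"]},
    }
}
# Metadata as it would look without the proposed key
old_metadata = {"model": {"domain": "my-domain"}}

print(get_extra(new_metadata).get("required_columns", []))
print(get_extra(old_metadata).get("required_columns", []))
```

The first call prints the stored column list and the second prints an empty list, so downstream code does not need to special-case older models.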

The extra parameter would have to be validated, which could be done in the upload method by checking whether the object is JSON serializable:

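import json

if extra_metadata:
    try:
        json.dumps(extra_metadata)
    except (TypeError, ValueError) as exc:
        # json.dumps raises TypeError for unsupported types
        # and ValueError for circular references
        raise ValueError("extra_metadata field must be json serializable") from exc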

The value of the field could default to an empty dict, i.e. 'extra': {}, and should not break any existing functionality.
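
To make the validation and the empty-dict default concrete, here is a minimal sketch of how the two could fit together. `with_extra_metadata` is a hypothetical helper written for this illustration, not modelstore's implementation, and the dictionary layout is assumed from the use case above.

```python
import copy
import json
from typing import Any, Dict, Optional


def with_extra_metadata(
    metadata: Dict[str, Any],
    extra_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of `metadata` with an 'extra' entry under 'model'.

    Hypothetical helper for illustration; not part of modelstore.
    """
    # Default to an empty dict so the key is always present
    extra = {} if extra_metadata is None else extra_metadata
    if not isinstance(extra, dict):
        raise ValueError("extra_metadata must be a dictionary")
    try:
        # Validation only: the serialized string is discarded
        json.dumps(extra)
    except (TypeError, ValueError) as exc:
        raise ValueError("extra_metadata field must be json serializable") from exc

    # Work on copies so the caller's objects are never mutated
    result = copy.deepcopy(metadata)
    result.setdefault("model", {})["extra"] = copy.deepcopy(extra)
    return result


base = {"model": {"domain": "my-domain"}}
print(with_extra_metadata(base))
print(with_extra_metadata(base, {"required_columns": ["yay", "nay"]}))
```

Callers that never pass `extra_metadata` get 'extra': {} in the result, which is what keeps existing uploads working unchanged.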

Any opinions on this?

sourcehawk • Jun 11 '22 23:06

Great idea! I've wanted to do this for some time, and you suggesting it might just be the motivation I needed 😄

I'm currently in the middle of moving the metadata implementation to use dataclasses:

  • https://github.com/operatorai/modelstore/pull/178
  • https://github.com/operatorai/modelstore/pull/182

Once that is done, I can definitely add this in and bundle it all together for the next release 🙌

nlathia • Jun 12 '22 10:06

👋 @hauks96 I've now added this in, and so it will go out with the next release. Thank you for the suggestion, and if you have any more ideas feel free to open more issues or reach out to me directly!

  • https://github.com/operatorai/modelstore/pull/185

nlathia • Jun 18 '22 13:06

@nlathia Brilliant! Thank you so much, looking forward to using it 😄

sourcehawk • Jun 18 '22 15:06

✅ This was released as part of modelstore==0.0.75

  • https://github.com/operatorai/modelstore/pull/201

Let me know if you see any other issues!

nlathia • Sep 08 '22 14:09