
Storing model parameters, lineage data and maybe even pre- or post-processing methods?

Open · nlathia opened this issue 3 years ago · 2 comments

Adding this issue here for visibility (I received it via email 📥):


Currently, our models are deployed by being baked into a Docker image, partly for legacy reasons, but also because our experiment tracking system is not stable enough to always be up. It is nice to just dump the model into the Docker image and not need to connect to multiple services at serving time. So basically, I just need to be able to dump my model onto the filesystem.

But what constitutes a model? If I use this package, I would only be able to dump the model file; information about how the input and output should be interpreted/transformed for this specific model will not be there. This information is often a subset of the training parameters.

It would also be nice to have information for traceability, e.g. (1) the training run ID (an MLflow run ID in my case) and (2) the epoch and/or step.

Lastly, in my abstract understanding of what a model really is, it would even be nice to be able to package pre- and post-processing functions (a model class, really) with the model as well, but that is a different discussion :)
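To make the request concrete, here is a rough sketch of the kind of bundle being described: the model itself, plus the pre-/post-processing it needs and some lineage metadata, dumped to the filesystem. Every name here is hypothetical; none of this is modelstore's API.

```python
import json
import pickle
from pathlib import Path


class BundledModel:
    """Hypothetical wrapper: a model plus the pre-/post-processing it needs."""

    def __init__(self, model, preprocess, postprocess, lineage):
        self.model = model              # the trained estimator / network
        self.preprocess = preprocess    # e.g. feature scaling, tokenisation (importable callable)
        self.postprocess = postprocess  # e.g. mapping class ids to labels (importable callable)
        self.lineage = lineage          # e.g. {"mlflow_run_id": "...", "epoch": 12}

    def predict(self, raw_inputs):
        features = self.preprocess(raw_inputs)
        outputs = self.model.predict(features)
        return self.postprocess(outputs)


def dump_bundle(bundle: BundledModel, target_dir: str) -> None:
    """Write the whole bundle to the filesystem: one pickle plus human-readable lineage."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with open(target / "model_bundle.pkl", "wb") as f:
        pickle.dump(bundle, f)
    with open(target / "lineage.json", "w") as f:
        json.dump(bundle.lineage, f, indent=2)
```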

nlathia · Jun 24 '22 17:06

Thanks for adding this. Not sure why I wasn't able to do so myself, maybe I wasn't logged in properly, but at least now I am able to comment. So let's continue the conversation here.

You replied to my email, but I think you misunderstood me. You wrote a lot about Docker, but my issue is really about storing a model on disk with all of the information needed to load/use it, which, in my opinion, is not currently possible with this package.

elgehelge · Jun 25 '22 10:06

@elgehelge Thanks, apologies for the misunderstanding. Could you share an example of what you would like modelstore to save?

One example of where this works is if you save an sklearn pipeline (which can include preprocessing steps) rather than just an sklearn model. Another example is that saving a PyTorch Lightning model saves the model class name in the metadata, so that modelstore can ensure the custom class is present when loading the model back.
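For concreteness, a minimal sketch of that first suggestion, assuming scikit-learn and modelstore's file-system storage backend; the `from_file_system` / `upload` calls follow the project README and their exact signatures and returned metadata may have changed:

```python
from modelstore import ModelStore
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Bundle the preprocessing with the model, so both are saved (and loaded) together
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# File-system storage: everything ends up under root_directory, no external services needed
model_store = ModelStore.from_file_system(root_directory="/tmp/modelstore")
meta = model_store.upload("iris-demo", model=pipeline)
print(meta)  # metadata recorded alongside the saved pipeline, including the generated model id
```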

In other cases, modelstore does not do much beyond saving the "model" itself, so any example you can provide will help!

nlathia · Jun 26 '22 11:06