Whitebox: Copy information of a model from MLflow

Description

Since MLflow is an industry standard and widely used, it makes sense for Whitebox to integrate with it and use it as a data store (or something similar), while providing the functionality that is missing on the monitoring side of MLOps.

Dec 04 '22 15:12 momegas

@stavrostheocharis @gcharis @NickNtamp @sinnec

Here are some thoughts on implementing an MLflow integration. Please give your feedback using the numbers below. Other views would help me: I have given this so much thought that I think I've become short-sighted.

1. We need a way for users to copy information of a model from MLflow. This can be directly implemented in our SDK.
2. Since we store a lot of the same information as MLflow, we could migrate our database to an MLflow instance running behind Whitebox, not accessible to users. This would save a lot of implementation time (artefacts, S3, integrations, etc.), since we could reuse MLflow's integrations. We could also use experiment tracking to store our time series.
3. If we go with number 2, would it make sense to give users the option to use an existing MLflow instance? That would put both monitoring and experiments in one dashboard (I did some tests there; it's not as ugly as I thought).
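The idea in option 2 of reusing experiment tracking for time series could look roughly like this, using MLflow's client method `log_metric(run_id, key, value, timestamp=..., step=...)`, which records one value per call with timestamps in milliseconds. This is a minimal sketch under assumptions: `log_monitoring_series` is a hypothetical helper, each monitoring metric is assumed to map to one MLflow metric key, and `client` is expected to behave like `mlflow.tracking.MlflowClient`.

```python
from datetime import datetime
from typing import Any, Iterable, Tuple


def log_monitoring_series(
    client: Any,
    run_id: str,
    metric_name: str,
    points: Iterable[Tuple[datetime, float]],
) -> int:
    """Store a monitoring time series as MLflow metric entries (hypothetical helper).

    `client` is assumed to expose MlflowClient.log_metric(run_id, key, value,
    timestamp=..., step=...). Returns the number of points written.
    """
    written = 0
    for step, (moment, value) in enumerate(points):
        # MLflow metric timestamps are milliseconds since the Unix epoch.
        timestamp_ms = int(moment.timestamp() * 1000)
        client.log_metric(run_id, metric_name, float(value), timestamp=timestamp_ms, step=step)
        written += 1
    return written
```

Using the point's position as `step` keeps the series ordered in the MLflow UI even if two points share a timestamp.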

Jan 03 '23 14:01 momegas

Bump 👋

Jan 12 '23 14:01 momegas

Bump 💥

Jan 20 '23 22:01 momegas

Aren't we talking about an MLflow plugin? That makes sense to me.

https://mlflow.org/docs/latest/plugins.html

Jan 24 '23 17:01 gcharis

MLflow plugins let MLflow integrate with other tools, right? I guess we need the opposite: Whitebox should be getting data from MLflow.

Jan 25 '23 07:01 momegas

Some thoughts from me as well on the points above:

1. Correct. I don't know whether they or we have to copy something, but either way we need a way of using the client's models that are stored in MLflow.
2. I don't know the optimal implementation in terms of databases, but yes, I totally agree that we should take advantage of the data created by MLflow.
3. In general, my thought is that Whitebox can be an "extension" of MLflow, taking advantage of all the data used and created there!

Jan 25 '23 07:01 NickNtamp

After all these days, I think we only need number 1, btw.

Jan 25 '23 08:01 momegas

For anyone picking up this issue, let's go with option number 1 for now:

We need a way for users to copy information of a model from MLflow. This can be directly implemented in our SDK.

I propose a method in the SDK that requests the model from MLflow. Keep in mind that we may need to point the SDK to the MLflow instance. I'm renaming this issue to something more relevant.
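A minimal sketch of such an SDK method, assuming it wraps MLflow's Python client: `MlflowClient(tracking_uri=...)` is one way to point the SDK at a given MLflow instance, and `get_run(run_id)` returns the run's metrics, params and tags. The names `fetch_model_info` and `ModelInfo`, and the default artifact path `"model"`, are hypothetical; the `client` parameter exists only so the function can be exercised without a live MLflow server.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelInfo:
    """Hypothetical container for what the SDK copies from an MLflow run."""
    run_id: str
    experiment_id: str
    model_uri: str
    metrics: Dict[str, float]
    params: Dict[str, str]
    tags: Dict[str, str]


def fetch_model_info(
    run_id: str,
    tracking_uri: Optional[str] = None,
    artifact_path: str = "model",  # assumption: the model was logged under "model"
    client: Any = None,
) -> ModelInfo:
    """Copy a run's model reference and metadata from MLflow (sketch)."""
    if client is None:
        # Imported lazily so the sketch can be tested with a stand-in client.
        from mlflow.tracking import MlflowClient
        client = MlflowClient(tracking_uri=tracking_uri)
    run = client.get_run(run_id)
    return ModelInfo(
        run_id=run.info.run_id,
        experiment_id=run.info.experiment_id,
        # "runs:/<run_id>/<path>" URIs can later be passed to mlflow.pyfunc.load_model.
        model_uri=f"runs:/{run.info.run_id}/{artifact_path}",
        metrics=dict(run.data.metrics),
        params=dict(run.data.params),
        tags=dict(run.data.tags),
    )
```

Loading the model object itself would then be a separate step, for example `mlflow.pyfunc.load_model(info.model_uri)`, so the SDK can copy metadata without pulling large artifacts.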

Feb 17 '23 10:02 momegas

I think @NickNtamp can share some thoughts on that, since he is going to look into MLflow a bit.

Feb 17 '23 12:02 stavrostheocharis

Based on my investigation so far, MLflow can save/track the following for each experiment run:

1. Artifacts: mainly the requirements in a YAML file and the dependencies. The model itself can also be found here, in pkl format.
2. Metrics: metrics from the experiment. These can be either custom or standard ones provided by libraries (sklearn, etc.).
3. Params: the combination of hyperparameters used in the experiment.
4. Tags: various entries consumed by the MLflow API (versions, timestamps, etc.).
5. Some metadata in a YAML file.
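To locate the model and requirements files among a run's artifacts, the SDK could walk the artifact tree with `MlflowClient.list_artifacts(run_id, path)`, which lists one directory level and returns entries with `path` and `is_dir`. A sketch under assumptions: `find_artifacts` is hypothetical, and matching on file suffixes is a simplification (MLflow's `MLmodel` metadata file, for instance, is YAML but has no extension).

```python
from typing import Any, List, Optional, Sequence


def find_artifacts(
    client: Any,
    run_id: str,
    suffixes: Sequence[str] = (".pkl", ".yaml"),
) -> List[str]:
    """Return a run's artifact paths whose names end with one of `suffixes`.

    `client` is assumed to behave like mlflow.tracking.MlflowClient, whose
    list_artifacts(run_id, path) lists a single directory level.
    """
    found: List[str] = []
    pending: List[Optional[str]] = [None]  # None means the artifact root
    while pending:
        directory = pending.pop()
        for entry in client.list_artifacts(run_id, directory):
            if entry.is_dir:
                pending.append(entry.path)  # visit subdirectories later
            elif entry.path.endswith(tuple(suffixes)):
                found.append(entry.path)
    return sorted(found)
```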

As far as I can tell, the only thing we can use (at least for now) is number 2, the metrics, by replacing the functions that calculate the evaluation metrics here, from row 65 onward.
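Reusing logged metrics instead of recomputing them might look like the following: read the run's latest metric values via `get_run(run_id).data.metrics` and compute only what MLflow does not have. This is a sketch under assumptions: the function `evaluation_metrics` and the `compute_missing` callback are hypothetical, and metric names are assumed to match between MLflow and Whitebox.

```python
from typing import Any, Callable, Dict, Iterable, List


def evaluation_metrics(
    client: Any,
    run_id: str,
    required: Iterable[str],
    compute_missing: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    """Prefer metrics logged in MLflow; compute only the ones that are missing.

    Assumes MLflow metric keys use the same names Whitebox expects.
    """
    logged = dict(client.get_run(run_id).data.metrics)  # latest value per key
    wanted = list(required)
    result = {name: logged[name] for name in wanted if name in logged}
    missing = [name for name in wanted if name not in logged]
    if missing:
        # Fall back to Whitebox's own calculation for anything not logged.
        result.update(compute_missing(missing))
    return result
```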

Feb 21 '23 18:02 NickNtamp