
Copy information of a model from MLFlow

Open momegas opened this issue 3 years ago • 10 comments

Description

Since MLFlow is an industry standard and a lot of people use it, it makes sense for Whitebox to integrate with it and use it as a data store, or something similar, while providing the functionality that is missing in the monitoring field of MLOps.

momegas avatar Dec 04 '22 15:12 momegas

@stavrostheocharis @gcharis @NickNtamp @sinnec

Here are some thoughts about the implementation of an MLFlow integration. Please give your feedback using the numbers below. Other views will help me here: I have given this a lot of thought and I think I'm short-sighted by now.

  1. We need a way for users to copy information of a model from MLFlow. This can be directly implemented in our SDK.
  2. Since we save a lot of info similar to MLFlow, we could migrate our database to a (non-accessible) MLFlow instance running behind Whitebox. This will save a lot of implementation time (artefacts, S3, integrations, etc.) since we can reuse MLFlow's integrations. We can also use experiment tracking for saving our time series.
  3. If we go with number 2, would it make sense to give the user the ability to use an existing MLFlow instance? This would result in having both the monitoring and the experiments in one dashboard (I did some tests there; it's not as ugly as I thought).

momegas avatar Jan 03 '23 14:01 momegas

Bump 👋

momegas avatar Jan 12 '23 14:01 momegas

Bump 💥

momegas avatar Jan 20 '23 22:01 momegas

Aren't we talking about an mlflow plugin? It makes sense to me

https://mlflow.org/docs/latest/plugins.html

gcharis avatar Jan 24 '23 17:01 gcharis

MLFlow plugins are for mlflow to integrate with other tools, right? We need the opposite, I guess. Whitebox should be getting data from mlflow

momegas avatar Jan 25 '23 07:01 momegas

Some thoughts also from me regarding the above points:

  1. Correct. I don't know if they or we have to copy something, but in any case we need a way of using the client's models which are stored in MLflow.
  2. I don't know about the optimal implementation in terms of databases, but yes, I totally agree that we have to take advantage of the data created by MLflow.
  3. Generally, my thought is that we can have Whitebox as an "extension" of MLflow, taking advantage of all the data which is used and created there!

NickNtamp avatar Jan 25 '23 07:01 NickNtamp

After all these days I think we only need number 1 btw

momegas avatar Jan 25 '23 08:01 momegas

For anyone taking this issue, let's go with option number 1 for now:

  1. We need a way for users to copy information of a model from MLFlow. This can be directly implemented in our SDK.

I propose a method in the SDK that requests the model from MLFlow. Take into account that we may need to point the SDK to MLFlow. Renaming this issue to something more relevant.
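A minimal sketch of what such an SDK method might look like, assuming the model lives in the MLflow Model Registry and that the SDK is pointed at MLflow through a tracking URI. The names `ModelInfo`, `make_client` and `copy_model_info` are hypothetical and not part of Whitebox; the only MLflow calls used are `MlflowClient(tracking_uri=...)` and `MlflowClient.get_registered_model`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelInfo:
    """Hypothetical container for the fields copied out of MLflow."""
    name: str
    description: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)


def make_client(tracking_uri: str) -> Any:
    """Point the SDK at an MLflow server (assumes mlflow is installed)."""
    # Imported lazily so the rest of the sketch has no hard dependency on mlflow.
    from mlflow.tracking import MlflowClient

    return MlflowClient(tracking_uri=tracking_uri)


def copy_model_info(client: Any, model_name: str) -> ModelInfo:
    """Copy basic information about a registered model from MLflow."""
    registered = client.get_registered_model(model_name)
    return ModelInfo(
        name=registered.name,
        description=registered.description,
        tags=dict(registered.tags or {}),
    )
```

Usage would be along the lines of `copy_model_info(make_client("http://localhost:5000"), "my-model")`, where both the URI and the model name are placeholders. Passing the client in explicitly keeps the "point the SDK to MLFlow" step separate from the copy itself.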

momegas avatar Feb 17 '23 10:02 momegas

I think @NickNtamp can share some thoughts on that, since he is going to look into MLflow a bit.

stavrostheocharis avatar Feb 17 '23 12:02 stavrostheocharis

Based on my investigation so far, MLflow can save/track the following for each experiment:

  1. artifacts: Mainly requirements in a yaml file and dependencies. The model can also be found here in pkl format.
  2. metrics: Metrics regarding the experiment. Metrics can be either custom or standard ones provided by libraries (sklearn, etc.)
  3. params: The combination of hyperparameters used for the experiment
  4. tags: Various files that are consumed by mlflow api (versions, timestamps etc.)
  5. some metadata in a yaml file

As far as I know, the only thing we can use (at least for now) is number 2, by replacing the functions which calculate the evaluation metrics here, from row 65 and below.
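As a rough illustration of reading the items listed above from a single MLflow run, here is a small sketch. `fetch_run_summary` is a hypothetical helper, not existing Whitebox code; it assumes `client` is an `mlflow.tracking.MlflowClient` and relies only on its `get_run` and `list_artifacts` methods.

```python
from typing import Any, Dict


def fetch_run_summary(client: Any, run_id: str) -> Dict[str, Any]:
    """Collect metrics, params, tags and artifact paths for one MLflow run."""
    run = client.get_run(run_id)
    # list_artifacts without a path lists the root of the run's artifact
    # directory only; nested directories are not traversed here.
    artifact_paths = [info.path for info in client.list_artifacts(run_id)]
    return {
        "metrics": dict(run.data.metrics),  # latest value per metric key
        "params": dict(run.data.params),
        "tags": dict(run.data.tags),
        "artifacts": artifact_paths,
    }
```

The `"metrics"` entry is the part that could stand in for metrics Whitebox currently calculates itself, provided the experiment logged them under known keys.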

NickNtamp avatar Feb 21 '23 18:02 NickNtamp