
MLFLOW support for the SilverKite Algorithm

Open ikegwukc opened this issue 2 years ago • 1 comments

Hello, I am trying to log the sklearn pipeline stored in the model attribute to MLflow on Databricks, using the Silverkite template. See the code snippet below. When attempting to log the sklearn pipeline held in the model attribute of the "greykite.framework.pipeline.pipeline.ForecastResult" object, I receive an error message stating: NotImplementedError: Sorry, pickling not yet supported. See https://github.com/pydata/patsy/issues/26 if you want to help.

Any idea how I can log a model built with the Silverkite template to MLflow?

 from collections import defaultdict
 import warnings

 warnings.filterwarnings("ignore")

 import pandas as pd
 import plotly

 from greykite.common.data_loader import DataLoader
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.utils.result_summary import summarize_grid_search_results

 import mlflow 

 # Loads dataset into pandas DataFrame
 dl = DataLoader()
 df = dl.load_peyton_manning()

 # specify dataset information
 metadata = MetadataParam(
     time_col="ts",  # name of the time column
     value_col="y",  # name of the value column
     freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
               # Any format accepted by `pandas.date_range`
 )

 forecaster = Forecaster()  # Creates forecasts and stores the result
 result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecasts 365 steps ahead
         coverage=0.95,         # 95% prediction intervals
         metadata_param=metadata
     )
 )

 # Fails with the Silverkite template: NotImplementedError (patsy pickling not supported)
 mlflow.sklearn.log_model(result.model, "model")

I'd like to add that if I change the model_template from ModelTemplateEnum.SILVERKITE.name to ModelTemplateEnum.PROPHET.name, the code works and I am able to log the model and read it back without problems.

Any advice on how to use MLflow with the Silverkite template?

ikegwukc avatar Jun 27 '23 20:06 ikegwukc

This is related to #73.

Long story short, the Silverkite template is tricky to serialize because it uses patsy internally, and patsy objects do not support pickling. Prophet models, on the other hand, are picklable.

To log model artifacts to MLflow using the Silverkite template, you can dump them to a local directory (using forecaster.dump_forecast_result) and then log that whole directory to MLflow via mlflow.log_artifact.
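A minimal sketch of that workaround, assuming the forecaster object from the snippet in the question and an active MLflow run. The helper name log_silverkite_result, the "silverkite_forecast" artifact path and the "forecast_result" object name are hypothetical, chosen only for illustration. The sketch calls mlflow.log_artifacts, which uploads the contents of a directory; mlflow.log_artifact, as named above, should also accept a directory path.

```python
import tempfile
from pathlib import Path


def log_silverkite_result(forecaster, artifact_path="silverkite_forecast", log_artifacts=None):
    """Dump the forecast result to a temporary directory and log it to MLflow.

    `forecaster` is expected to be a greykite `Forecaster` that has already run
    a forecast. `log_artifacts` defaults to `mlflow.log_artifacts`; passing it
    explicitly is a convenience of this sketch, not part of either library.
    """
    if log_artifacts is None:
        import mlflow  # imported lazily so the helper itself has no hard dependency

        log_artifacts = mlflow.log_artifacts

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Use a sub-directory that does not exist yet, so the dump does not
        # need to overwrite anything.
        dump_dir = Path(tmp_dir) / "forecast_result"
        forecaster.dump_forecast_result(
            destination_dir=str(dump_dir),
            object_name="forecast_result",  # hypothetical name
        )
        # Upload everything that was written by the dump.
        log_artifacts(str(dump_dir), artifact_path=artifact_path)

    return artifact_path
```

Inside a run this would be called as log_silverkite_result(forecaster) within a "with mlflow.start_run():" block. To restore the result later, the logged directory would first need to be downloaded to a local path and then read with forecaster.load_forecast_result; treat that loading step as an assumption to verify against the greykite documentation for your version.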

Beware that forecaster.dump_forecast_result, as far as I know, does not work on Windows.

More info on storing and loading models is available here.

samuelefiorini avatar Jun 29 '23 05:06 samuelefiorini