
Add a family of Model objects to feast

Open tokoko opened this issue 1 year ago • 20 comments

This PR is part of an effort to improve integration of feast with model serving. Also see #4139 and accompanying draft RFC for a wider context.

Feast currently only handles feature retrieval, but stays away from model execution. This PR proposes extending the set of feast objects with a family of Model objects that represent actual machine learning models. There are a few primary benefits that this new object targets:

  • Feast models will enable users to explicitly hook up ML models with a feast feature service.
  • Feast online store can be used as a temporary cache for recently computed model results to avoid unnecessary recomputation.
  • Data and metadata regarding both feature retrieval and model inference can be logged together.
  • (Maybe) also utilize these objects for training (?)

There will be two initial variations of models (classes extending the Model interface):

  • PythonModel - a Python function invoking model inference, stored as a pickled object in the registry.
  • OipModel - an object describing an externally deployed ML model that satisfies the Open Inference Protocol (OIP) spec.
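
Purely to make the proposal concrete, a rough sketch of what these classes might look like follows; every field name here is an assumption inferred from the usage example below, not a settled API.

from typing import Callable, Optional
import pandas as pd

class Model:
    # Hypothetical base class for all feast model objects.
    def __init__(self, name: str, features):
        self.name = name
        self.features = features  # a FeatureService providing the model's inputs

class PythonModel(Model):
    # Wraps a Python callable; the callable would be pickled and stored in the registry.
    def __init__(self, name: str, features, model: Callable[[pd.DataFrame], pd.DataFrame]):
        super().__init__(name, features)
        self.model = model

class OipModel(Model):
    # Describes an externally deployed model behind an OIP-compliant endpoint.
    def __init__(self, name: str, features, url: str, model_name: str,
                 model_version: Optional[str] = None):
        super().__init__(name, features)
        self.url = url
        self.model_name = model_name
        self.model_version = model_version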

A new infer method in the FeatureStore class will let users perform feature retrieval and model computation in a single call. The method will optionally allow the user to specify whether cached values are acceptable.

Example Usage:

from feast import FeatureStore
import pandas as pd

store = FeatureStore()

example_feature_service = ...

def example_model(data_input: pd.DataFrame) -> pd.DataFrame:
	data_input['result'] = data_input['param1'] + 1
	return data_input

example_python_model = PythonModel(
	name = 'example_python_model',
	features = example_feature_service,
	model = example_model
)

example_oip_model = OipModel(
	name = 'example_oip_model',
	features = example_feature_service,
	url = 'http://...',
	model_name = 'example_oip_model',
	model_version = 1
)

store.apply([..., example_feature_service, example_python_model, example_oip_model])

entity_df = pd.DataFrame(...)

store.infer(
	model='example_python_model',  # or 'example_oip_model'
	entities=entity_df,
	log_features=True,
	force_recompute=True,
)
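
To make the intended flow a bit more tangible, here is a minimal sketch of what infer could do internally, assuming model results get materialized into a feature view named "<model>_results"; the function and the attributes it reads (name, features, model) are assumptions from this proposal, not existing feast APIs.

import pandas as pd
from feast import FeatureStore

def infer(store: FeatureStore, model, entities: pd.DataFrame,
          log_features: bool = False, force_recompute: bool = False) -> pd.DataFrame:
    # 1. Optionally serve a cached/precomputed result from the online store.
    if not force_recompute:
        cached = store.get_online_features(
            features=[f"{model.name}_results:result"],
            entity_rows=entities.to_dict("records"),
        ).to_df()
        if not cached["result"].isna().any():
            return cached

    # 2. Retrieve the input features for the model's feature service.
    features = store.get_online_features(
        features=model.features,  # a FeatureService
        entity_rows=entities.to_dict("records"),
    ).to_df()

    # 3. Run the model. A PythonModel runs locally; an OipModel would instead
    #    POST the features to its OIP endpoint.
    result = model.model(features)

    # 4. Optionally log features + predictions, then cache the result for later calls.
    if log_features:
        pass  # hook into feast's feature logging here
    store.write_to_online_store(f"{model.name}_results", result)
    return result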

tokoko avatar Jun 17 '24 13:06 tokoko

@franciscojavierarceo Your PR prompted me to draw this up. wdyt?

tokoko avatar Jun 17 '24 13:06 tokoko

Yeah, I like this. I think we should offer both infer and predict, with predict just aliasing infer. I think that will be intuitive for traditional MLEs working with tabular data.

franciscojavierarceo avatar Jun 17 '24 14:06 franciscojavierarceo

Yup, an alias sounds fine. Another thing that came to mind: we will probably have to come up with an appropriate abstraction to put this functionality in (behind FeatureStore). I'm thinking of InferenceEngine or something similar, with LocalInferenceEngine and RemoteInferenceEngine as the (probably only) two implementations. That would enable us to apply the security model to these methods as well. (fyi @dmartinol @redhatHameed)
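
A minimal sketch of that seam, assuming the names above; predict is shown as a thin alias for infer as suggested earlier, and nothing here reflects an actual feast interface yet.

from abc import ABC, abstractmethod
import pandas as pd

class InferenceEngine(ABC):
    # Hypothetical abstraction behind FeatureStore; permission checks could live here too.

    @abstractmethod
    def infer(self, model: str, entities: pd.DataFrame,
              log_features: bool = False, force_recompute: bool = False) -> pd.DataFrame: ...

    def predict(self, *args, **kwargs) -> pd.DataFrame:
        # sklearn-style alias for users who expect predict()
        return self.infer(*args, **kwargs)

class LocalInferenceEngine(InferenceEngine):
    # Would run feature retrieval and model execution in-process (e.g. for PythonModel).
    def infer(self, model, entities, log_features=False, force_recompute=False):
        raise NotImplementedError

class RemoteInferenceEngine(InferenceEngine):
    # Would forward the request to a remote feast feature/inference server over http/grpc.
    def infer(self, model, entities, log_features=False, force_recompute=False):
        raise NotImplementedError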

tokoko avatar Jun 17 '24 18:06 tokoko

Yeah, that makes sense.

We can use KServe's protobuf definition too.

franciscojavierarceo avatar Jun 18 '24 16:06 franciscojavierarceo

FYI @rimolive

franciscojavierarceo avatar Jun 18 '24 16:06 franciscojavierarceo

@HaoXuAI @shuchu

franciscojavierarceo avatar Jun 18 '24 17:06 franciscojavierarceo

Yeah, that makes sense.

We can use KServe's protobuf definition too.

I'm pretty sure that's the same thing as OIP, it just had a couple of name changes along the way. V2 Inference Protocol, KServe V2 Protocol, Open Inference Protocol... they are all the same.
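
For reference, a hedged sketch of the call an OipModel might make against such an endpoint; the URL layout and payload fields follow the published OIP / KServe V2 REST spec, while the column-to-tensor mapping is a simplification that assumes numeric feature columns only.

import requests
import pandas as pd

def call_oip_endpoint(url: str, model_name: str, model_version: str, feature_df: pd.DataFrame):
    # Open Inference Protocol (KServe V2) REST path.
    endpoint = f"{url}/v2/models/{model_name}/versions/{model_version}/infer"
    payload = {
        "inputs": [
            {
                "name": col,
                "shape": [len(feature_df)],
                "datatype": "FP64",  # simplification: assumes every feature column is numeric
                "data": feature_df[col].tolist(),
            }
            for col in feature_df.columns
        ]
    }
    response = requests.post(endpoint, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["outputs"]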

tokoko avatar Jun 18 '24 17:06 tokoko

Yup! I just wanted to provide the link to it.

franciscojavierarceo avatar Jun 19 '24 09:06 franciscojavierarceo

@tokoko (thanks for sharing!) why don't you also look at the Model Registry capabilities? This component provides a central repository to store and manage models, together with the serving endpoints for each deployment of a model artifact (for this I need to find exact doc references).

In short, instead of abstracting an InferenceEngine, we could abstract a ModelRegistry and provide an initial implementation for this Kubeflow component, WDYT?
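
If that direction were taken, the abstraction might be as small as the sketch below; these names are illustrative only and do not come from the Kubeflow Model Registry client.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class RegisteredModel:
    name: str
    version: str
    inference_endpoint: str  # where the deployed artifact is currently served

class ModelRegistry(ABC):
    # Hypothetical abstraction over an external model registry.

    @abstractmethod
    def get_model(self, name: str, version: Optional[str] = None) -> RegisteredModel: ...

class KubeflowModelRegistry(ModelRegistry):
    # Would wrap the Kubeflow Model Registry API to resolve serving endpoints.
    def get_model(self, name, version=None):
        raise NotImplementedError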

BTW: This requirement appears more aligned with an external add-on for handling both feature retrieval and model execution together, rather than a core component expected within the 'feature store'. Wouldn't this introduce too many (potentially unnecessary) dependencies?

  • @jeremyary @etirelli

dmartinol avatar Jun 19 '24 10:06 dmartinol

@dmartinol thanks for joining in. (This will probably be lengthy.) Let me start from your last point; it's probably crucial that we agree at a high level on what we're trying to build here. Yes, this is definitely going outside of 'feature store' territory and I'm also a bit uncomfortable about that. Having said that, if we find some utility in extending feast this way that's overlooked by other oss tools out there, we should still proceed. I know we might be committing a terrible case of feature creep here, happy to be called out if you think we are 😄.

The RFC linked above goes into this, but let me briefly recap here as well... I think there are two general ways to bridge feature retrieval and model execution:

  1. If a user already has a somewhat complex serving architecture in place (I'm thinking of a model mesh here, either kserve or seldon), then the best way to go about it would be to treat feature retrieval as a sort of transformer/preprocessor "model". The control plane of kserve (or seldon) would call feast first and the model endpoint after that, passing the retrieved feature values. One idea I floated in the RFC to support this flow is to modify our http/grpc feature servers to make them OIP-compliant, effectively treating our FeatureService objects as models. If you have a feature service `example_service` in the registry, calling the `infer` endpoint of a feature server for a "model" `example_service` would return the necessary feature values. Since both kserve and seldon talk OIP, integration should be pretty simple. This flow is probably appropriate when an org has hundreds of models running and relying on kserve/seldon makes sense.

  2. This ticket comes into play for a simpler deployment flow, when you are able to spin up a model endpoint somewhere and also have a feast deployment in place, but have no easy way to bridge the gap between the two. feast itself takes on the role of a very simple control plane and tries to abstract away model endpoints from the caller. There are also some seemingly beneficial side effects, mainly the ability to rely on the feast online store for the storage of precomputed (meaning batch) or cached (written out after online calls with some ttl) results of particular models. Not sure if the same can be achieved with an external add-on, but that's also a possibility.

Regarding the model registry, if the registry indeed contains info about current deployment endpoints (not just trained artifacts), that might be a good idea, although I'm not sure where we would keep additional information about models (whether we want to enable the cache or not, permissions and so on) if we don't have individual Model objects in our registry and instead rely on a list of models supplied by the registry, IIUC.

tokoko avatar Jun 19 '24 13:06 tokoko

This ticket comes into play for a simpler deployment flow, when you are able to spin up a model endpoint somewhere and also have a feast deployment in place, but have no easy way to bridge the gap between the two. feast itself takes on the role of a very simple control plane and tries to abstract away model endpoints from the caller. There are also some seemingly beneficial side effects, mainly the ability to rely on the feast online store for the storage of precomputed (meaning batch) or cached (written out after online calls with some ttl) results of particular models. Not sure if the same can be achieved with an external add-on, but that's also a possibility.

This was one of my goals when drafting the RFC: (1) creating some standard abstractions that give guidance to the community and (2) enabling faster retrieval of predictions.

franciscojavierarceo avatar Jun 19 '24 16:06 franciscojavierarceo

FYI here's the link to the PR @tokoko mentioned: https://github.com/feast-dev/feast/issues/4288

franciscojavierarceo avatar Jun 21 '24 02:06 franciscojavierarceo

@tokoko @franciscojavierarceo thank you for the write-up/spec. I have concerns around the overall vision for the project and our strategy to achieve that vision, in light of this RFE. KubeFlow, as the open source MLOps platform, already supports and develops KFP, MR and KServe. There is clearly a gap there for the feature store that Feast can fill. However, in that architecture, Feast would be a service to both KFP and KServe, while MR would be responsible for recording the model metadata necessary to connect serving/inference with the feature store. Adding model metadata and API into the feature store, as proposed here, seems to invert the dependency graph in that architecture and create an overlap with the responsibilities of those other components. I am afraid that if we move forward with this RFC, it will cause confusion for users and make our integration with kubeflow more complicated, creating friction both from a technical as well as an organizational perspective. I would advise that we focus on the integration with these other components in the ecosystem instead.

etirelli avatar Jun 24 '24 14:06 etirelli

@etirelli thanks for the feedback here! Agree we shouldn't cause confusion and we should have clean and separate responsibilities from the other components.

I don't want to invert the dependency graph but I do want to enable faster model retrieval and I think there's a way to accomplish both of those things. I'll do more research and get back.

franciscojavierarceo avatar Jun 25 '24 20:06 franciscojavierarceo

Maybe the way to go without blurring the lines too much would be to somehow build this as a 3rd party tool (inside the feast-dev organization if possible???). That was initially my plan a while ago; I already have a repository named hefeasto internally 😄.

We could base that 3rd party tool on the existing FeatureStore interface or even expose some new abstractions if needed (for storing cached values in the online store, checking whether the cache is stale or not, etc.). Some of those new abstractions might even prove useful while integrating with kserve or seldon.

tokoko avatar Jun 27 '24 11:06 tokoko

To be clear, by better integration with kserve I mean that the flow on the kserve side, instead of looking like this (get_features_from_feast -> get_model_prediction_from_model_endpoint), might look more like this (check_if_feast_has_cached_result -> (if no cached result) get_features_from_feast -> get_model_prediction_from_model_endpoint -> update_feast_cache).
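
In pseudocode, that kserve-side transformer flow might look something like this; the function names simply mirror the steps above and the stubs are placeholders, not real APIs.

# Placeholder stubs for the steps named above; real implementations would call feast / the model server.
def check_if_feast_has_cached_result(entity_row): return None
def get_features_from_feast(entity_row): return {}
def get_model_prediction_from_model_endpoint(features): return {"score": 0.0}
def update_feast_cache(entity_row, prediction): pass

def transform_and_predict(entity_row):
    # Step 0: ask feast whether a fresh enough cached prediction already exists.
    cached = check_if_feast_has_cached_result(entity_row)
    if cached is not None:
        return cached

    # Otherwise fall back to the usual flow...
    features = get_features_from_feast(entity_row)
    prediction = get_model_prediction_from_model_endpoint(features)

    # ...and write the new prediction back so later requests can hit the cache.
    update_feast_cache(entity_row, prediction)
    return prediction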

tokoko avatar Jun 27 '24 11:06 tokoko

So I'm going to write a new spec that starts with two goals:

  1. Batch Models: Serving Pre-computed Predictions for some entity
  2. Cached Online Models: Caching predictions and updating them only when data sources change

This requires:

a. Retrieving a precomputed score from an online database
b. Recomputing the score somehow (e.g., calling a KServe endpoint) and retrieving the computed score from an online database

I think that means we can do this in 3 ways:

  1. KServe orchestrates the call to Feast and the KServe endpoint
    • This is how the KServe/Feast demo operates today but it's not meant for Batch Models
      • Modifying it to support Batch Models is straightforward but it'd basically just be a call to Feast
  2. Feast orchestrates
    • This is similar to my PR (if instead of an ODFV, we actually called KServe) and satisfies (1) and (2)
  3. New library to handle orchestration
    • This would be a light-weight library that would basically be what (2) does but outside of Feast

There are pros and cons to each and it'd be good to discuss them with the Kubeflow community to work through them to come to a consensus.

I think the recommended solution may end up depending on the needs of the users. At Affirm, we wanted very low latency since we were serving a customer checkout, so an additional network hop was suboptimal and calling Feast first made more sense. Having KServe call Feast requires an additional RPC, but it is a more intuitive architecture and makes for a more intuitive API call.
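
For goal (1), serving a precomputed prediction is already expressible with the existing feast API once the batch predictions have been materialized into a feature view; the feature view and entity names below are made up for illustration.

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Assumes a feature view "model_predictions" holding batch-scored results keyed by user_id.
predictions = store.get_online_features(
    features=["model_predictions:score"],
    entity_rows=[{"user_id": 1001}, {"user_id": 1002}],
).to_df()

print(predictions)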

franciscojavierarceo avatar Jul 08 '24 15:07 franciscojavierarceo

I'm going to create a new issue outlining my note above so we can discuss this with the Feast community there.

franciscojavierarceo avatar Jul 08 '24 15:07 franciscojavierarceo

@tokoko this is a useful blog post https://medium.com/marvelous-mlops/getting-started-with-databricks-feature-serving-bf590b318c31

franciscojavierarceo avatar Jul 10 '24 14:07 franciscojavierarceo

And it has this diagram:

[diagram from the blog post omitted]

franciscojavierarceo avatar Jul 10 '24 14:07 franciscojavierarceo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 04:04 stale[bot]