What's the status of Feast's metrics system?

Open yanghua opened this issue 3 years ago • 6 comments

Is your feature request related to a problem? Please describe. While digging into Feast's serving module (Java), I found a Metrics class, so it seems a metric reporter for Prometheus has been implemented. However, I could not find any usage of it. I only found a monitoring section stating that Feast integrates with the StatsD ecosystem, and no further documentation.

Any more information?

yanghua avatar Jun 14 '22 07:06 yanghua

cc @adchia Can you chime in?

yanghua avatar Jun 14 '22 07:06 yanghua

Hey! Yeah, there is no usage of that Metrics class in Java today, and that documentation is a bit misleading (though I think the class was used back when the Java library was Spring-based and also exposed an HTTP endpoint). AFAICT, the metrics are still being populated, just not exposed via an endpoint. It shouldn't be too hard to incorporate, though (e.g. expose a Jetty server and add the MetricsServlet).

The Python feature server doesn't yet use any StatsD client to push metrics.

adchia avatar Jun 15 '22 14:06 adchia

Hi @adchia, thanks for sharing the details. IMO, the metrics system is a key component of observability and is very important for a core online service like feature serving.

AFAIK, many big data projects built on the JVM use dropwizard-metrics to collect metrics and report them to various metrics stores (e.g. Prometheus, JMX, Graphite, and so on). So WDYT about using this library to build the metrics system for feature serving (the Java module)?

However, for the runtimes built with other languages, I have not found a general, universal metrics library. Any thoughts or suggestions?

yanghua avatar Jun 16 '22 07:06 yanghua

Hey @yanghua,

Thanks for raising this issue!

I think it may be worth considering how most users will want to consume metrics. I don't know if we have very good data on this, but what I see most commonly is Prometheus and Grafana being used for operational metrics, often complemented with StatsD for short-lived jobs that must push metrics out.

How does Dropwizard compare to using Prometheus/StatsD or even OpenTelemetry, if we assume the above stack?

woop avatar Jun 16 '22 15:06 woop
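
For illustration, the push model mentioned above for short-lived jobs can be sketched with nothing but the standard library. This is a minimal sketch, not Feast code: the function names and metric names are hypothetical, and it assumes a StatsD agent listening on UDP port 8125 (the conventional default).

```python
import socket


def format_statsd(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def push_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram and return immediately (fire-and-forget over UDP).

    The host and port are assumptions; point them at your StatsD agent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A job would call something like `push_metric(format_statsd("feast.job.duration_ms", 532, "ms"))` just before exiting. Because the job pushes the value itself, nothing needs to scrape it while it is alive, which is why this model suits short-lived processes.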

Hi @woop, thanks for sharing your thoughts.

Let me clarify.

Dropwizard Metrics can be treated as a general and powerful JVM-based metrics toolkit with a rich ecosystem. On top of that, it provides many reporters, e.g. for Prometheus and StatsD.

Many projects that I have contributed to (e.g. Apache Hudi, Flink, Kyuubi (incubating)) use it.

> but what I see most commonly is Prometheus and Grafana being used for operational metrics

AFAIK, Dropwizard is fully compatible with the Prometheus/Grafana technology stack.

yanghua avatar Jun 17 '22 06:06 yanghua
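
To make the registry/reporter split discussed above concrete, here is a small sketch of the pattern. It is written in Python purely for illustration and is not the Dropwizard API: every class, function, and metric name below is hypothetical. The point is only that metrics are recorded once in a registry and then rendered by interchangeable reporters.

```python
from typing import Dict, List


class MetricRegistry:
    """Holds named counters; knows nothing about where they are sent."""

    def __init__(self) -> None:
        self._counters: Dict[str, int] = {}

    def inc(self, name: str, amount: int = 1) -> None:
        self._counters[name] = self._counters.get(name, 0) + amount

    def snapshot(self) -> Dict[str, int]:
        """Return a copy so reporters cannot mutate the registry."""
        return dict(self._counters)


def report_prometheus(registry: MetricRegistry) -> str:
    """Render counters in the Prometheus text exposition format (pull model)."""
    lines: List[str] = []
    for name, value in sorted(registry.snapshot().items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "".join(line + "\n" for line in lines)


def report_statsd(registry: MetricRegistry) -> List[str]:
    """Render the same counters as StatsD lines (push model).

    Totals are sent as gauges (|g) because the registry stores cumulative
    values, while a StatsD counter (|c) expects increments.
    """
    return [f"{name}:{value}|g" for name, value in sorted(registry.snapshot().items())]
```

With this split, the serving code only ever calls `registry.inc(...)`, and the choice between a pull-based or push-based backend is made by picking a reporter.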

@woop Any thoughts?

yanghua avatar Jun 20 '22 07:06 yanghua

Any updates on this? We're getting some strange timeouts in our stack, and figuring out where they come from is quite hard without visibility into what's happening :)

creativedutchmen avatar Nov 01 '22 16:11 creativedutchmen

Hi @creativedutchmen it needs more discussion IMO.

yanghua avatar Nov 02 '22 00:11 yanghua

Got it. To provide one extra data point on how I'd like to consume the metrics: the feature server is a long-lived instance, so having a Prometheus /metrics endpoint would be great. I'd really only need to see some basic things like timings and request stats; that would already help a great deal.

creativedutchmen avatar Nov 02 '22 07:11 creativedutchmen
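
As a rough idea of what such an endpoint involves, the sketch below times each request and exposes a request count and cumulative latency at /metrics in the Prometheus text format. It uses only the Python standard library and is not how Feast's feature server is implemented: the handler, the metric name `feature_server_request_seconds`, and the port are all assumptions made for illustration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RequestStats:
    """Thread-safe request count and cumulative latency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0
        self.seconds_sum = 0.0

    def observe(self, seconds: float) -> None:
        with self._lock:
            self.count += 1
            self.seconds_sum += seconds

    def render(self) -> str:
        """Render as a Prometheus summary (count and sum, no quantiles)."""
        with self._lock:
            return (
                "# TYPE feature_server_request_seconds summary\n"
                f"feature_server_request_seconds_count {self.count}\n"
                f"feature_server_request_seconds_sum {self.seconds_sum}\n"
            )


STATS = RequestStats()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            body = STATS.render().encode("utf-8")
        else:
            start = time.perf_counter()
            body = b"ok\n"  # stand-in for real feature retrieval
            STATS.observe(time.perf_counter() - start)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def serve(port: int = 8000) -> None:
    """Blocks forever; the port is an arbitrary choice for this sketch."""
    ThreadingHTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Dividing the sum by the count gives the average request latency, which may already be enough to narrow down where timeouts originate. In a real deployment an existing Prometheus client library would be a more sensible choice than hand-rendering the format.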

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 18 '23 05:03 stale[bot]