api-layer

Metrics Dashboard for ML Services

Open CarsonCook opened this issue 5 years ago • 0 comments

Is your feature request related to a problem? Please describe.

As a system administrator, I would like to be able to monitor the overall security, health, and performance of my installed mediation layer and its services.

Describe the solution you'd like

A single dashboard that clearly shows both individual services and aggregate system information. It should cover transport-level information such as request rates and error rates, as well as lower-level information such as CPU usage. The dashboard should offer interactions for fine-tuning which metrics are shown.
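To illustrate how per-service figures could roll up into aggregate system information, here is a minimal Python sketch. The `ServiceStats` type, its fields, and the sample numbers are hypothetical and not part of the API Mediation Layer. The sketch only shows the arithmetic behind a request rate and an error rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceStats:
    """Counters for one service over a sampling window (hypothetical shape)."""
    service: str
    requests: int
    errors: int
    window_seconds: float

    @property
    def request_rate(self) -> float:
        # Requests per second over the sampling window.
        return self.requests / self.window_seconds

    @property
    def error_rate(self) -> float:
        # Fraction of requests that failed; 0.0 when there was no traffic.
        return self.errors / self.requests if self.requests else 0.0


def aggregate(stats: list[ServiceStats]) -> ServiceStats:
    """Combine per-service counters into one system-wide view.

    Assumes every entry covers the same sampling window.
    """
    if not stats:
        raise ValueError("no service stats to aggregate")
    window = stats[0].window_seconds
    if any(s.window_seconds != window for s in stats):
        raise ValueError("all stats must share one sampling window")
    return ServiceStats(
        service="system",
        requests=sum(s.requests for s in stats),
        errors=sum(s.errors for s in stats),
        window_seconds=window,
    )


# Made-up sample data for two core services.
per_service = [
    ServiceStats("gateway", requests=1200, errors=12, window_seconds=60.0),
    ServiceStats("discovery", requests=300, errors=3, window_seconds=60.0),
]
system = aggregate(per_service)
```

Summing raw counters and deriving the rates afterwards weights each service by its traffic, which averaging the per-service rates would not do.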

The dashboard should be as implementation-agnostic as possible, so that administrators can plug in any data-tracking solution they want and have its output shown on the dashboard.
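One way to picture this implementation-agnostic requirement is a small contract that every tracker adapter satisfies, with the dashboard depending only on that contract. The sketch below is an assumption for illustration: `MetricsSource`, `StaticSource`, and `Dashboard` are hypothetical names, not existing API Mediation Layer types.

```python
from typing import Protocol


class MetricsSource(Protocol):
    """Hypothetical contract that any data tracker adapter would implement."""
    name: str

    def collect(self) -> dict[str, float]:
        """Return the current metric values keyed by metric name."""
        ...


class StaticSource:
    """Stand-in tracker that returns fixed values, for illustration only."""

    def __init__(self, name: str, values: dict[str, float]) -> None:
        self.name = name
        self._values = dict(values)

    def collect(self) -> dict[str, float]:
        # Return a copy so callers cannot mutate the source's state.
        return dict(self._values)


class Dashboard:
    """Knows only the MetricsSource contract, not any concrete tracker."""

    def __init__(self) -> None:
        self._sources: list[MetricsSource] = []

    def register(self, source: MetricsSource) -> None:
        self._sources.append(source)

    def snapshot(self) -> dict[str, dict[str, float]]:
        # One entry per registered source, keyed by the source's name.
        return {source.name: source.collect() for source in self._sources}


dashboard = Dashboard()
dashboard.register(StaticSource("http", {"requests_per_second": 25.0}))
dashboard.register(StaticSource("system", {"cpu_percent": 37.5}))
snapshot = dashboard.snapshot()
```

Swapping in a different tracker would then mean writing another adapter with a `name` and a `collect()` method, leaving the dashboard code unchanged.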

Minimum Viable Product

  • Create a new optional "Metrics Service" that is linked in the Gateway's homepage
  • Retrieve HTTP metrics for each endpoint in each APIML service (core and onboarded) using Netflix Hystrix
  • Aggregate Hystrix data using Netflix Turbine
  • Display HTTP metrics using the pre-built Hystrix and Turbine dashboards
  • Create proof-of-concept system metrics collection using a specific system metrics collection service. Collect system metrics for the core APIML services, broken down on a per-service basis.
  • Require standard Gateway authentication to access new endpoints (includes "Metrics Service" endpoints as well as Hystrix and Turbine endpoints)
  • Run sanity tests for the "Metrics Service" on marist
  • (From UX proposal) A high level overview of the metrics dashboard with a quick written FAQ/tutorial
  • (From UX proposal) A prominent link to the appropriate GitHub location prompting users to give feedback or comment on desired metrics-related features. Alternatively or additionally, a link to a survey to get richer, lower-friction feedback from users.
  • (Stretch goal) Display system metrics using a pre-built UI integration
  • (Stretch goal) Allow static configuration to enable/disable specific routes for metrics collection
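The per-endpoint collection and the static route configuration in the list above can be sketched together. This is a stand-in written in plain Python, not Hystrix or Turbine. The route patterns, the `EndpointMetrics` class, and the rule that a 5xx status counts as an error are all assumptions made for the example.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical static configuration: glob patterns for routes to skip.
DISABLED_ROUTES = ("/api/v1/internal/*", "/health")


def is_collected(route: str, disabled: tuple[str, ...] = DISABLED_ROUTES) -> bool:
    """Return True when metrics should be collected for this route."""
    return not any(fnmatchcase(route, pattern) for pattern in disabled)


class EndpointMetrics:
    """Counts requests and errors per (service, route) pair."""

    def __init__(self) -> None:
        self.requests: Counter = Counter()
        self.errors: Counter = Counter()

    def record(self, service: str, route: str, status: int) -> None:
        # Routes disabled in the static configuration are ignored entirely.
        if not is_collected(route):
            return
        key = (service, route)
        self.requests[key] += 1
        # Assumption: only 5xx responses count as errors.
        if status >= 500:
            self.errors[key] += 1


metrics = EndpointMetrics()
metrics.record("gateway", "/api/v1/users", 200)
metrics.record("gateway", "/api/v1/users", 503)
metrics.record("gateway", "/api/v1/internal/debug", 200)  # skipped by pattern
metrics.record("gateway", "/health", 200)                 # skipped exactly
```

Keying the counters by service and route keeps the breakdown per endpoint while still allowing sums per service or across the whole system.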

Extensions to MVP

  • Complete static configuration
  • Make metrics collection configuration dynamic, to allow administrators to change metrics collection without downtime
  • Generalize the proof-of-concept system metrics collection to allow users to integrate any system metrics collection service they desire
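Changing metrics collection without downtime could work by replacing an immutable configuration snapshot while requests continue to be served. The sketch below assumes a hypothetical `RouteFilter` class and glob-style route patterns. It illustrates the swap technique only and is not a proposal for the actual configuration format.

```python
import threading
from collections.abc import Iterable
from fnmatch import fnmatchcase


class RouteFilter:
    """Holds the set of disabled route patterns and allows live updates."""

    def __init__(self, disabled: Iterable[str] = ()) -> None:
        self._lock = threading.Lock()
        self._disabled: frozenset[str] = frozenset(disabled)

    def update(self, disabled: Iterable[str]) -> None:
        # Build the new immutable set first, then swap the reference.
        # The lock serializes concurrent writers.
        new_config = frozenset(disabled)
        with self._lock:
            self._disabled = new_config

    def is_collected(self, route: str) -> bool:
        # Take one reference to the current set; it is never mutated in
        # place, so a concurrent update cannot change it mid-check.
        disabled = self._disabled
        return not any(fnmatchcase(route, pattern) for pattern in disabled)


route_filter = RouteFilter(disabled=["/health"])
before = route_filter.is_collected("/api/v1/users")

# An administrator changes the configuration while the service keeps running.
route_filter.update(["/api/v1/*"])
after = route_filter.is_collected("/api/v1/users")
```

Because each check works on a single snapshot, a request sees either the old configuration or the new one, never a partially updated mix.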

Describe alternatives you've considered

One alternative is not to show metrics information at all. Another is a partial solution with less customization, integration, or information available.

Additional context

The Hystrix and Turbine dashboards can serve as a Spring Cloud solution for transport-level information.

More detailed proposal is here. UX proposal from @IlyaKreynin here.

CarsonCook · Sep 04 '20 13:09