memefactory

feature/monitoring-and-alerting

Open fbielejec opened this issue 6 years ago • 0 comments

Problem

We need better monitoring of the individual services so that we know immediately about the health of our environment, together with the ability to raise alerts based on alerting rules applied to the collected data.

Implementation

Docker health checks

  • define 0/1 health checks for our base services (RPC node, IPFS, application servers)
  • use curl for the simplest use cases, or a custom health-check app for complex services:
    • repository for custom health checks: https://github.com/district0x/monitoring
    • example custom check: for all reg-entries, votes_total = votes_against + votes_for
  • define restart triggers
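
The checks above could be sketched roughly as follows. This is a minimal illustration, not the code in the monitoring repository: the function names, the shape of the entry records, and the restart threshold are all assumptions. It relies only on the Docker convention that a health-check command exits with 0 when healthy and 1 when unhealthy.

```python
from typing import Iterable, List, Mapping


def inconsistent_entries(entries: Iterable[Mapping[str, int]]) -> List[Mapping[str, int]]:
    """Return the reg-entries whose vote counts do not add up.

    Assumes (hypothetically) that each entry is a mapping with the keys
    votes_total, votes_against and votes_for.
    """
    return [
        e for e in entries
        if e["votes_total"] != e["votes_against"] + e["votes_for"]
    ]


def exit_code(entries: Iterable[Mapping[str, int]]) -> int:
    """Map the invariant check onto a 0/1 result: 0 = healthy, 1 = unhealthy."""
    return 1 if inconsistent_entries(entries) else 0


def should_restart(recent_codes: Iterable[int], retries: int = 3) -> bool:
    """Hypothetical restart trigger: the last `retries` checks all failed."""
    tail = list(recent_codes)[-retries:]
    return len(tail) == retries and all(code != 0 for code in tail)
```

A wrapper script would fetch the entries from the application, pass the result of `exit_code(...)` to `sys.exit`, and be referenced from the container's health-check command. Requiring several consecutive failures before restarting avoids reacting to a single transient error.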

monitoring and alerting

  • Research and compare the following tools:
    • cAdvisor
    • Prometheus
    • DataDog
  • Dashboards should be available under their own DNS name and stay private
  • Alerts should be sent to a Slack channel
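
Whichever tool is chosen, delivering an alert to Slack can be as small as a POST to an incoming webhook, which accepts a JSON body with a `text` field. The sketch below uses only the Python standard library; the function names and the message format are assumptions for illustration, and the webhook URL would come from the Slack workspace configuration.

```python
import json
import urllib.request


def build_alert(service: str, status: str) -> dict:
    """Build the JSON payload for a Slack incoming webhook (message format is an assumption)."""
    return {"text": f"[ALERT] {service} is {status}"}


def send_alert(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send_alert(url, build_alert("ipfs", "down"))` would post "[ALERT] ipfs is down" to the channel bound to that webhook. Tools such as Prometheus ship their own Slack integration, so a hand-written sender like this is mainly useful for custom health-check apps.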

Acceptance Criteria

  • The QA instance is fully monitored
  • The dashboard is available and requires authorized access
  • A single service going down triggers an alert visible in the Slack channel
  • Documentation in the deployments repository is updated

fbielejec · Jul 12 '19 07:07