feature/monitoring-and-alerting

Open fbielejec opened this issue 6 years ago • 0 comments

Problem

We need better monitoring of individual services, so that we know immediately about the health of our environment and can raise alerts when the alerting rules applied to the collected data are triggered.

Implementation

docker health checks

define 0/1 health checks for our base services (RPC node, IPFS, application servers)
use curl for the simplest use cases, or a custom health-check app for complex services:
- repository for custom health checks: https://github.com/district0x/monitoring
- example of a custom check: for all reg-entries, votes_total = votes_against + votes_for
define restart triggers
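
As a rough illustration of the two kinds of checks above, here is a minimal Python sketch, not the actual code from the monitoring repository. It assumes a hypothetical HTTP health endpoint (`http://localhost:8080/health`) and a hypothetical list of reg-entry dicts with `votes_total`, `votes_against` and `votes_for` keys. The result is reduced to an exit code, since Docker's HEALTHCHECK treats exit code 0 as healthy and 1 as unhealthy.

```python
import http.client
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Simple curl-style check: True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL.
        return False


def votes_consistent(entries):
    """Custom check: votes_total == votes_against + votes_for for every entry.

    `entries` is a hypothetical list of dicts; how the reg-entries are
    actually fetched is outside the scope of this sketch.
    """
    return all(
        e["votes_total"] == e["votes_against"] + e["votes_for"] for e in entries
    )


def exit_code(results):
    """Collapse individual check results into 0 (healthy) or 1 (unhealthy)."""
    return 0 if all(results) else 1


def main(argv):
    # Hypothetical default endpoint; pass the real one as the first argument.
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/health"
    return exit_code([http_ok(url)])
```

A container health check could then call `sys.exit(main(sys.argv))` from a small wrapper script. The restart trigger itself would still need to be defined separately.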

monitoring and alerting

Research and compare the following tools:
- cAdvisor
- Prometheus
- DataDog
Dashboards should be available under their own DNS name and stay private.
Alerts should be sent to a Slack channel.
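
Independently of which tool is chosen, the "service down, alert in Slack" path could look roughly like the Python sketch below. It is only an illustration under assumptions: the `health` mapping (service name to 0/1, with 0 meaning down), the `qa` environment label and the webhook URL are all hypothetical. The one external fact it relies on is that Slack incoming webhooks accept a JSON POST body with a `text` field. A tool such as Prometheus would normally handle rule evaluation and delivery through its own alerting components instead.

```python
import json
import urllib.request


def down_services(health):
    """Return the sorted names of services whose health value is 0 (down)."""
    return sorted(name for name, value in health.items() if value == 0)


def slack_payload(service, environment="qa"):
    """Build the JSON body for a Slack incoming webhook message."""
    return {"text": f"[{environment}] service '{service}' is down"}


def build_request(webhook_url, payload):
    """Prepare (but do not send) the POST request for the webhook."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alerts(webhook_url, health, environment="qa"):
    """Send one Slack message per service that is currently down."""
    for service in down_services(health):
        req = build_request(webhook_url, slack_payload(service, environment))
        with urllib.request.urlopen(req, timeout=5) as resp:
            resp.read()
```

For example, `send_alerts(webhook_url, {"rpc-node": 1, "ipfs": 0})` would post a single message about `ipfs`. The webhook URL should be kept out of the repository, since anyone holding it can post to the channel.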

Acceptance Criteria

The QA instance is fully monitored
The dashboard is available and requires authorized access
A single service going down triggers an alert visible in the Slack channel
Documentation in the deployments repository is updated

Jul 12 '19 07:07 fbielejec