memefactory
feature/monitoring-and-alerting
Problem
We need better monitoring of the individual services, so that we know the health of our environment immediately and can raise alerts by applying alerting rules to the collected data.
Implementation
Docker health checks
- Define binary (0/1) health checks for our base services (RPC node, IPFS, application servers).
- Use curl for the simplest use cases, or a custom health-check app for complex services:
  - Repository for custom health checks: https://github.com/district0x/monitoring
  - Example invariant to check: for all reg-entries, votes_total = votes_against + votes_for
- Define restart triggers.
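
The two kinds of check described above (a plain HTTP probe and a data invariant) could look roughly like the following. This is a minimal sketch, not the code in the monitoring repository: the function names `http_ok`, `votes_consistent` and `exit_code` are hypothetical, and the entries are assumed to be dicts carrying the three vote fields named above.

```python
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with a 2xx status (curl-style probe)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL:
        # all of them count as "unhealthy".
        return False


def votes_consistent(entries):
    """Return True if votes_total == votes_against + votes_for for every entry.

    `entries` is assumed to be an iterable of dicts; how they are fetched
    (database, API) is left out of this sketch.
    """
    return all(
        e["votes_total"] == e["votes_against"] + e["votes_for"]
        for e in entries
    )


def exit_code(*checks):
    """Collapse boolean check results into a 0/1 process exit code."""
    return 0 if all(checks) else 1
```

A script that ends with `sys.exit(exit_code(...))` can be used as the command of a Docker `HEALTHCHECK`, where exit code 0 marks the container healthy and 1 marks it unhealthy.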
Monitoring and alerting
- Research and compare the following tools:
  - cAdvisor
  - Prometheus
  - DataDog
- Dashboards should be available under their own DNS name and stay private.
- Alerts should be sent to a Slack channel.
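
Whichever tool is chosen, delivering an alert to Slack can be as small as posting a JSON body with a `text` field to an incoming webhook. The sketch below assumes such a webhook has been created for the channel; `build_alert`, `send_alert` and the message format are hypothetical, and the tools listed above may ship their own Slack integration that makes this unnecessary.

```python
import json
import urllib.request


def build_alert(service, status, environment="qa"):
    """Build the JSON payload for a Slack incoming webhook.

    The message format is an illustration only.
    """
    return {"text": f"[{environment}] {service} is {status}"}


def send_alert(webhook_url, payload, timeout=5.0):
    """POST the payload to the webhook; return True on HTTP 200.

    `webhook_url` is the secret URL Slack issues for the channel
    (an assumption here, it is not part of this issue).
    """
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status == 200
```

For example, `send_alert(url, build_alert("ipfs", "down"))` would post a one-line message for the QA environment.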
Acceptance Criteria
- The QA instance is fully monitored.
- The dashboard is available and requires authorized access.
- A single service going down triggers an alert visible in the Slack channel.
- Documentation in the deployments repository is updated.