zos metrics: zos metrics for system and user containers

Metrics will be available for Prometheus. The node will still push the metrics (no polling) to a configured Prometheus endpoint.

https://app.mindmup.com/map/_v2/90ceb3f0346c11ebb8fbcddb6bd8c75c

pdf download of the mindmup: zosmonitoring.pdf

The required metrics are:

metrics
- cpu
- memory
- disks
- io
- sizes
- subvolumes ?
- actual disk usage
- error rates ?
- number of reservations
- ... ?

Process:

- [x] build collectors for all basic metrics
  - [ ] Question: percentage of some values like context switches
  - [ ] Question: disk health status
- [x] use aggregation with redis lua script.
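As a rough illustration of what aggregating collected samples can mean, here is a minimal sketch that averages raw samples per fixed time window. It is a pure-Python stand-in, not the Redis Lua script mentioned above; the function name `aggregate`, the `(timestamp, value)` sample shape, and the 5-minute window are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate(samples, window_seconds=300):
    """Average (timestamp, value) samples per fixed-width time window.

    Returns a dict mapping each window start (in seconds) to the mean
    of the values whose timestamp falls inside that window.
    """
    buckets = defaultdict(lambda: [0, 0.0])  # window start -> [count, sum]
    for timestamp, value in samples:
        # Round the timestamp down to the start of its window.
        start = int(timestamp) - int(timestamp) % window_seconds
        bucket = buckets[start]
        bucket[0] += 1
        bucket[1] += value
    return {start: total / count for start, (count, total) in sorted(buckets.items())}
```

Keeping only a count and a sum per window means the raw samples do not have to be stored, which is the usual reason for aggregating on the node before anything is pushed.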

Dec 02 '20 11:12 muhamadazmy

Proposal:

Cook up a new daemon that opens a connection on zbus to the other daemons. It periodically fetches all metrics from the different daemons: provisiond for reservations, storaged for disks and usage, etc. Once these metrics are computed, it pushes them to Prometheus. I think sending data every 10-15 minutes is sufficient.

This way we can create dashboards for farmers on Grafana. They can, for example, aggregate data from all their nodes, set alerts for disk failures, check how much stuff is running, ...
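The collect-then-push loop proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint is assumed to be Pushgateway-style (accepting the Prometheus text exposition format over HTTP PUT), the metric names `zos_reservations` and `zos_disk_used_bytes` are hypothetical, and the two collector functions are stand-ins for values that would really be fetched from provisiond and storaged over zbus.

```python
import time
import urllib.request


def format_sample(name, labels, value):
    """Render one sample as a line of the Prometheus text exposition format."""
    def escape(text):
        # Label values escape backslash, double quote and newline.
        return str(text).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


def render(samples):
    """Join samples into one payload; every line ends with a newline."""
    return "".join(format_sample(*sample) + "\n" for sample in samples)


def http_push(url, payload, timeout=10):
    """PUT the payload to a Pushgateway-style URL (assumed endpoint type)."""
    request = urllib.request.Request(
        url, data=payload.encode("utf-8"), method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def collect_and_push(collectors, push):
    """Ask every collector for its samples, then push them in one request."""
    samples = []
    for collect in collectors:
        samples.extend(collect())
    payload = render(samples)
    push(payload)
    return payload


# Hypothetical stand-ins for values fetched from provisiond and storaged over zbus.
def reservations_collector():
    return [("zos_reservations", {}, 3)]


def disks_collector():
    return [("zos_disk_used_bytes", {"device": "sda"}, 1024)]


def run_forever(url, interval_seconds=600):
    """Push every 10 minutes by default, the low end of the proposed 10-15 minutes."""
    collectors = [reservations_collector, disks_collector]
    while True:
        collect_and_push(collectors, lambda payload: http_push(url, payload))
        time.sleep(interval_seconds)
```

Passing the push function in as an argument keeps the collection logic testable without a network: `collect_and_push(collectors, print)` shows the payload that would be sent.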

Dec 03 '20 15:12 DylanVerstraete

I am sure we can already leverage https://github.com/prometheus/node_exporter or something similar. It already does all the monitoring for you and is Prometheus compatible.

Dec 03 '20 16:12 muhamadazmy