
metrics: zos metrics for system and user containers

Open muhamadazmy opened this issue 5 years ago • 2 comments

Metrics will be available for Prometheus. The node will still push the metrics (no polling) to a configured Prometheus endpoint.
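A rough sketch of what the push side could look like, assuming the configured endpoint accepts the Prometheus text exposition format over HTTP POST (as a Pushgateway does). The issue does not say which kind of endpoint is meant, so the `Sample` type, the `render` and `push` helpers, and the metric names are all hypothetical:

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strconv"
	"strings"
)

// Sample is a hypothetical single gauge reading; the type is illustrative only.
type Sample struct {
	Name  string
	Value float64
}

// render writes the samples in the Prometheus text exposition format,
// one "name value" line per sample, sorted by name for stable output.
func render(samples []Sample) string {
	sorted := append([]Sample(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	for _, s := range sorted {
		b.WriteString(s.Name)
		b.WriteByte(' ')
		b.WriteString(strconv.FormatFloat(s.Value, 'f', -1, 64))
		b.WriteByte('\n')
	}
	return b.String()
}

// push POSTs the rendered samples to url. That the endpoint accepts this
// body (Pushgateway style) is an assumption, not something the issue states.
func push(client *http.Client, url string, samples []Sample) error {
	body := strings.NewReader(render(samples))
	resp, err := client.Post(url, "text/plain; version=0.0.4", body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("push failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical metric names; the real collectors may name them differently.
	samples := []Sample{
		{Name: "zos_memory_used_bytes", Value: 1024},
		{Name: "zos_cpu_usage_ratio", Value: 0.42},
	}
	// Only render here; calling push needs a real endpoint URL.
	fmt.Print(render(samples))
}
```

With a real endpoint, the node would call `push(http.DefaultClient, endpointURL, samples)` on its own schedule, so nothing needs to reach into the node to scrape it.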

https://app.mindmup.com/map/_v2/90ceb3f0346c11ebb8fbcddb6bd8c75c

pdf download of the mindmup: zosmonitoring.pdf

The required metrics are:

  • metrics
  • cpu
  • memory
  • disks
  • io
  • sizes
  • subvolumes ?
  • actual disk usage
  • error rates ?
  • number of reservations
  • ... ?

Process:

  • [x] build collectors for all basic metrics
    • [ ] Question: percentage of some values like context switches
    • [ ] Question: disk health status
  • [x] use aggregation with redis lua script.

muhamadazmy avatar Dec 02 '20 11:12 muhamadazmy

Proposal:

Cook up a new daemon that opens a connection on zbus to the other daemons. Periodically fetch all metrics from the different daemons: provisiond for reservations, storaged for disks / usage, etc. Once these metrics are computed, push them to Prometheus. I think sending data every 10-15 minutes is sufficient.
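To make the proposal concrete, here is a minimal sketch of such a daemon loop. The `Collector` interface, `collectAll`, `run`, and the metric name are hypothetical; in the real daemon each collector would wrap a zbus call to provisiond, storaged, and so on, which is stubbed out here with plain functions:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Collector is a hypothetical interface. Each implementation would wrap a
// zbus call to one daemon (provisiond, storaged, ...); that part is stubbed.
type Collector interface {
	Collect() (map[string]float64, error)
}

// CollectorFunc adapts a plain function to the Collector interface.
type CollectorFunc func() (map[string]float64, error)

func (f CollectorFunc) Collect() (map[string]float64, error) { return f() }

// collectAll queries every collector once and merges the results.
// A failing collector is skipped so one broken daemon does not block the rest.
func collectAll(collectors []Collector) map[string]float64 {
	merged := make(map[string]float64)
	for _, c := range collectors {
		values, err := c.Collect()
		if err != nil {
			continue
		}
		for name, v := range values {
			merged[name] = v
		}
	}
	return merged
}

// run collects and pushes on every tick until ctx is cancelled.
func run(ctx context.Context, interval time.Duration, collectors []Collector, push func(map[string]float64) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := push(collectAll(collectors)); err != nil {
				fmt.Println("push failed:", err)
			}
		}
	}
}

// demoCollectors returns stand-ins for the zbus-backed collectors.
func demoCollectors() []Collector {
	return []Collector{
		CollectorFunc(func() (map[string]float64, error) {
			return map[string]float64{"zos_reservations": 3}, nil
		}),
		CollectorFunc(func() (map[string]float64, error) {
			return nil, errors.New("storaged unavailable")
		}),
	}
}

func main() {
	// Short interval for demonstration; the proposal suggests 10-15 minutes.
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	run(ctx, 100*time.Millisecond, demoCollectors(), func(m map[string]float64) error {
		fmt.Println(m) // stand-in for the actual push to Prometheus
		return nil
	})
}
```

Passing the push step in as a function keeps the loop independent of how the metrics are actually delivered.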

This way we can create dashboards for farmers on Grafana. They can, for example, aggregate data from all their nodes, set alerts for disk failures, check how much is running, ...

DylanVerstraete avatar Dec 03 '20 15:12 DylanVerstraete

I am sure we can already leverage https://github.com/prometheus/node_exporter or something similar. It already does all the monitoring for you and is Prometheus compatible.

muhamadazmy avatar Dec 03 '20 16:12 muhamadazmy