Prometheus metrics: add node name label

Open · BulatSaif opened this issue 3 years ago • 1 comment

Is there an existing issue?

  • [X] I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • [X] This is not a support question.

Description of bug

The Substrate repo has excellent Grafana dashboards, but they assume you run one node per VM. That holds for Kusama/Polkadot validators; a testnet, however, may run more than one node per VM, or not run on VMs at all (for example, on Kubernetes). In those setups the instance label used in the dashboards may not be available:

"expr": "sum by (instance) (${metric_namespace}_sub_libp2p_pending_connections{instance=~\"${nodename}\"})",

For example, for nodes deployed in Kubernetes, we have to replace the instance label with the pod label.
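
The manual workaround described above can be sketched as a small script. This is only an illustration, not part of the Substrate repo: the helper names `swap_label` and `rewrite_dashboard` are hypothetical, and the sample dashboard structure (`panels` / `targets`) is an assumption; the only thing taken from the issue is that queries are stored under an `"expr"` key. The whole-word match is deliberately naive and would also rename the word if it appeared inside a quoted label value.

```python
import re


def swap_label(expr: str, old: str = "instance", new: str = "pod") -> str:
    """Rename a label wherever it appears as a whole word in a PromQL expression."""
    # \b does not match next to "_", so metric names such as
    # foo_instance_total are left untouched.
    return re.sub(rf"\b{re.escape(old)}\b", lambda _match: new, expr)


def rewrite_dashboard(node, old: str = "instance", new: str = "pod"):
    """Recursively rewrite every string stored under an "expr" key."""
    if isinstance(node, dict):
        return {
            key: swap_label(value, old, new)
            if key == "expr" and isinstance(value, str)
            else rewrite_dashboard(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_dashboard(item, old, new) for item in node]
    return node


# Hypothetical, minimal dashboard fragment using the expression quoted above.
dashboard = {
    "panels": [
        {
            "targets": [
                {
                    "expr": 'sum by (instance) (${metric_namespace}_sub_libp2p_pending_connections{instance=~"${nodename}"})'
                }
            ]
        }
    ]
}

rewritten = rewrite_dashboard(dashboard)
print(rewritten["panels"][0]["targets"][0]["expr"])
```

Having to post-process every dashboard like this for each deployment type is what a label exported by the node itself would avoid.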

Steps to reproduce

  1. Deploy two nodes in one VM.
  2. Open the Grafana dashboards.
  3. The panels are not available.

Proposed solution

The current metrics carry no label that uniquely identifies a node:

substrate_sync_peers{chain="rococo_v2_2"} 150

I suggest adding a new label to each metric. One candidate is nodeid: it is unique for each node, but it would not be usable in the Grafana UI. A better alternative is the node name, which can be set with the --name flag; if the flag is not set, the name is random. Example of a metric with the new label:

substrate_sync_peers{chain="rococo_v2_2", node="rococo-validator-1"} 150
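
To make the effect of the proposed label concrete, here is a minimal sketch that renders samples in the Prometheus text exposition format. It is not how Substrate builds its metrics; `render_sample` is a hypothetical helper, and the second node name, `rococo-validator-2`, is made up for illustration. The sketch assumes the node name is merged into the label set shared by all metrics.

```python
def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample in the Prometheus text exposition format."""

    def escape(text: str) -> str:
        # Label values escape backslash, double quote and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    body = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"


# Labels shared by every metric today (taken from the example above).
base_labels = {"chain": "rococo_v2_2"}

# Two nodes on the same machine; the second name is hypothetical.
for node_name in ("rococo-validator-1", "rococo-validator-2"):
    print(render_sample("substrate_sync_peers", {**base_labels, "node": node_name}, 150))
```

Without the `node` entry both nodes would emit the same line, so a dashboard would need some externally attached label to tell them apart.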

Grafana dashboard:

"expr": "sum by (node) (${metric_namespace}_sub_libp2p_pending_connections{node=\"${nodename}\"})",

BulatSaif · Dec 15 '22 14:12

We have noticed the same problem in the alerting rules: the instance label used in the alerts should be replaced with the name label.

BulatSaif · Jan 26 '23 13:01