Prometheus metrics: add node name label
Is there an existing issue?
- [X] I have searched the existing issues
Experiencing problems? Have you tried our Stack Exchange first?
- [X] This is not a support question.
Description of bug
The Substrate repo has excellent Grafana dashboards, but they assume you run one node per VM. That is the case for Kusama/Polkadot validators; a testnet, however, may run more than one node per VM, or may not run on VMs at all (for example, on Kubernetes).
The instance label used in the dashboards may not be available:
"expr": "sum by (instance) (${metric_namespace}_sub_libp2p_pending_connections{instance=~\"${nodename}\"})",
For example, for nodes deployed in Kubernetes, we have to replace the instance label with the pod label.
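The manual replacement described above can be sketched as a small script. This is only an illustration of the workaround, not part of Substrate: the helper name `swap_label` is hypothetical, and the regex is a naive whole-word match that assumes the label name does not also appear as a standalone word elsewhere in the expression.

```python
import re

def swap_label(expr: str, old: str = "instance", new: str = "pod") -> str:
    """Replace every standalone occurrence of the label name `old` with `new`.

    Hypothetical helper: a naive whole-word substitution, not a PromQL parser.
    """
    return re.sub(rf"\b{re.escape(old)}\b", lambda _match: new, expr)

# The dashboard expression quoted above, with the JSON escaping removed.
expr = 'sum by (instance) (${metric_namespace}_sub_libp2p_pending_connections{instance=~"${nodename}"})'
print(swap_label(expr))
```

Having to patch every panel this way for each deployment type is the maintenance cost this issue is about.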
Steps to reproduce
- Deploy 2 nodes on one VM.
- Open the Grafana dashboards.
- The panels are not available.
Proposed solution
The current metrics don't have any label that uniquely identifies a node:
substrate_sync_peers{chain="rococo_v2_2"} 150
I suggest adding a new label to each metric.
A good candidate for the new label is the node ID: it is unique for each node, but it is not usable in the Grafana UI. A better alternative is the node name. It can be set with the --name flag; if the flag is not set, the name is random.
Example of a metric with the new label:
substrate_sync_peers{chain="rococo_v2_2", node="rococo-validator-1"} 150
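To show why the extra label matters, here is a minimal sketch that renders samples in the Prometheus text exposition format. It is not how Substrate registers its metrics: `render_sample` is a hypothetical helper, and the second node name, `rococo-validator-2`, is made up for the example.

```python
def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample line in the Prometheus text exposition format."""
    def escape(text: str) -> str:
        # Label values escape backslash, double quote and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    rendered = ",".join(f'{key}="{escape(str(val))}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} {value}"

# Two nodes on the same VM: without a per-node label the series are identical.
without_label = [
    render_sample("substrate_sync_peers", {"chain": "rococo_v2_2"}, 150)
    for _ in range(2)
]

# With the proposed node label each node exposes a distinct series.
with_label = [
    render_sample("substrate_sync_peers", {"chain": "rococo_v2_2", "node": node}, 150)
    for node in ("rococo-validator-1", "rococo-validator-2")  # second name is hypothetical
]

print(len(set(without_label)), len(set(with_label)))
```

Without the label, the two nodes can only be told apart by labels attached at scrape time, such as instance or pod, which is exactly what the dashboards cannot rely on.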
Grafana dashboard:
"expr": "sum by (node) (${metric_namespace}_sub_libp2p_pending_connections{node=\"${nodename}\"})",
We have also noticed the same issue in the alerting rules: the instance label in the alerts should be replaced with the node name label.