[Feature request] Add sensor label to node_hwmon_temp_celsius

Open pengbins opened this issue 2 years ago • 7 comments

When I add node_hwmon_temp_celsius to a dashboard, it is very hard to map the metrics to the corresponding physical devices.

There are only chip names and sensor names, and sensor names always show as "temp%d".

[Screenshot: node_hwmon_temp]

The "label" label in node_hwmon_sensor_label seems more human-readable.

Would it be possible to add that label to node_hwmon_temp_celsius as well?

An expected example:

node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", label="tdie", sensor="temp1"}
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", label="tctl", sensor="temp2"}
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", label="tccd1", sensor="temp3"}
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", label="tccd2", sensor="temp4"}

Maybe node_hwmon_sensor_label could then be removed as a metric, once the label has been added to node_hwmon_temp_celsius. Can the label of a sensor change over time?

pengbins • Apr 21 '23 11:04

You can simply join the metrics in your PromQL query to get the "label" label included in the results, e.g.:

node_hwmon_temp_celsius * on (instance, chip, sensor) group_left (label) node_hwmon_sensor_label
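To make the matching behaviour of that query concrete, here is a minimal Python sketch of a toy model of `on (...) group_left (...)`. It is not Prometheus code: the function name `join_group_left`, the instance name, and the temperature values are made up for illustration, and it assumes node_hwmon_sensor_label carries the value 1.

```python
# Toy model of PromQL many-to-one vector matching. This is NOT Prometheus code;
# join_group_left and all sample values below are hypothetical.
MATCH_ON = ("instance", "chip", "sensor")


def join_group_left(left, right, on=MATCH_ON, include=("label",)):
    """Mimic `left * on(<on>) group_left(<include>) right`.

    Each side is a list of (labels_dict, value) samples.
    """
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            # Prometheus also rejects duplicate series on the "one" side.
            raise ValueError(f"duplicate right-hand series for {key}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # no partner on the right-hand side: the sample is dropped
        right_labels, right_value = index[key]
        merged = dict(labels)
        for name in include:
            if name in right_labels:
                merged[name] = right_labels[name]  # copy the extra label over
        result.append((merged, value * right_value))
    return result


# Hypothetical samples: one sensor with a label series, one without.
temps = [
    ({"instance": "host:9100", "chip": "pci0000:00_0000:00:18_3", "sensor": "temp1"}, 45.0),
    ({"instance": "host:9100", "chip": "thermal_thermal_zone12", "sensor": "temp1"}, 38.0),
]
labels = [
    ({"instance": "host:9100", "chip": "pci0000:00_0000:00:18_3", "sensor": "temp1",
      "label": "tdie"}, 1),
]

joined = join_group_left(temps, labels)
for sample_labels, sample_value in joined:
    print(sample_labels, sample_value)
```

In this model only the first temperature series survives the join, now carrying label="tdie". The thermal zone series has no matching node_hwmon_sensor_label series, so it disappears from the result.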

dswarbrick • Apr 21 '23 15:04

> You can simply join the metrics in your PromQL query to get the "label" label included in the results, e.g.:
>
> node_hwmon_temp_celsius * on (instance, chip, sensor) group_left (label) node_hwmon_sensor_label

Thanks, this would work.

But wouldn't it be more user-friendly to do this in the hwmon collector?

node_hwmon_sensor_label could then be removed, which would mean less traffic and storage.

pengbins • Apr 23 '23 01:04

The relationship between node_hwmon_temp_celsius and node_hwmon_sensor_label is not always 1:1.

Most systems have more node_hwmon_temp_celsius metrics (usually with chip="thermal_thermal_zoneNN") than node_hwmon_sensor_label metrics. In other words, not all hwmon temperature sensors have a "sensor label".

dswarbrick • Apr 23 '23 02:04

> The relationship between node_hwmon_temp_celsius and node_hwmon_sensor_label is not always 1:1.
>
> Most systems have more node_hwmon_temp_celsius metrics (usually with chip="thermal_thermal_zoneNN") than node_hwmon_sensor_label metrics. In other words, not all hwmon temperature sensors have a "sensor label".

https://docs.kernel.org/hwmon/sysfs-interface.html#temperatures

According to the Linux kernel docs, there should be one label for each temp input:

Temperatures
...
temp[1-*]_input
...
temp[1-*]_label
...

pengbins • Apr 23 '23 03:04

Apparently not always the case:

$ ll /sys/class/hwmon/hwmon4/
total 0
drwxr-xr-x 3 root root    0 Apr 23 05:45 ./
drwxr-xr-x 4 root root    0 Apr 23 05:45 ../
lrwxrwxrwx 1 root root    0 Apr 23 05:46 device -> ../../thermal_zone12/
-r--r--r-- 1 root root 4096 Apr 23 05:45 name
drwxr-xr-x 2 root root    0 Apr 23 05:46 power/
lrwxrwxrwx 1 root root    0 Apr 23 05:45 subsystem -> ../../../../../class/hwmon/
-r--r--r-- 1 root root 4096 Apr 23 05:46 temp1_input
-rw-r--r-- 1 root root 4096 Apr 23 05:45 uevent
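As a rough illustration of what this means for any code reading these files, here is a minimal Python sketch that pairs each temp*_input file with its temp*_label file when one exists. The function name `read_temp_sensors` is hypothetical, this is not how node_exporter itself is implemented, and the demo builds a throwaway directory with made-up readings instead of touching /sys.

```python
import tempfile
from pathlib import Path


def read_temp_sensors(hwmon_dir):
    """Return {sensor: (celsius, label_or_None)} for one hwmon directory.

    Hypothetical helper for illustration only.
    """
    sensors = {}
    for input_file in sorted(Path(hwmon_dir).glob("temp*_input")):
        sensor = input_file.name[: -len("_input")]  # e.g. "temp1"
        # The kernel reports temp*_input in millidegrees Celsius.
        celsius = int(input_file.read_text().strip()) / 1000.0
        label_file = input_file.with_name(sensor + "_label")
        # temp*_label is optional, so a missing file is not an error.
        label = label_file.read_text().strip() if label_file.is_file() else None
        sensors[sensor] = (celsius, label)
    return sensors


# Demo on a fake hwmon directory (made-up values): temp1 has a label, temp2 does not.
with tempfile.TemporaryDirectory() as fake_hwmon:
    root = Path(fake_hwmon)
    (root / "temp1_input").write_text("45000\n")
    (root / "temp1_label").write_text("tdie\n")
    (root / "temp2_input").write_text("38000\n")
    demo_sensors = read_temp_sensors(root)

print(demo_sensors)
```

A sensor without a label file comes back with None here. Whether an exporter should instead omit the label, leave it empty, or substitute a placeholder is a design choice this sketch leaves open.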

dswarbrick • Apr 23 '23 03:04

This seems to be a duplicate of https://github.com/prometheus/node_exporter/issues/631

See also https://stackoverflow.com/questions/72217597/prometheus-queries-how-to-give-a-default-label-when-it-is-missing

So I am not the only one with this issue.

It would be great if the hwmon collector could address this, so that users do not have to search for a solution and end up with a complex group_left query.

When a sensor has no label, how about just using "unknown" as the value of the "label" label?

pengbins • Apr 23 '23 09:04

@dswarbrick Usually the 1:1 issue is the other way around: there is more than one label per item, rather than a missing label.

SuperQ • Apr 23 '23 16:04