Incorrect Temperature_Celsius
Temperature value: 240518299684
Prometheus:
smartctl_device_attribute{attribute_flags_long="updated_online",attribute_flags_short="-O----",attribute_id="194",attribute_name="Temperature_Celsius",attribute_value_type="raw",device="/dev/sda",instance="10.99.2.2:9633",job="smartctl",model_family="Hitachi/HGST Travelstar Z5K500",model_name="Hitachi HTS545050A7E380",serial_number="TE95123QJTSM6V"} 240518299684
smartctl --json --xall /dev/sda:
{
  "id": 194,
  "name": "Temperature_Celsius",
  "value": 166,
  "worst": 166,
  "thresh": 0,
  "when_failed": "",
  "flags": {
    "value": 2,
    "string": "-O---- ",
    "prefailure": false,
    "updated_online": true,
    "performance": false,
    "error_rate": false,
    "event_count": false,
    "auto_keep": false
  },
  "raw": {
    "value": 240518299684,
    "string": "36 (Min/Max 2/56)"
  }
},
Full smartctl output here
@DiTsi, looking at a particular drive myself via smartctl --json --xall /dev/sdh, I see that the value indeed does not make much sense as a temperature reading. But it is simply the raw value that smartctl (and, for that matter, the drive firmware) returns:
[...]
{
  "id": 194,
  "name": "Temperature_Celsius",
  "value": 181,
  "worst": 181,
  "thresh": 0,
  "when_failed": "",
  "flags": {
    "value": 2,
    "string": "-O---- ",
    "prefailure": false,
    "updated_online": true,
    "performance": false,
    "error_rate": false,
    "event_count": false,
    "auto_keep": false
  },
  "raw": {
    "value": 176095166497,
    "string": "33 (Min/Max 23/41)"
  }
},
[...]
Or, if you would rather look at the table output:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   137   137   054    -    104
  3 Spin_Up_Time            POS---   133   133   024    -    495 (Average 495)
  4 Start_Stop_Count        -O--C-   100   100   000    -    19
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   140   140   020    -    15
  9 Power_On_Hours          -O--C-   095   095   000    -    39270
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    19
192 Power-Off_Retract_Count -O--CK   099   099   000    -    1235
193 Load_Cycle_Count        -O--C-   099   099   000    -    1235
194 Temperature_Celsius     -O----   181   181   000    -    33 (Min/Max 23/41)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
But there is also the metric smartctl_device_temperature, which reads from the "temperature" object:
[...]
"temperature": {
  "current": 33,
  "power_cycle_min": 25,
  "power_cycle_max": 34,
  "lifetime_min": 23,
  "lifetime_max": 41,
  "op_limit_min": 0,
  "op_limit_max": 60,
  "limit_min": -40,
  "limit_max": 70,
  "lifetime_over_limit_minutes": 0,
  "lifetime_under_limit_minutes": 0
},
[...]
(see https://github.com/prometheus-community/smartctl_exporter/blob/75c76b363f6fb8454655cba5ebc4ad8089910670/smartctl.go#L211)
The smartctl manpage (https://github.com/smartmontools/smartmontools/blob/20d4f102744d0d8978bcad3e1c21773ef0520553/smartmontools/smartctl.8.in#L1225) clearly states that the raw value requires conversion, and that some vendors even do unusual things with it. See also the smartmontools FAQ entry on why a disk temperature may be reported as 150 Celsius: https://www.smartmontools.org/wiki/FAQ#Whyismydisktemperaturereportedbysmartdas150Celsius
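To make the conversion concrete: for all three packed samples in this thread, the 48-bit raw value splits into three 16-bit words, current in bits 0-15, min in bits 16-31 and max in bits 32-47. For example 240518299684 = 0x0038_0002_0024, i.e. 56, 2 and 36, matching "36 (Min/Max 2/56)". The Go sketch below assumes exactly that layout; it is a hypothetical helper (decodeTempMinMax is not an exporter or smartmontools function), and since the manpage says vendors differ, it should not be taken as a universal decoder.

```go
package main

import "fmt"

// decodeTempMinMax splits a packed attribute-194 raw value into three 16-bit
// words. ASSUMPTION: bits 0-15 = current, 16-31 = min, 32-47 = max. This
// matches the samples in this thread but is vendor-specific, not universal.
func decodeTempMinMax(raw uint64) (cur, lo, hi uint16) {
	cur = uint16(raw & 0xFFFF)
	lo = uint16((raw >> 16) & 0xFFFF)
	hi = uint16((raw >> 32) & 0xFFFF)
	return cur, lo, hi
}

func main() {
	// Raw values quoted in this thread (the last one is the HDD sample).
	for _, raw := range []uint64{240518299684, 176095166497, 201864052770, 35} {
		cur, lo, hi := decodeTempMinMax(raw)
		fmt.Printf("%d -> %d (Min/Max %d/%d)\n", raw, cur, lo, hi)
	}
}
```

This prints 36 (2/56), 33 (23/41) and 34 (9/47) for the three packed values, reproducing smartctl's raw strings. The HDD value 35 decodes to 35 with min/max 0/0, which simply means that drive packs no min/max words into the raw value.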
In the end, the exporter simply converts what smartctl reports into metrics, so any such issue should rather be reported as a bug against smartmontools at https://github.com/smartmontools/smartmontools/issues
I can confirm that it's still an issue. The output of smartctl --json --xall /dev/sdX is:
{
  "id": 194,
  "name": "Temperature_Celsius",
  "value": 34,
  "worst": 34,
  "thresh": 0,
  "when_failed": "",
  "flags": {
    "value": 34,
    "string": "-O---K ",
    "prefailure": false,
    "updated_online": true,
    "performance": false,
    "error_rate": false,
    "event_count": false,
    "auto_keep": true
  },
  "raw": {
    "value": 201864052770,
    "string": "34 (Min/Max 9/47)"
  }
},
for SSDs, and
{
  "id": 194,
  "name": "Temperature_Celsius",
  "value": 108,
  "worst": 102,
  "thresh": 0,
  "when_failed": "",
  "flags": {
    "value": 34,
    "string": "-O---K ",
    "prefailure": false,
    "updated_online": true,
    "performance": false,
    "error_rate": false,
    "event_count": false,
    "auto_keep": true
  },
  "raw": {
    "value": 35,
    "string": "35"
  }
}
for HDDs.
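Note that while raw.value differs wildly between the two drives (201864052770 vs 35), raw.string starts with the current temperature in both cases. A minimal Go sketch of reading that leading integer is below; tempFromRawString is a hypothetical helper, and it assumes smartctl keeps printing the current temperature first, which is a formatting detail rather than a documented contract.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// tempFromRawString returns the first whitespace-separated token of the raw
// "string" field as an integer, e.g. "34 (Min/Max 9/47)" -> 34, "35" -> 35.
// ASSUMPTION: the string begins with the current temperature, as in both
// samples above; other firmware or smartctl versions may format it differently.
func tempFromRawString(s string) (int, error) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0, errors.New("empty raw string")
	}
	return strconv.Atoi(fields[0])
}

func main() {
	for _, s := range []string{"36 (Min/Max 2/56)", "34 (Min/Max 9/47)", "35"} {
		v, err := tempFromRawString(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(v) // 36, then 34, then 35
	}
}
```

Parsing human-readable text is fragile, so the structured "temperature" object remains the more robust source where the drive provides it.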
Don't use smartctl_device_attribute for temperature. That metric is produced by smart.mineDeviceAttribute(), and its attribute_value_type="raw" series carries the undecoded raw value.
Use smartctl_device_temperature instead, which is handled by smart.mineTemperatures(). It is even supposed to support non-SATA drives: https://github.com/smartmontools/smartmontools/issues/243#issuecomment-1943871227