Fix parsing of current nvidia_smi section

Open mayrstefan opened this issue 1 year ago • 1 comments

General information

The XML output of current nvidia-smi versions contains two changes:

  1. power_readings was renamed to gpu_power_readings
  2. power_management was removed

You can find an example of the recent XML in https://github.com/influxdata/telegraf/issues/13653

Bug reports

Section parsing for nvidia_smi fails because it cannot find some expected elements in recent versions of the nvidia-smi output.

Proposed changes

This is an improved version of PR #669

  • What is the expected behavior? Section can be parsed
  • What is the observed behavior? Section parsing crashes because it cannot find some XML elements
  • If it's not obvious from the above: In what way does your patch change the current behavior? The PR checks whether an XML element with the new name gpu_power_readings exists. If not, it falls back to the old element name power_readings. For the removed element power_management, it checks whether the element exists. If not, it assumes a default of "Supported".
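
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper names find_power_readings and get_power_management are hypothetical, and the two sample XML fragments are simplified assumptions about where the elements sit (power_management inside the readings element, directly below a gpu element).

```python
import xml.etree.ElementTree as ET

# Simplified, assumed sample fragments (not real nvidia-smi output).
OLD_XML = (
    "<gpu><power_readings>"
    "<power_management>Supported</power_management>"
    "<power_draw>10.00 W</power_draw>"
    "</power_readings></gpu>"
)
NEW_XML = (
    "<gpu><gpu_power_readings>"
    "<power_draw>12.00 W</power_draw>"
    "</gpu_power_readings></gpu>"
)


def find_power_readings(gpu):
    """Hypothetical helper: prefer the new element name, fall back to the old one."""
    readings = gpu.find("gpu_power_readings")
    # Compare against None explicitly: an Element without children is falsy,
    # so "a or b" would pick the wrong branch for empty elements.
    if readings is None:
        readings = gpu.find("power_readings")
    return readings


def get_power_management(readings):
    """Hypothetical helper: default to "Supported" if the element was removed."""
    return readings.findtext("power_management", default="Supported")


for xml_text in (OLD_XML, NEW_XML):
    gpu = ET.fromstring(xml_text)
    readings = find_power_readings(gpu)
    print(readings.tag, get_power_management(readings), readings.findtext("power_draw"))
```

Both fragments yield a usable readings element, so a parser written this way would accept the old and the new output format alike.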

I have read the CLA Document and I hereby sign the CLA or my organization already has a signed CLA.

mayrstefan, Mar 27 '24 23:03

I have successfully tested this PR. I hope the CheckMK devs will integrate it so that the NVIDIA GPU related issues get fixed in one of the next checkmk releases.

jens-maus, Mar 31 '24 13:03

Thank you! I'll bring this into the next releases.

mo-ki, Apr 05 '24 11:04