kernel_tuner Add Tegra Observer to control clocks on Jetson devices

I've created an experimental TegraObserver that can monitor and control the GPU core clock on a Tegra device, similar to how NVML can do this for normal NVIDIA GPUs. I thought it might be a nice addition to Kernel Tuner, hence this PR. Comments and discussion are welcome, of course. While editing core.py I also noticed two duplicate imports and removed them.

The TegraObserver does require elevated permissions to write to the files that control the clocks, similar to how the NVMLObserver requires them for use with nvidia-smi.

I've tested it on a Jetson Nano and verified that the code to find the correct control files works on an Orin as well.

One issue I've noticed is that when kernel_tuner is interrupted with Ctrl+C, the __del__ method of TegraObserver does not appear to run, so the clocks are not set back to their initial values. I'm not sure how to fix that.
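One possible direction, offered only as a sketch: Python does not guarantee that __del__ runs at interpreter shutdown, so the restore step could instead be tied to a context manager and to atexit. The ClockGuard class and its read_clock/write_clock callables below are hypothetical stand-ins, not part of this PR or of Kernel Tuner.

```python
import atexit


class ClockGuard:
    """Hypothetical stand-in for an observer that changes a clock setting."""

    def __init__(self, read_clock, write_clock):
        self._write_clock = write_clock
        self._initial = read_clock()  # remember the value to restore later
        self._restored = True         # nothing has been changed yet
        # atexit handlers run at normal interpreter shutdown, which includes
        # exit after an unhandled KeyboardInterrupt. They do not run after
        # SIGKILL or os._exit().
        atexit.register(self.restore)

    def set_clock(self, value):
        self._write_clock(value)
        self._restored = False

    def restore(self):
        # Idempotent: only write the initial value back if something changed.
        if not self._restored:
            self._write_clock(self._initial)
            self._restored = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.restore()
        return False  # let KeyboardInterrupt and other exceptions propagate
```

Used as `with ClockGuard(read, write) as guard: ...`, the initial value is written back when the block exits, including when Ctrl+C raises KeyboardInterrupt inside it. The atexit registration is a fallback for code that does not use the with statement.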

Edit: I wasn't sure how to add tests for this new observer. Perhaps just mocking the methods that read/write the clock frequencies?
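A minimal sketch of what such a mocked test might look like, using unittest.mock from the standard library. FakeTegra and its method names (_read_clock_file, _write_clock_file, set_gr_clock) and the path are hypothetical placeholders; the real class in this PR may be structured differently.

```python
from unittest import mock


class FakeTegra:
    """Hypothetical stand-in; the real class and method names may differ."""

    def _read_clock_file(self, path):
        with open(path) as f:
            return int(f.read())

    def _write_clock_file(self, path, value):
        with open(path, "w") as f:
            f.write(str(value))

    def set_gr_clock(self, path, value):
        # Write the requested frequency, then read it back to confirm.
        self._write_clock_file(path, value)
        return self._read_clock_file(path)


def test_set_gr_clock_without_hardware():
    # Patch the two file-access methods so no control files are touched.
    with mock.patch.object(FakeTegra, "_read_clock_file", return_value=921600000) as rd, \
         mock.patch.object(FakeTegra, "_write_clock_file") as wr:
        tegra = FakeTegra()
        assert tegra.set_gr_clock("/hypothetical/path", 921600000) == 921600000
        wr.assert_called_once_with("/hypothetical/path", 921600000)
        rd.assert_called_once_with("/hypothetical/path")
```

Because only the file access is patched, the test exercises the surrounding logic without needing a Jetson board or elevated permissions.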

Feb 16 '24 14:02 loostrum

I see that PR #242 changes how the clocks are set in NVML. Once that one is merged, I'll look at updating this PR to do it the same way for Tegra.

Feb 16 '24 14:02 loostrum

The NVMLObserver is covered by these two files:

https://github.com/KernelTuner/kernel_tuner/blob/4dbcb66fad3561f7ee24d270b60d597e68fc4cdf/test/test_observers.py#L4
https://github.com/KernelTuner/kernel_tuner/blob/4dbcb66fad3561f7ee24d270b60d597e68fc4cdf/examples/cuda/vector_add_observers.py#L9

Regarding the former, you would need something similar to @skip_if_no_pynvml. Maybe it's enough to check the Linux kernel release (uname -r), as NVIDIA appends -tegra to the kernel name? The vector_add_observers example could be renamed with an _nvml suffix, and a similar example could be added for this new observer.
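The kernel-release check could be sketched roughly as follows. platform.release() returns the same string as uname -r on Linux; the helper name is_tegra_kernel is hypothetical, and the "tegra" substring test is only the heuristic suggested here, not a verified detection method.

```python
import platform


def is_tegra_kernel(release=None):
    """Heuristic: treat the system as Tegra if the kernel release mentions 'tegra'.

    `release` defaults to platform.release() (the `uname -r` string); passing
    it explicitly makes the helper easy to test on non-Tegra machines.
    """
    if release is None:
        release = platform.release()
    return "tegra" in release.lower()
```

A skip marker analogous to skip_if_no_pynvml could then be built with `pytest.mark.skipif(not is_tegra_kernel(), reason="No Tegra kernel detected")`.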

Another suggestion is to add a way to check whether the device is actually a Tegra device.

So taking the NVMLObserver as an example:

nvmlobserver = NVMLObserver(["nvml_energy", "temperature"])
metrics = OrderedDict()
metrics["GFLOPS/W"] = lambda p: (size/1e9) / p["nvml_energy"]

Could become:

nvmlobserver = NVMLObserver(["nvml_energy", "temperature"])
metrics = OrderedDict()
if nvmlobserver.is_available():
  metrics["GFLOPS/W"] = lambda p: (size/1e9) / p["nvml_energy"]

This allows for more generic KernelTuner scripts.

Mar 01 '24 14:03 csbnw

Hi @csbnw! That's a very good point about writing generic tuning scripts. If someone is interested in tuning on a range of devices, it could be annoying to instantiate different observers based on the device at hand. It also goes a bit against the design philosophy of Kernel Tuner, which attempts to take care of 'boring' things automatically as much as possible. I'll have to think for a bit about how we can achieve that, but it's a good point!

Mar 01 '24 15:03 benvanwerkhoven

Quality Gate passed

Issues: 4 new issues, 0 accepted issues
Measures: 0 security hotspots, no data about coverage, 0.0% duplication on new code

Mar 04 '24 15:03 sonarqubecloud[bot]

I'm going to merge this into the 'tegra_observer' branch on the main repo, to make it easier for the student working on this to make contributions.

Mar 18 '24 12:03 benvanwerkhoven