[ROCM-SMI] Does not recognize my CDNA GPU (MI300)
Does nvtop 3.1.0 support the MI300? rocm-smi recognizes the cards, but nvtop says "No GPU to monitor."
build ❯ ./src/nvtop -v
nvtop version 3.1.0
build ❯ ./src/nvtop
No GPU to monitor.
build ❯ rocm-smi
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs             Temp        Power     Partitions          SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
              (DID, GUID)     (Junction)  (Socket)  (Mem, Compute, ID)
==========================================================================================================================
0       2     0x74a1, 32700   39.0°C      134.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
1       3     0x74a1, 3884    41.0°C      139.0W    NPS1, SPX, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
2       4     0x74a1, 29122   37.0°C      130.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
3       5     0x74a1, 35464   43.0°C      139.0W    NPS1, SPX, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
4       6     0x74a1, 46166   37.0°C      133.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  10%    0%
5       7     0x74a1, 64654   41.0°C      141.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
6       8     0x74a1, 4769    42.0°C      136.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
7       9     0x74a1, 6315    35.0°C      129.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
Unfortunately nvtop doesn't support the ROCm SMI interface yet.
For now, only AMD GPUs that are exposed through the DRM kernel interface will be displayed by nvtop.
It seems that the MI300 is mostly geared towards "compute" workloads such as ML and probably has no use as a graphical GPU.
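Since nvtop only sees GPUs through the DRM kernel interface, one quick thing to check is whether the DRM device nodes exist and whether your user can open them. The sketch below assumes the nodes live in the usual `/dev/dri` directory as `card*` and `renderD*` entries; `check_drm_nodes` is a hypothetical helper name, not part of nvtop or ROCm. A node reported as `ok` only means it is readable and writable by you, not that nvtop will necessarily list the GPU.

```shell
# Hypothetical helper: list DRM device nodes and report whether the
# current user has read and write access to each of them.
# The directory defaults to /dev/dri (an assumption about the usual layout);
# it is a parameter so the function can be pointed at another directory.
check_drm_nodes() {
    dri_dir="${1:-/dev/dri}"
    found=0
    for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$node" ] || continue
        found=1
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok      $node"
        else
            echo "denied  $node"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no DRM nodes under $dri_dir"
        return 1
    fi
}
```

Running `check_drm_nodes` with no argument inspects `/dev/dri`. If it prints `denied` for the render nodes, the problem is likely permissions; if it finds no nodes at all, the amdgpu kernel driver is probably not exposing the cards through DRM.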
Although it does support the MI250X, which is the same class as the MI300X (but an older generation). Wouldn't adding the PCI IDs to the amdgpu_ids.h file get it recognized? When the amdgpu driver is installed, it also allows the DRM kernel interface to interrogate the MI300X. I will see if I can test this.
Looks like this was an environment/setup problem rather than an nvtop issue. I added my user to the video and host-render groups, and it works now.
It didn't really work for me.
apt install nvtop # success
sudo usermod -aG video $USER
sudo usermod -aG host-render $USER
nvtop # still hits the no gpu error
And rocm-smi is working correctly.
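The thread doesn't establish why the group change failed to help here, but one thing worth ruling out is that groups added with `usermod -aG` only apply to sessions started after the change. A rough sketch for comparing the two views: `id -nG` with no argument reports the groups of the current session, while `id -nG <user>` reads the account database. `in_list` and `report_group` are hypothetical helper names, and the group names are simply the ones used above.

```shell
# Hypothetical helper: succeed if the first argument equals any later argument.
in_list() {
    want="$1"; shift
    for g in "$@"; do
        [ "$g" = "$want" ] && return 0
    done
    return 1
}

# Hypothetical helper: say whether a group is active in this session,
# only configured in the account database, or not set for the user at all.
report_group() {
    group="$1"
    user="$(id -un)"
    # Word splitting of the unquoted command substitutions is intentional:
    # each group name becomes a separate argument to in_list.
    if in_list "$group" $(id -nG); then
        echo "$group: active in this session"
    elif in_list "$group" $(id -nG "$user"); then
        echo "$group: configured, but not active until you log in again"
    else
        echo "$group: not configured for $user"
    fi
}
```

For example, `report_group video` and `report_group host-render` each print one of the three lines. If a group shows as configured but not active, logging out and back in (or starting a fresh login shell) should pick it up. If it shows as not configured, the `usermod` call may have failed, for instance because the group does not exist on that system.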
I started working on rocm-smi library support some time ago; it may make it into the next release!