
Support for Intel Arc Graphics 130/140V

Open adrianboguszewski opened this issue 1 year ago • 26 comments

When I use nvtop on my new Intel Core Ultra 258V, I see "No GPU to monitor". Hence, I'd kindly ask for support to be implemented :)

Ubuntu 24.10 with kernel 6.11.0

adrianboguszewski avatar Dec 17 '24 15:12 adrianboguszewski

Hi, I was just wondering, are you compiling from the latest commit? It sounds like it's using the xe driver; support for that isn't in a release yet, but it is already implemented.

Steve-Tech avatar Dec 22 '24 03:12 Steve-Tech

Hi, I was just wondering, are you compiling from the latest commit? It sounds like it's using the xe driver; support for that isn't in a release yet, but it is already implemented.

I just compiled it from the master branch, and it works on my 258V laptop.

But MEM, TEMP and POW show N/A.

g0ne150 avatar Dec 23 '24 08:12 g0ne150

Hi @g0ne150, thanks! I'm glad it works!

But MEM, TEMP and POW show N/A.

The memory frequency and temperature aren't exposed by the driver, and I don't believe the power usage is visible on integrated graphics (it works on discrete cards though).

If I am wrong please let me know though; especially since @adrianboguszewski looks to work at Intel and might know something I don't 😉.

Steve-Tech avatar Dec 23 '24 12:12 Steve-Tech

@BartoszDunajski could you comment on the memory, temperature and power values? :)

adrianboguszewski avatar Dec 28 '24 17:12 adrianboguszewski

Hi, I see something similar, but only for memory, with the i915 driver in use.

If I run nvtop as root it all works, but as my own user I don't see memory utilization.

Image

Oh and the Fan rpm speed is all funky....

malcolmlewis avatar Feb 09 '25 22:02 malcolmlewis

Hi @malcolmlewis,

Intel requires CAP_PERFMON or CAP_SYS_ADMIN capabilities to access the total memory usage which is why it only shows when run as root, but you can run sudo setcap cap_perfmon=ep <path to nvtop> to grant the permissions for a non-root user.
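For anyone who wants to check which case they are in, here is a minimal sketch of testing the effective capability mask that the kernel reports on the `CapEff:` line of `/proc/<pid>/status`. The helper names are hypothetical and this is not how nvtop itself does the check; the bit numbers are the ones defined in `<linux/capability.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_SYS_ADMIN_BIT 21
#define CAP_PERFMON_BIT 38

/* Hypothetical helper: extract the CapEff mask from the text of
 * /proc/<pid>/status. Returns false if the line is missing or malformed. */
static bool parse_cap_eff(const char *status_text, uint64_t *mask_out) {
  assert(status_text != NULL && mask_out != NULL);
  const char *line = strstr(status_text, "CapEff:");
  if (line == NULL)
    return false;
  unsigned long long mask = 0;
  /* The space in the format also matches the tab the kernel prints. */
  if (sscanf(line, "CapEff: %llx", &mask) != 1)
    return false;
  *mask_out = mask;
  return true;
}

/* Hypothetical helper: true if either capability mentioned above is held. */
static bool can_read_intel_memory_usage(uint64_t cap_eff) {
  return (cap_eff & (UINT64_C(1) << CAP_PERFMON_BIT)) != 0 ||
         (cap_eff & (UINT64_C(1) << CAP_SYS_ADMIN_BIT)) != 0;
}
```

Feeding it the contents of `/proc/self/status` from inside the process, or `/proc/<pid>/status` of a running nvtop, should show whether the setcap call took effect.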

As for the fan speed, what does sensors or the relevant sysfs file say?

Steve-Tech avatar Feb 09 '25 23:02 Steve-Tech

@Steve-Tech I did wonder about that, setcap cap_perfmon=ep /usr/bin/nvtop resolved it 😉

I had to tweak the configs as best I could for the sensors output, based on a visual check of the fans. I need a tachometer to really check!

compute fan1  @/16,  16*@
label fan1 "ARC GPU Fan"

malcolmlewis avatar Feb 10 '25 00:02 malcolmlewis

@malcolmlewis Damn that's interesting, my A770's values seem realistic, but I also don't have a tachometer to check.

Steve-Tech avatar Feb 10 '25 00:02 Steve-Tech

@Steve-Tech I also have a Sparkle A310 ELF (the A380 is an ASRock); I'm a bit suspicious of that one too... but it does track with the sensors output.

malcolmlewis avatar Feb 10 '25 00:02 malcolmlewis

@Steve-Tech I currently have an Intel Arc B570 GPU and wanted to monitor its usage in Ubuntu 24.10. However, nvtop gives me the result "No GPU to monitor." I was wondering if the latest commit supports this Arc B570 GPU. Is there something I can look into to fix this?

I know this specific GPU uses the xe driver, for which, as you mentioned, support has been implemented in the repo. I know this is a fairly new GPU. Appreciate any help!

AbelKel avatar Jun 17 '25 09:06 AbelKel

Hey @AbelKel,

Ubuntu 24.10 is still packaging nvtop 3.1.0, while 3.2.0 is required for xe support.

Try using the AppImage on the releases page, or compiling it yourself. You should also be able to download the deb packaged for questing, but mixing packages from other releases can cause issues.

If you are already running 3.2.0, could you post (or email, like before) the output of sudo lspci -vvv?

Edit: Sorry I didn't see that you are compiling from the latest commit. Can you double check that it actually installed properly with nvtop -v, and perhaps post some info from grep ^ /sys/class/drm/card?/device/*, in addition to lspci.
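One of the files that grep prints is the device's uevent file, which normally carries a DRIVER= line naming the bound kernel driver (xe or i915 here). As a rough sketch of pulling that value out of the file's text (the function name is hypothetical and this is not nvtop's own detection code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the DRIVER= line from the text of
 * /sys/class/drm/cardN/device/uevent into `out`. Returns false if there is
 * no such line or the value does not fit in `out`. */
static bool uevent_driver(const char *uevent_text, char *out, size_t out_size) {
  assert(uevent_text != NULL && out != NULL && out_size > 0);
  static const char key[] = "DRIVER=";
  const char *line = uevent_text;
  while (line != NULL && *line != '\0') {
    if (strncmp(line, key, sizeof(key) - 1) == 0) {
      const char *value = line + sizeof(key) - 1;
      size_t len = strcspn(value, "\n"); /* value ends at end of line */
      if (len >= out_size)
        return false;
      memcpy(out, value, len);
      out[len] = '\0';
      return true;
    }
    /* Advance to the start of the next line, if any. */
    line = strchr(line, '\n');
    if (line != NULL)
      line++;
  }
  return false;
}
```

If this reports i915 for a card you expected to be on xe, or nothing at all, that already narrows down why nvtop might not see the GPU.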

Hope this helps, Steve

Steve-Tech avatar Jun 17 '25 09:06 Steve-Tech

@Steve-Tech I can detect all my cards with nvtop 3.2.0, but usage is always 0%

Image

adrianboguszewski avatar Jun 17 '25 11:06 adrianboguszewski

@adrianboguszewski Hmm interesting, what kernel are you running and does it still happen with sudo?

Steve-Tech avatar Jun 17 '25 11:06 Steve-Tech

@Steve-Tech thanks for the response. I did a clean install of the system because I felt that I had messed up somewhere when installing drivers. After compiling the latest commit, it now seems to be measuring GPU usage and memory consumption.

Image

I was trying to get the power consumption and fan speed working but ran into a block. It seems like the sensors are not picking up anything from what I can see after running the sensors command:

Image

Is there any way I can get the power and fan speed working? I assumed temperature telemetry is not functional at the moment, so I did not bother investigating that.

Also, I am on Ubuntu 24.10 and kernel 6.11.0-26-generic

Thanks again for the help!

AbelKel avatar Jun 17 '25 19:06 AbelKel

@AbelKel for the sensors, I know exactly what's happening. nvtop is reading card, when it should be reading pkg. You could flip the order around here, or check that the value it read is non-zero: https://github.com/Syllo/nvtop/blob/master/src%2Fextract_gpuinfo_intel.c#L288-L297 I can also fix it this afternoon.

For the speed, could you post the output of tree /sys/class/drm/card?/ for me?

Steve-Tech avatar Jun 17 '25 21:06 Steve-Tech

Hey @Steve-Tech, I applied your recommended changes as follows:

    const char *hwmon_power_max;
    bool power_data_present = false;
    unsigned val = 0;

    // power1 is for i915 and `card` on supported cards on xe, power2 is `pkg` on xe
    if (nvtop_device_get_sysattr_value(hwmon_dev_noncached, "power1_max", &hwmon_power_max) >= 0) {
      val = strtoul(hwmon_power_max, NULL, 10);
      power_data_present = (val > 0);
    }
    
    if (!power_data_present && nvtop_device_get_sysattr_value(hwmon_dev_noncached, "power2_max", &hwmon_power_max) >= 0) {
      val = strtoul(hwmon_power_max, NULL, 10);
      power_data_present = (val > 0);
    }
    
    SET_GPUINFO_DYNAMIC(dynamic_info, power_draw_max, val / 1000);

    const char *hwmon_energy;
    bool energy_data_present = false;
  
    // energy1 is for i915 and `card` on supported cards on xe, energy2 is `pkg` on xe
    if (nvtop_device_get_sysattr_value(hwmon_dev_noncached, "energy1_input", &hwmon_energy) >= 0) {
      val = strtoul(hwmon_energy, NULL, 10);
      energy_data_present = (val > 0);
    }
    
    //checking value for energy 2 for xe
    if (!energy_data_present && nvtop_device_get_sysattr_value(hwmon_dev_noncached, "energy2_input", &hwmon_energy) >= 0) {
      val = strtoul(hwmon_energy, NULL, 10);
      energy_data_present = (val > 0);
    }

    if (energy_data_present) {
      nvtop_time ts;
      nvtop_get_current_time(&ts);
      unsigned old = gpu_info->energy.energy_uj;
      uint64_t time = nvtop_difftime_u64(gpu_info->energy.time, ts);
      // Skip the first update so we have a time delta
      if (gpu_info->energy.time.tv_sec != 0) {
        unsigned power = ((val - old) * 1000000000LL) / time;
        SET_GPUINFO_DYNAMIC(dynamic_info, power_draw, power / 1000);
      }
      gpu_info->energy.energy_uj = val;
      gpu_info->energy.time = ts;
    }

Now I am getting a power (POW) value in nvtop:

Image

However, I am a little confused, as I am seeing 0 W above. I assume the program is measuring the energy, displayed as POW, correctly. However, it seems like it is not measuring the watts? Feel free to correct me. Also, I can turn the changes into a PR if you would like.

For the fan speed, here is the output. It's quite lengthy.

computer:/$ tree /sys/class/drm/card1
/sys/class/drm/card1
├── card1-DP-1
│   ├── connector_id
│   ├── ddc -> i2c-11
│   ├── device -> ../../card1
│   ├── dpms
│   ├── drm_dp_aux0
│   │   ├── dev
│   │   ├── device -> ../../card1-DP-1
│   │   ├── name
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   ├── runtime_suspended_time
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../class/drm_dp_aux_dev
│   │   └── uevent
│   ├── edid
│   ├── enabled
│   ├── i2c-11
│   │   ├── delete_device
│   │   ├── device -> ../../card1-DP-1
│   │   ├── i2c-dev
│   │   │   └── i2c-11
│   │   │       ├── dev
│   │   │       ├── device -> ../../../i2c-11
│   │   │       ├── name
│   │   │       ├── power
│   │   │       │   ├── async
│   │   │       │   ├── autosuspend_delay_ms
│   │   │       │   ├── control
│   │   │       │   ├── runtime_active_kids
│   │   │       │   ├── runtime_active_time
│   │   │       │   ├── runtime_enabled
│   │   │       │   ├── runtime_status
│   │   │       │   ├── runtime_suspended_time
│   │   │       │   └── runtime_usage
│   │   │       ├── subsystem -> ../../../../../../../../../../../../class/i2c-dev
│   │   │       └── uevent
│   │   ├── name
│   │   ├── new_device
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../bus/i2c
│   │   └── uevent
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-DP-2
│   ├── connector_id
│   ├── ddc -> i2c-12
│   ├── device -> ../../card1
│   ├── dpms
│   ├── drm_dp_aux1
│   │   ├── dev
│   │   ├── device -> ../../card1-DP-2
│   │   ├── name
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   ├── runtime_suspended_time
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../class/drm_dp_aux_dev
│   │   └── uevent
│   ├── edid
│   ├── enabled
│   ├── i2c-12
│   │   ├── delete_device
│   │   ├── device -> ../../card1-DP-2
│   │   ├── i2c-dev
│   │   │   └── i2c-12
│   │   │       ├── dev
│   │   │       ├── device -> ../../../i2c-12
│   │   │       ├── name
│   │   │       ├── power
│   │   │       │   ├── async
│   │   │       │   ├── autosuspend_delay_ms
│   │   │       │   ├── control
│   │   │       │   ├── runtime_active_kids
│   │   │       │   ├── runtime_active_time
│   │   │       │   ├── runtime_enabled
│   │   │       │   ├── runtime_status
│   │   │       │   ├── runtime_suspended_time
│   │   │       │   └── runtime_usage
│   │   │       ├── subsystem -> ../../../../../../../../../../../../class/i2c-dev
│   │   │       └── uevent
│   │   ├── name
│   │   ├── new_device
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../bus/i2c
│   │   └── uevent
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-DP-3
│   ├── connector_id
│   ├── ddc -> i2c-13
│   ├── device -> ../../card1
│   ├── dpms
│   ├── drm_dp_aux2
│   │   ├── dev
│   │   ├── device -> ../../card1-DP-3
│   │   ├── name
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   ├── runtime_suspended_time
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../class/drm_dp_aux_dev
│   │   └── uevent
│   ├── edid
│   ├── enabled
│   ├── i2c-13
│   │   ├── delete_device
│   │   ├── device -> ../../card1-DP-3
│   │   ├── i2c-dev
│   │   │   └── i2c-13
│   │   │       ├── dev
│   │   │       ├── device -> ../../../i2c-13
│   │   │       ├── name
│   │   │       ├── power
│   │   │       │   ├── async
│   │   │       │   ├── autosuspend_delay_ms
│   │   │       │   ├── control
│   │   │       │   ├── runtime_active_kids
│   │   │       │   ├── runtime_active_time
│   │   │       │   ├── runtime_enabled
│   │   │       │   ├── runtime_status
│   │   │       │   ├── runtime_suspended_time
│   │   │       │   └── runtime_usage
│   │   │       ├── subsystem -> ../../../../../../../../../../../../class/i2c-dev
│   │   │       └── uevent
│   │   ├── name
│   │   ├── new_device
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../../../../../../bus/i2c
│   │   └── uevent
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-HDMI-A-1
│   ├── connector_id
│   ├── ddc -> ../../../i2c-7
│   ├── device -> ../../card1
│   ├── dpms
│   ├── edid
│   ├── enabled
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-HDMI-A-2
│   ├── connector_id
│   ├── ddc -> ../../../i2c-8
│   ├── device -> ../../card1
│   ├── dpms
│   ├── edid
│   ├── enabled
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-HDMI-A-3
│   ├── connector_id
│   ├── ddc -> ../../../i2c-9
│   ├── device -> ../../card1
│   ├── dpms
│   ├── edid
│   ├── enabled
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── card1-HDMI-A-4
│   ├── connector_id
│   ├── ddc -> ../../../i2c-10
│   ├── device -> ../../card1
│   ├── dpms
│   ├── edid
│   ├── enabled
│   ├── modes
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── status
│   ├── subsystem -> ../../../../../../../../../class/drm
│   └── uevent
├── dev
├── device -> ../../../0000:03:00.0
├── metrics
├── power
│   ├── async
│   ├── autosuspend_delay_ms
│   ├── control
│   ├── runtime_active_kids
│   ├── runtime_active_time
│   ├── runtime_enabled
│   ├── runtime_status
│   ├── runtime_suspended_time
│   └── runtime_usage
├── subsystem -> ../../../../../../../../class/drm
└── uevent

79 directories, 222 files

Thanks again for being a huge help!

AbelKel avatar Jun 17 '25 23:06 AbelKel

Hey @AbelKel,

POW is just short for power, the 16 is the current wattage and seems correct, but it's not reading the max wattage (the 0 W). I believe power2_crit should have the max wattage, so maybe add a check for that too.
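Roughly what I mean, as a standalone sketch rather than the actual nvtop change: try the candidates in order (say power1_max, power2_max, then power2_crit) and take the first one that exists and is non-zero. The struct and function names below are made up for illustration, and the values are assumed to be in microwatts like the other hwmon power attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one hwmon reading. `present` is false when the
 * attribute is missing or could not be read. */
struct hwmon_reading {
  bool present;
  uint64_t microwatts;
};

/* Hypothetical helper: return the first present, non-zero limit in
 * milliwatts. Returns false if none of the candidates is usable, so the
 * caller can leave the max wattage unset instead of reporting 0 W. */
static bool first_usable_limit_mw(const struct hwmon_reading *candidates,
                                  size_t count, uint64_t *limit_mw) {
  assert(candidates != NULL && limit_mw != NULL);
  for (size_t i = 0; i < count; i++) {
    if (candidates[i].present && candidates[i].microwatts > 0) {
      *limit_mw = candidates[i].microwatts / 1000; /* uW -> mW */
      return true;
    }
  }
  return false;
}
```

The false return is the part that matters for the display: with no usable limit the field can stay unset rather than showing a 0 W maximum.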

The fan speed (and temperature) are likely missing just because support was only very recently merged into the kernel and hasn't been backported yet. If you do get a chance to run a newer kernel, could you run sensors for me? The driver supports up to 3 fans and I'm curious whether it actually works on consumer cards.

I think I was thinking of the GPU speed for the tree command, but it's working so it doesn't matter haha.

Steve-Tech avatar Jun 18 '25 02:06 Steve-Tech

@Steve-Tech Hi, I'm on the 6.15.2 kernel here on Tumbleweed. I've set my Intel Arc A310 to the xe driver and sensors doesn't report any fan speed (it does on i915). Power is working: I see it jump from 16 W to 30+ W when running a ray tracing benchmark. Temperature works too.

malcolmlewis avatar Jun 18 '25 02:06 malcolmlewis

@malcolmlewis @AbelKel

Sorry, it seems the fan speeds only got merged in the 6.16 rcs. I thought they were in 6.15, but I think I confused myself by having the patches applied manually on my own kernel.

Steve-Tech avatar Jun 18 '25 03:06 Steve-Tech

@Steve-Tech This is the sensor output in kernel 6.14.0:

xe-pci-0300
Adapter: PCI adapter
card:             N/A  (max =   0.00 W)
pkg:              N/A  (max =   0.00 W, crit = 320.00 W)
card:          0.00 J
pkg:           3.02 kJ

gigabyte_wmi-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  
temp2:        +47.0°C  
temp3:        +50.0°C  
temp4:        +37.0°C  
temp5:        +40.0°C  
temp6:        +42.0°C  

nvme-pci-1200
Adapter: PCI adapter
Composite:    +48.9°C  (low  =  -0.1°C, high = +82.8°C)
                       (crit = +83.8°C)
Sensor 1:     +41.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +44.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +16.8°C  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +38.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +50.1°C  

r8169_0_a00:00-mdio-0
Adapter: MDIO adapter
temp1:        +39.0°C  (high = +120.0°C)

I think the telemetry on nvtop is enough for me at the moment.

I don't know if this is related, but I was wondering if @Steve-Tech or @malcolmlewis can help me with the following issue. After I put my computer to sleep in Ubuntu, it immediately wakes up and goes to the lock screen. I implemented this solution https://askubuntu.com/questions/1395148/pc-wakes-up-immediately-after-suspend which made my monitor go to sleep, but I can still hear the fans on my computer running after it goes into 'suspend'. Have any of you experienced this before?

I don't know if it is related to the GPU or anything else. I have done hours of debugging and found no solution. Please let me know if any of you have any ideas. Thanks!

AbelKel avatar Jun 18 '25 05:06 AbelKel

@AbelKel Would you be able to test my xe2 branch? I've fixed the power usage and max wattage values.

Steve-Tech avatar Jun 18 '25 10:06 Steve-Tech

@malcolmlewis @AbelKel

Sorry, it seems the fan speeds only got merged in the 6.16 rcs. I thought they were in 6.15, but I think I confused myself by having the patches applied manually on my own kernel.

@Steve-Tech I switched over to kernel 6.16.0-rc2 and can confirm fan speeds are visible, so looking good 😄

xe-pci-6700
Adapter: PCI adapter
pkg:         725.00 mV 
fan1:        3011 RPM
fan2:           0 RPM
pkg:          +37.0°C  
vram:         +36.0°C  
ERROR: Can't get value of subfeature power1_crit: Can't read
card:             N/A  (crit =   0.00 W)
power2:           N/A  (max =  31.25 W, rated max =   0.00 W)
pkg:         660.29 J

Image

Image

malcolmlewis avatar Jun 18 '25 13:06 malcolmlewis

@AbelKel Would you be able to test my xe2 branch? I've fixed the power usage and max wattage values.

I had to go back to square one with my Linux distro because I was having a bunch of driver issues. I am trying out different distros and will let you know if I encounter anything with nvtop. Thanks again for all the help!

AbelKel avatar Jun 19 '25 10:06 AbelKel

@adrianboguszewski Hmm interesting, what kernel are you running and does it still happen with sudo?

Yep, I was running without sudo. Working now :)

adrianboguszewski avatar Jun 23 '25 11:06 adrianboguszewski

@AbelKel Would you be able to test my xe2 branch? I've fixed the power usage and max wattage values.

Hey @Steve-Tech, sorry for the late reply. This branch resolves the power usage and max wattage values. Thanks for all the help!

Image

AbelKel avatar Jun 26 '25 08:06 AbelKel

@AbelKel thanks! I'll create a PR soon.

Steve-Tech avatar Jun 26 '25 09:06 Steve-Tech