compute-runtime

GPU does not show up as an OpenCL device when logged in over SSH, unless you log in locally

Open ProjectPhysX opened this issue 1 year ago • 13 comments

On a fresh Ubuntu Server 23.04 installation (kernel 6.5), after installing NEO and rebooting, the GPU (Arc A770) does not show up as an OpenCL device when I access the machine remotely over SSH. Only after I log in locally at the PC does the GPU show up as an OpenCL device, and then it appears immediately both locally and in the remote terminal.

ProjectPhysX avatar Jan 29 '24 15:01 ProjectPhysX

Hi @ProjectPhysX, could you run the command `strace -o strace.log clinfo` and share the resulting strace.log file?

JablonskiMateusz avatar Jan 29 '24 18:01 JablonskiMateusz

Hi @JablonskiMateusz,

here is strace-before-local-login.log, and visible devices are:

| Device ID    0 | NVIDIA TITAN Xp                                            |
| Device ID    1 | 13th Gen Intel(R) Core(TM) i7-13700K                       |
| Device ID    2 | Intel(R) FPGA Emulation Device                             |

After logging in locally on the PC, here is strace-after-local-login.log, and visible devices are:

| Device ID    0 | Intel(R) Arc(TM) A770 Graphics                             |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
| Device ID    2 | NVIDIA TITAN Xp                                            |
| Device ID    3 | 13th Gen Intel(R) Core(TM) i7-13700K                       |
| Device ID    4 | Intel(R) FPGA Emulation Device                             |

Kind regards, Moritz

ProjectPhysX avatar Jan 30 '24 15:01 ProjectPhysX

@ProjectPhysX from the logs, it looks like in the first case you don't have permission to open the GPU device file:

openat(AT_FDCWD, "/dev/dri/by-path/pci-0000:00:02.0-render", O_RDWR|O_CLOEXEC) = -1 EACCES (Permission denied)

Please ensure that the user you are using is a member of the render group.

JablonskiMateusz avatar Jan 30 '24 16:01 JablonskiMateusz
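The EACCES result above can be checked without strace by testing whether the current user can open each render node for reading and writing. The following is a minimal sketch, not part of NEO or its documentation; the function name `check_render_access` is made up for illustration.

```shell
#!/bin/sh
# Report, for each given path, whether the current user can read and write it.
# check_render_access is a hypothetical helper name, not a NEO tool.
check_render_access() {
    status=0
    for node in "$@"; do
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "OK $node"
        else
            echo "DENIED $node"
            status=1
        fi
    done
    # Non-zero exit status if at least one node was not accessible
    return "$status"
}

# Example call on a real system:
#   check_render_access /dev/dri/renderD*
```

A `DENIED` line for a render node corresponds to the situation in the first strace log, where the OpenCL runtime cannot open the device and therefore does not list the GPU.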

Hi @JablonskiMateusz,

thanks a lot for the help! An additional `sudo usermod -a -G render $(whoami)` fixes the issue. Please make the installation fix the file permissions or automatically put the user in the render group, and/or include this line in the installation instructions.

Kind regards, Moritz

ProjectPhysX avatar Jan 30 '24 18:01 ProjectPhysX
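After running the `usermod` command above, group membership can be verified from the shell. The sketch below uses a made-up helper name, `user_in_group`, and relies on `id -nG USER`, which looks the user up in the group database. It assumes group names contain no whitespace.

```shell
#!/bin/sh
# Succeed if user $1 is listed as a member of group $2 in the group database.
# user_in_group is a hypothetical helper name used for illustration only.
user_in_group() {
    # id -nG prints the user's group names separated by spaces
    for g in $(id -nG "$1"); do
        if [ "$g" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# Example call on a real system:
#   user_in_group "$(whoami)" render && echo "in render group"
```

Note that a session that is already open keeps the group list it started with, so a new login (for example a fresh SSH connection) is normally needed before the added group takes effect. Running `id -nG` without a user name shows the groups of the current process instead of the database entry, which makes the difference visible.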

@JablonskiMateusz, out of curiosity why does logging in locally "fix" this issue?

bashbaug avatar Jan 31 '24 04:01 bashbaug

@ProjectPhysX

In our README we have the following line:

> To allow NEO access to GPU device make sure user has permissions to files /dev/dri/renderD*.

btw.

> out of curiosity why does logging in locally "fix" this issue?

@ProjectPhysX when you logged in locally, was it the same user as when you logged in over SSH?

JablonskiMateusz avatar Jan 31 '24 08:01 JablonskiMateusz

@JablonskiMateusz yes, same user. The local login alone triggers the GPU to become visible as an OpenCL device. Why can't the installation set the user access rights? If you miss this detail, the devices simply don't show up and there is no error message; that's not user-friendly.

ProjectPhysX avatar Jan 31 '24 08:01 ProjectPhysX

> thanks a lot for the help! An additional `sudo usermod -a -G render $(whoami)` fixes the issue. Please make the installation fix the file permissions or automatically put the user in the render group,

It's (definitely) not the driver (package)'s responsibility to do things like that.

> and/or include this line in the installation instructions.

Yes, that's a good idea. In which documents do you think this should be mentioned?

> @JablonskiMateusz yes, same user. The local login alone triggers the GPU to become visible as OpenCL device.

As to what happens when you do a graphical login locally: your GUI session manager grants the authenticated user (temporary) access to the display device. Otherwise the user's GUI would not work that well (it would fall back to CPU rendering, or even fail).

eero-t avatar Feb 16 '24 11:02 eero-t
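One way to look for such a temporary grant is to check whether the device node carries extra access entries. On systemd-based systems this kind of per-session access is commonly implemented as a POSIX ACL on the device node; that is an assumption here, since the thread does not confirm the mechanism. GNU `ls -l` marks files that have such entries with a trailing `+` in the mode string. The helper names below are hypothetical.

```shell
#!/bin/sh
# Succeed if a mode string as printed by GNU ls (e.g. "crw-rw----+")
# ends in '+', which marks additional access entries such as ACLs.
# mode_has_acl and node_has_acl are hypothetical helper names.
mode_has_acl() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# Succeed if the given path shows the '+' marker in its ls mode string.
node_has_acl() {
    mode_has_acl "$(ls -ld -- "$1" | cut -d' ' -f1)"
}

# Example call on a real system, before and after a local login:
#   node_has_acl /dev/dri/renderD128 && echo "extra access entries present"
```

If the marker appears only while a local session is active, that would be consistent with the explanation above. Where the `acl` package is installed, `getfacl` on the same path lists the individual entries.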

> Yes, that's a good idea. In which documents do you think this should be mentioned?

Here in the Readme and in the "Installation procedure" in release notes would be good. Thanks!

ProjectPhysX avatar Feb 16 '24 13:02 ProjectPhysX

> An additional `sudo usermod -a -G render $(whoami)` fixes the issue.

Older distro versions (e.g. of Ubuntu) do not have a render group, so it's better to use the Intel device's group ID directly.

If the host also has non-Intel DRM devices (with different group IDs), the Intel GPU device file names can be obtained with: `grep -l 0x8086 /sys/class/drm/renderD*/device/vendor | cut -d/ -f 5`

And the group ID of the first one with: `stat --format %g /dev/dri/$(grep -l 0x8086 /sys/class/drm/renderD*/device/vendor | cut -d/ -f 5 | head -1)`

> > Yes, that's a good idea. In which documents do you think this should be mentioned?
>
> Here in the Readme and in the "Installation procedure" in release notes would be good. Thanks!

Thanks! @JablonskiMateusz ?

eero-t avatar Feb 16 '24 13:02 eero-t
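The two one-liners in the comment above can be wrapped into small functions. This is a sketch under stated assumptions: the function names and the optional directory arguments are hypothetical additions, included only so the logic can be tried against a fake directory tree. With no arguments, the functions read the same `/sys/class/drm` and `/dev/dri` paths as the original commands.

```shell
#!/bin/sh
# Print the names of render nodes whose PCI vendor ID is Intel (0x8086).
# $1: optional DRM sysfs root (hypothetical parameter), default /sys/class/drm
intel_render_nodes() {
    drm_root="${1:-/sys/class/drm}"
    for vendor_file in "$drm_root"/renderD*/device/vendor; do
        [ -r "$vendor_file" ] || continue
        if grep -q 0x8086 "$vendor_file"; then
            # .../renderD128/device/vendor -> renderD128
            basename "$(dirname "$(dirname "$vendor_file")")"
        fi
    done
}

# Print the numeric group ID that owns the first Intel render node.
# $1: optional DRM sysfs root, $2: optional device directory (default /dev/dri)
intel_render_gid() {
    dev_root="${2:-/dev/dri}"
    node=$(intel_render_nodes "${1:-}" | head -n 1)
    [ -n "$node" ] || return 1
    stat --format %g "$dev_root/$node"
}

# Example call on a real system:
#   gid=$(intel_render_gid) && getent group "$gid" | cut -d: -f1
```

The example call prints the name of the group that owns the device, whatever the distribution calls it, which can then be used in place of a hard-coded `render` when adding the user to the group.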