RTX 5090 not detected

Open grypp opened this issue 8 months ago • 7 comments

NVIDIA Open GPU Kernel Modules Version

575.51.03

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 24.04.2 LTS

Kernel Release

Linux 9989af7-lcedt 6.11.0-26-generic #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 17 19:20:47 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

GeForce RTX 5090 gb202

Describe the bug

$ nvidia-smi 
No devices were found

To Reproduce

Installed the CUDA drivers; nvidia-smi doesn't detect my GPU.

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

No response

grypp avatar Jun 02 '25 13:06 grypp

Hi, running ubuntu-drivers devices should list a driver whose name ends with the suffix -open. Installing that driver should solve your problem.

Reneanscen avatar Jun 03 '25 09:06 Reneanscen

@Reneanscen, thanks a lot for your reply. I’m not sure I fully understood your suggestion. Can you please describe it more clearly?

My desktop boots, but it doesn’t display anything on the screen (black screen). I can log in via SSH, but when I run nvidia-smi, it doesn’t detect any GPU. However, lspci shows a VGA device with an NVIDIA GPU.

Installing the proprietary drivers doesn't help either.

grypp avatar Jun 03 '25 13:06 grypp

Not sure if it's the same issue as mine, but the latest open driver does not work with the RTX 5070 Ti notebook card either. If I downgrade the driver to 570.153.02, the card starts working and nvidia-smi sees it.

nekromantik avatar Jun 07 '25 18:06 nekromantik

Hi @grypp,

I am seeing the following in the logs you shared.

2025-06-02T12:32:53.230422+00:00 9989af7-lcedt kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  575.51.03  Release Build  (dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 14:03:30 UTC 2025
2025-06-02T12:32:53.242418+00:00 9989af7-lcedt kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  575.51.03  Release Build  (dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 13:47:56 UTC 2025
2025-06-02T12:32:53.247421+00:00 9989af7-lcedt kernel: [drm] [nvidia-drm] [GPU ID 0x00004100] Loading driver
2025-06-02T12:32:53.247429+00:00 9989af7-lcedt kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:41:00.0 on minor 1
2025-06-02T12:34:27.829438+00:00 9989af7-lcedt kernel: NVRM: knvlinkCoreShutdownDeviceLinks_IMPL: Need to shutdown all links unilaterally for GPU0
2025-06-02T12:40:11.711687+00:00 9989af7-lcedt kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
2025-06-02T12:40:11.711687+00:00 9989af7-lcedt kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2025-06-02T12:40:11.711688+00:00 9989af7-lcedt kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:41:00.0)
2025-06-02T12:40:11.711688+00:00 9989af7-lcedt kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2025-06-02T12:40:11.711688+00:00 9989af7-lcedt kernel: NVRM: BAR2 is 0M @ 0x0 (PCI:0000:41:00.0)
2025-06-02T12:40:11.711689+00:00 9989af7-lcedt kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2025-06-02T12:40:11.711689+00:00 9989af7-lcedt kernel: NVRM: BAR3 is 0M @ 0x0 (PCI:0000:41:00.0)
2025-06-02T12:40:11.711689+00:00 9989af7-lcedt kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2025-06-02T12:40:11.711689+00:00 9989af7-lcedt kernel: NVRM: BAR4 is 0M @ 0x0 (PCI:0000:41:00.0)
2025-06-02T12:40:11.711692+00:00 9989af7-lcedt kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  575.51.03  Release Build  (dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 14:03:30 UTC 2025
2025-06-02T12:40:11.711711+00:00 9989af7-lcedt kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  575.51.03  Release Build  (dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 13:47:56 UTC 2025
2025-06-02T12:40:11.711711+00:00 9989af7-lcedt kernel: [drm] [nvidia-drm] [GPU ID 0x00004100] Loading driver
2025-06-02T12:40:11.711711+00:00 9989af7-lcedt kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:41:00.0 on minor 0
2025-06-02T12:40:21.704349+00:00 9989af7-lcedt kernel: NVRM: kbusVerifyBar2_GB202: MMUTest BAR0 window offset 0x70e000 returned garbage 0x0
2025-06-02T12:40:21.704355+00:00 9989af7-lcedt kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusVerifyBar2_HAL(pGpu, pKernelBus, NULL, NULL, 0, 0) @ kern_bus_gm107.c:352
2025-06-02T12:40:21.704356+00:00 9989af7-lcedt kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusStateInitLockedKernel_HAL(pGpu, pKernelBus) @ kern_bus_gm107.c:457
2025-06-02T12:40:21.704357+00:00 9989af7-lcedt kernel: NVRM: RmInitNvDevice: *** Cannot initialize the device
2025-06-02T12:40:21.704357+00:00 9989af7-lcedt kernel: NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
2025-06-02T12:40:21.704358+00:00 9989af7-lcedt kernel: NVRM: rmapiReportLeakedDevices: Device object leak: (0xc1e00004, 0xcaf00000). Please file a bug against RM-core.
2025-06-02T12:40:21.704359+00:00 9989af7-lcedt kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ rmapi.c:935
2025-06-02T12:40:21.706340+00:00 9989af7-lcedt kernel: NVRM: rmapiReportLeakedDevices: Device object leak: (0xc1e00005, 0xcaf00000). Please file a bug against RM-core.
2025-06-02T12:40:21.706342+00:00 9989af7-lcedt kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ rmapi.c:935

Can you confirm if the "Above 4G decoding" UEFI setting on your motherboard related to BAR is enabled?
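
For anyone who wants to check their own kernel log for the same symptom, here is a minimal sketch that scans log text for BARs reported as 0M at address 0x0. The regular expression is an assumption based only on the NVRM lines quoted above, and the function name find_unassigned_bars is made up for illustration; other driver versions may format these lines differently.

```python
import re

# Assumed line shape, taken from the log excerpt in this thread:
#   "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:41:00.0)"
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<index>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_unassigned_bars(log_text):
    """Return (pci_address, bar_index) pairs for BARs logged with size 0 at address 0."""
    unassigned = []
    for line in log_text.splitlines():
        match = BAR_LINE.search(line)
        if match is None:
            continue  # not a BAR report line
        if int(match["size"]) == 0 and int(match["addr"], 16) == 0:
            unassigned.append((match["pci"], int(match["index"])))
    return unassigned


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = (
        "2025-06-02T12:40:11.711688+00:00 9989af7-lcedt kernel: "
        "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:41:00.0)\n"
        "2025-06-02T12:40:11.711688+00:00 9989af7-lcedt kernel: "
        "NVRM: BAR2 is 0M @ 0x0 (PCI:0000:41:00.0)\n"
    )
    for pci, index in find_unassigned_bars(sample):
        print(f"{pci}: BAR{index} has no address range assigned")
```

If this reports any entries, the firmware did not assign an address range to those BARs, which is why the "Above 4G decoding" setting is worth checking.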

Binary-Eater avatar Jun 07 '25 20:06 Binary-Eater

I tested with Resizable BAR (ReBAR) both enabled and disabled, on Ubuntu 24.04.2 and 25.04. In every case the 5090 works ONLY if you install the open source version of either the 570 driver or the recommended 575 driver.

The 5090 will only work with the open source driver.

luisalvarado avatar Jun 16 '25 03:06 luisalvarado

@grypp In the Ubuntu terminal, enter the command ubuntu-drivers devices. You will see output similar to the following:

== /sys/devices/pci0000:00/0000:00:06.0/0000:02:00.0 ==
modalias : pci:v000010DEd00002C58sv0000103Csd00008D41bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-570-server-open - distro non-free
driver   : nvidia-driver-570 - distro non-free recommended
driver   : nvidia-driver-570-open - distro non-free
driver   : nvidia-driver-570-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

You need to install a driver like nvidia-driver-570-open.
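
As a rough sketch of picking the right entries out of that listing, the snippet below filters ubuntu-drivers devices output for packages whose name ends in -open. It assumes the "driver : package - description" line layout shown in this comment; the function name open_driver_packages is hypothetical, and the sample text is copied from the output above rather than captured from a live system.

```python
def open_driver_packages(ubuntu_drivers_output):
    """Return package names from 'driver :' lines that end with the -open suffix."""
    packages = []
    for line in ubuntu_drivers_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "driver":
            continue  # skip modalias, vendor, and header lines
        # The package name is everything before the first " - " separator.
        package = value.split(" - ")[0].strip()
        if package.endswith("-open"):
            packages.append(package)
    return packages


if __name__ == "__main__":
    sample = """\
== /sys/devices/pci0000:00/0000:00:06.0/0000:02:00.0 ==
modalias : pci:v000010DEd00002C58sv0000103Csd00008D41bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-570-server-open - distro non-free
driver   : nvidia-driver-570 - distro non-free recommended
driver   : nvidia-driver-570-open - distro non-free
driver   : nvidia-driver-570-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""
    for name in open_driver_packages(sample):
        print(name)
```

To run it against a real system, the output of the command could be captured with subprocess.run(["ubuntu-drivers", "devices"], capture_output=True, text=True) and passed in as the argument.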

Reneanscen avatar Jun 26 '25 01:06 Reneanscen

Just to provide a heads up to folks commenting here: we know from the kernel log messages that @grypp is already using the open source drivers.

2025-06-02T12:32:53.230422+00:00 9989af7-lcedt kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  575.51.03  Release Build  (dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 14:03:30 UTC 2025

The proprietary drivers will not say NVIDIA UNIX Open Kernel Module in the kernel logs.
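
A minimal sketch of that check, assuming only what is stated here: the open driver's load line contains "NVIDIA UNIX Open Kernel Module" and the proprietary driver's does not. The function name is_open_kernel_module is made up for illustration, and the "NVRM: loading NVIDIA UNIX" prefix used to find load lines is taken from the log line quoted above.

```python
def is_open_kernel_module(log_text):
    """Classify the loaded NVIDIA kernel module from kernel log text.

    Returns True if a load line names the open kernel module, False if load
    lines are present but none name it, and None if no load line is found.
    """
    load_lines = [
        line for line in log_text.splitlines()
        if "NVRM: loading NVIDIA UNIX" in line
    ]
    if not load_lines:
        return None
    return any("NVIDIA UNIX Open Kernel Module" in line for line in load_lines)


if __name__ == "__main__":
    # Load line copied from the log quoted in this thread.
    sample = (
        "2025-06-02T12:32:53.230422+00:00 9989af7-lcedt kernel: NVRM: loading "
        "NVIDIA UNIX Open Kernel Module for x86_64  575.51.03  Release Build  "
        "(dvs-builder@U22-I3-G01-1-4)  Wed Apr 16 14:03:30 UTC 2025"
    )
    print(is_open_kernel_module(sample))
```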

Binary-Eater avatar Nov 01 '25 07:11 Binary-Eater