
allow a way to specify the kernel

vans163 opened this issue 2 years ago

NVIDIA Open GPU Kernel Modules Version

535.104.05

Operating System and Version

Ubuntu 23.04

Kernel Release

Linux 6.5.1 #1 SMP PREEMPT_DYNAMIC Wed Sep 6 18:49:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Build Command

make modules_install

Terminal output/Build Log

make[1]: Leaving directory '/root/source/open-gpu-kernel-modules/src/nvidia'
cd kernel-open/nvidia/ && ln -sf ../../src/nvidia/_out/Linux_x86_64/nv-kernel.o nv-kernel.o_binary
make -C kernel-open modules
make[1]: Entering directory '/root/source/open-gpu-kernel-modules/kernel-open'
make[2]: Entering directory '/root/source/open-gpu-kernel-modules/kernel-open'
make[2]: *** /lib/modules/6.2.0-26-generic/build: No such file or directory.  Stop.
make[2]: Leaving directory '/root/source/open-gpu-kernel-modules/kernel-open'
make[1]: *** [Makefile:82: modules] Error 2
make[1]: Leaving directory '/root/source/open-gpu-kernel-modules/kernel-open'
make: *** [Makefile:59: modules] Error 2

More Info

I am running the build inside a systemd container, where it fails with:

  make[2]: *** /lib/modules/6.2.0-26-generic/build: No such file or directory.  Stop.

6.2.0-26-generic is the output of uname -r. The NVIDIA driver installer correctly detects this and uses the kernel from inside the systemd container. The build here, however, gives no option to specify which kernel to build and install the modules for, and does not detect it automatically.
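Inside a systemd container, uname -r reports the host's kernel release (containers share the host kernel), while /lib/modules only contains the kernels installed in the container. The Python sketch below compares the directory the Makefile falls back to with the kernel build directories that actually exist. The helper names are hypothetical, and the code mirrors only the default-path part of kernel-open/Makefile (prefer <modlib>/source, else <modlib>/build).

```python
import os
import platform


def default_kernel_dir(modlib_root="/lib/modules", release=None):
    """Directory kernel-open/Makefile uses when SYSSRC is unset.

    Mirrors the Makefile default: <modlib>/source if it exists, else <modlib>/build.
    """
    release = release or platform.release()  # same string as `uname -r`
    modlib = os.path.join(modlib_root, release)
    source = os.path.join(modlib, "source")
    return source if os.path.isdir(source) else os.path.join(modlib, "build")


def installed_build_dirs(modlib_root="/lib/modules"):
    """Existing <modlib_root>/<version>/build directories, i.e. candidate SYSSRC values."""
    if not os.path.isdir(modlib_root):
        return []
    candidates = (os.path.join(modlib_root, v, "build") for v in os.listdir(modlib_root))
    # isdir() follows symlinks, so a dangling build -> /usr/src/... link is skipped
    return sorted(p for p in candidates if os.path.isdir(p))


if __name__ == "__main__":
    default = default_kernel_dir()
    if os.path.isdir(default):
        print(f"default kernel dir exists: {default}")
    else:
        print(f"default kernel dir missing: {default}")
        for path in installed_build_dirs():
            print(f"  candidate: SYSSRC={path}")
```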

vans163, Sep 06 '23 22:09

It should be better documented, but you can use the SYSSRC and SYSOUT variables to tell the build where to find the kernel sources and build output. See the logic in kernel-open/Makefile:

  ifdef SYSSRC
    KERNEL_SOURCES := $(SYSSRC)
  else
    KERNEL_UNAME ?= $(shell uname -r)
    KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
    KERNEL_SOURCES := $(shell test -d $(KERNEL_MODLIB)/source && echo $(KERNEL_MODLIB)/source || echo $(KERNEL_MODLIB)/build)
  endif

  KERNEL_OUTPUT := $(KERNEL_SOURCES)
  KBUILD_PARAMS :=

  ifdef SYSOUT
    ifneq ($(SYSOUT), $(KERNEL_SOURCES))
      KERNEL_OUTPUT := $(SYSOUT)
      KBUILD_PARAMS := KBUILD_OUTPUT=$(KERNEL_OUTPUT)
    endif
  else
    KERNEL_UNAME ?= $(shell uname -r)
    KERNEL_MODLIB := /lib/modules/$(KERNEL_UNAME)
    ifeq ($(KERNEL_SOURCES), $(KERNEL_MODLIB)/source)
      KERNEL_OUTPUT := $(KERNEL_MODLIB)/build
      KBUILD_PARAMS := KBUILD_OUTPUT=$(KERNEL_OUTPUT)
    endif
  endif
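To make the precedence in the snippet above explicit, here is a small Python model of it (a hypothetical helper, not part of the repo). It returns the KERNEL_SOURCES, KERNEL_OUTPUT and KBUILD_PARAMS values the Makefile would compute; the directory check is injectable so the logic can be exercised without a real /lib/modules. It models only this snippet and ignores the rest of kernel-open/Makefile.

```python
import os
import platform


def resolve_kernel_dirs(syssrc=None, sysout=None, kernel_uname=None, isdir=os.path.isdir):
    """Return (KERNEL_SOURCES, KERNEL_OUTPUT, KBUILD_PARAMS) as the snippet computes them."""
    # KERNEL_UNAME ?= $(shell uname -r): only used when the caller has not set it
    uname = kernel_uname or platform.release()
    modlib = f"/lib/modules/{uname}"

    if syssrc:
        sources = syssrc
    else:
        # Prefer <modlib>/source when it exists, otherwise <modlib>/build
        sources = f"{modlib}/source" if isdir(f"{modlib}/source") else f"{modlib}/build"

    output, params = sources, ""
    if sysout:
        # SYSOUT only matters when it differs from the source tree
        if sysout != sources:
            output, params = sysout, f"KBUILD_OUTPUT={sysout}"
    elif sources == f"{modlib}/source":
        # Split source/build layout: build artifacts live in <modlib>/build
        output = f"{modlib}/build"
        params = f"KBUILD_OUTPUT={output}"
    return sources, output, params
```

For the container case, passing SYSSRC explicitly (for example make modules_install SYSSRC=/lib/modules/<version>/build) bypasses uname -r for the source tree. GNU make forwards command-line variable assignments to sub-makes through MAKEFLAGS, so setting it on the top-level make should reach kernel-open/Makefile.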

Does that help at all?

aritger, Sep 06 '23 22:09

I opened a separate, related issue: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/791

josephtingiris, Feb 21 '25 15:02

Thanks for the post, @josephtingiris. That definitely looks interesting, but integration with specific distribution package managers (like rpm) and integration with dkms are beyond the scope of the open-gpu-kernel-modules repo, which is distribution-agnostic. I wonder if something like the script above would better belong as part of the Fedora+nvidia dkms integration?

aritger, Feb 21 '25 16:02


I'm not sure, so I'm posting over there, too. There are a tremendous number of threads about problems with nvidia-open upgrades. Fundamentally, I think the issue starts here, with KERNEL_UNAME in the Makefile.

Also, the rpm packages I'm using are Nvidia's (not Fedora's) and are documented here

Who builds these?

https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/

josephtingiris, Feb 21 '25 17:02

Thanks, I'll pass that along to the CUDA packaging folks.

For purposes of the open-gpu-kernel-modules github repo: we're definitely open to suggestions, but it seems difficult to infer, in a distribution-agnostic way, which kernel to target. Defaulting to the currently running kernel, and providing a mechanism to override for a different kernel, is probably the best the Makefile can do, I fear.
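A packaging hook (an rpm scriptlet, dkms, or similar) usually knows the target kernel explicitly, for instance the version that was just installed, rather than the running one. A minimal sketch of the override approach, assuming the hook receives that version as a string and that the distribution installs headers under /lib/modules/<version>/build; make_modules_cmd is a hypothetical helper, and -C simply points make at the repository checkout.

```python
import shlex


def make_modules_cmd(repo_dir, kernel_version, target="modules", modlib_root="/lib/modules"):
    """Build a make invocation that targets kernel_version instead of `uname -r`.

    SYSSRC is the override read by kernel-open/Makefile; the <modlib_root>/<version>/build
    location is an assumption about the distribution's header layout.
    """
    syssrc = f"{modlib_root}/{kernel_version}/build"
    return ["make", "-C", repo_dir, target, f"SYSSRC={syssrc}"]


if __name__ == "__main__":
    cmd = make_modules_cmd("/root/source/open-gpu-kernel-modules", "6.5.1")
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to actually build
    print(shlex.join(cmd))
```

Passing the version explicitly keeps the Makefile's default (the running kernel) untouched while letting each packaging system supply the kernel it is installing for.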

aritger, Feb 21 '25 17:02

@josephtingiris replied here; it will be fixed in the new 570 builds: https://github.com/NVIDIA/yum-packaging-nvidia-driver/issues/10#issuecomment-2676145046

scaronni, Feb 22 '25 10:02

Contained in the latest 570 builds. This can be closed.

scaronni-nvidia, Mar 06 '25 08:03