
EPIC: Path finder for CUDA components

Open leofang opened this issue 11 months ago • 12 comments

In 2025, there are many ways of installing CUDA into a Python environment. One key challenge is that all of the header/library search logic implemented in existing CUDA-enabled libraries (ex: #447) needs to be modernized, taking into account that CUDA these days can be installed on a per-component basis (ex: I just want NVRTC and CCCL and nothing else). The consequence is that any prior art that checks whether a certain piece exists (ex: nvcc, cuda.h, nvvm, ...) and then assumes the whole Toolkit exists at known relative paths is no longer valid. Even Linux system package managers may not always behave as expected. (Though setting CUDA_HOME/CUDA_PATH as a fallback might still be OK.)

The CUDA Python team is well-positioned to take on the pain points so that all other Python libraries do not need to worry about packaging sources, layouts, and so on. It is our intention to support modern CUDA packages and deployment options in a JIT-compilation friendly way. What this means is that we should be able to return, on a per-component basis,

  • where are the component headers?
  • where are the component shared libraries?
  • ...

Something like (API design TBD)

from cuda.core.utils import CUDALocater

locater = CUDALocater()
nvcc_incl = locater.nvcc.include  # returns a list of valid abs paths to the include directories, or None 
cccl_incl = locater.cccl.include  # returns a list of valid abs paths to the include directories, or None
nvrtc_lib = locater.nvrtc.lib     # returns a list of valid abs paths to the shared libraries, or None 
...
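A minimal sketch of what such a locator could look like internally. Everything here is hypothetical: the candidate roots, the single shared `include`/`lib64` layout, and the class name simply mirror the TBD sketch above, not a real cuda.core API.

```python
import os
from pathlib import Path


class _Component:
    """Hypothetical per-component view; real logic would know each
    component's packaging layouts (wheel, conda, system package, ...)."""

    # Illustrative fallback roots only; the whole point of this epic is
    # to replace ad-hoc lists like this with real per-component search.
    _ROOTS = (os.environ.get("CUDA_PATH"),
              os.environ.get("CUDA_HOME"),
              "/usr/local/cuda")

    def __init__(self, name):
        self.name = name

    def _dirs(self, subdir):
        # Collect existing candidate directories; None when nothing found.
        found = [str(Path(r, subdir).resolve())
                 for r in self._ROOTS
                 if r and Path(r, subdir).is_dir()]
        return found or None

    @property
    def include(self):
        return self._dirs("include")

    @property
    def lib(self):
        return self._dirs("lib64")


class CUDALocater:
    def __getattr__(self, component):
        return _Component(component)
```

With this shape, `CUDALocater().nvrtc.include` evaluates to a list of existing absolute paths, or None when nothing is found, matching the contract sketched above.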

This needs to cover

  • CUDA installed via various package managers (apt, yum, conda, pip, ...)
  • Headers and shared libraries as bare minimum
    • From the JIT-compilation perspective, headers play a role analogous to shared libraries
  • Linux and Windows
  • Default system search paths, if possible
    • This includes the "legacy" CTK locations, such as /usr/local/cuda on Linux, as a fallback
  • All CTK components relevant to Python users, such as:
    • nvcc/nvvm
      • this includes libdevice.bc
    • nvrtc
    • nvjitlink
    • cublas
    • cusolver
    • curand
    • cufft
    • cusparse
    • ...

Once completed, this would also help us unify the treatment of loading shared libraries in cuda.bindings, which is currently divergent between Linux/Windows:

  • Linux: hack RPATH and rely on dynamic loader (ld.so)
  • Windows: search possible DLL locations (site-packages, ...)
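A unified replacement for the two divergent strategies could look roughly like this. This is a sketch, not cuda.bindings code; `find_lib_dirs` stands in for whatever lookup the path finder ends up exposing.

```python
import ctypes
import os
import sys


def load_shared_library(soname_linux, dllname_windows, find_lib_dirs):
    """Hypothetical unified loader on top of a path finder.

    find_lib_dirs is a callable returning a list of candidate library
    directories (or None); in real code it would come from the finder.
    """
    if sys.platform == "win32":
        # Windows: extend the DLL search path explicitly, then load by name.
        for d in find_lib_dirs() or []:
            os.add_dll_directory(d)
        return ctypes.CDLL(dllname_windows)
    # Linux: try the bare SONAME first (RPATH / ld.so may resolve it),
    # then fall back to absolute paths from the path finder.
    try:
        return ctypes.CDLL(soname_linux)
    except OSError:
        for d in find_lib_dirs() or []:
            candidate = os.path.join(d, soname_linux)
            if os.path.exists(candidate):
                return ctypes.CDLL(candidate)
        raise
```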

leofang avatar Feb 14 '25 05:02 leofang

cc @rwgk

leofang avatar Feb 14 '25 05:02 leofang

I expect our path finder to be enough for these projects to drop the following code:

  • https://github.com/NVIDIA/nvmath-python/blob/073b168ac0688fa3b84caaa8bb65948bf7db7eae/nvmath/_utils.py#L81-L350
    • it has been the intention that cuda-python handles this for nvmath, FWIW (cc @samaid for vis)
  • https://github.com/NVIDIA/numba-cuda/blob/main/numba_cuda/numba/cuda/cuda_paths.py (cc @gmarkall for vis)
  • https://github.com/cupy/cupy/blob/2bd9e67d2cd0c588def35632b37dd88a28d609f2/cupy/_environment.py#L18-L245

leofang avatar Feb 18 '25 18:02 leofang

Another question we need to answer is: in which module (cuda.bindings or cuda.core) should the path finder live? It seems like a high-level Pythonic helper that belongs in cuda.core, but cuda.bindings would need the same info for loading modules if we pull this off. I don't have an answer.

leofang avatar Feb 18 '25 20:02 leofang

(discussed offline, tentatively slate this for beta 3, with the understanding that we might not make it)

leofang avatar Feb 18 '25 20:02 leofang

cc @cryos for vis (since you're also working on wheels)

leofang avatar Feb 19 '25 04:02 leofang

Tracking more relevant links to code that we want to offer an alternative for:

  • https://github.com/NVIDIA/nvmath-python/blob/073b168ac0688fa3b84caaa8bb65948bf7db7eae/nvmath/bindings/_internal/cusparse_windows.pyx#L295-L324
  • https://pypi.org/project/cupti-python/

leofang avatar Feb 26 '25 22:02 leofang

> The consequence is that any prior art that checks whether a certain piece exists (ex: nvcc, cuda.h, nvvm, ...) and then assumes the whole Toolkit exists at known relative paths is no longer valid. Even Linux system package managers may not always behave as expected.

Keith gave an expanded explanation on what's described in the epic body: https://github.com/NVIDIA/cuda-python/issues/441#issuecomment-2714804149.

leofang avatar Mar 12 '25 00:03 leofang

Tracking a related numba.cuda PR, for easy reference: https://github.com/NVIDIA/numba-cuda/pull/155

rwgk avatar Mar 13 '25 15:03 rwgk

@leofang @kkraus14

  • I expanded my experiment under #447 to move the entire numba/cuda/cuda_paths.py — not just the part that locates libnvvm — into cuda-bindings. It turns out to be very easy.

  • I illustrated the approach here.

It seems very straightforward to me. It'd be great to discuss.

rwgk avatar Mar 20 '25 05:03 rwgk

As of 2025-04-15 (https://github.com/NVIDIA/cuda-python/pull/558/commits/808074d5e14d9630ba241680b297373d1e69f187):

These .so files exist under /usr/local/cuda-12.8/ (Linux x86_64 CTK 12.8.1) but are not supported by cuda.bindings.path_finder:

/usr/local/cuda-12.8/version.json
   "cuda" : {
      "name" : "CUDA SDK",
      "version" : "12.8.1"
   }
/usr/local/cuda-12.8/lib64/
    libaccinj64.so
    libcheckpoint.so
    libcuinj64.so
    libcupti.so
    libnvperf_host.so
    libnvperf_target.so
    libnvToolsExt.so
    libOpenCL.so
    libpcsamplingutil.so

These Windows .dll files are under https://developer.download.nvidia.com/compute/cuda/redist/ but are not supported by cuda.bindings.path_finder:

cuinj64_128.dll
cuinj64_126.dll
cuinj64_125.dll
cuinj64_124.dll
cuinj64_123.dll
cuinj64_122.dll
cuinj64_121.dll
cuinj64_120.dll
cuinj64_118.dll
cuinj64_117.dll
cuinj64_116.dll
cuinj64_115.dll
cuinj64_114.dll

rwgk avatar Apr 15 '25 17:04 rwgk

I'm familiarizing myself with the content of https://developer.download.nvidia.com/compute/cuda/redist/ (to learn what .so and .dll files we have).

A small side product:

cuda/redist Matrix

Columns (CUDA releases): 11.0.3 11.1.1 11.2.0 11.2.1 11.2.2 11.3.0 11.3.1 11.4.0 11.4.1 11.4.2 11.4.3 11.4.4 11.5.0 11.5.1 11.5.2 11.6.0 11.6.1 11.6.2 11.7.0 11.7.1 11.8.0 12.0.0 12.0.1 12.1.0 12.1.1 12.2.0 12.2.1 12.2.2 12.3.0 12.3.1 12.3.2 12.4.0 12.4.1 12.5.0 12.5.1 12.6.0 12.6.1 12.6.2 12.6.3 12.8.0 12.8.1
Components (the per-release availability marks of the original table are not reproduced here):
cuda_cccl
cuda_compat
cuda_cudart
cuda_cuobjdump
cuda_cupti
cuda_cuxxfilt
cuda_demo_suite
cuda_documentation
cuda_gdb
cuda_memcheck
cuda_nsight
cuda_nvcc
cuda_nvdisasm
cuda_nvml_dev
cuda_nvprof
cuda_nvprune
cuda_nvrtc
cuda_nvtx
cuda_nvvp
cuda_opencl
cuda_profiler_api
cuda_sandbox_dev
cuda_sanitizer_api
driver_assistant
fabricmanager
imex
libcublas
libcudla
libcufft
libcufile
libcurand
libcusolver
libcusparse
libnpp
libnvfatbin
libnvidia_nscq
libnvjitlink
libnvjpeg
libnvsdm
libnvvm_samples
nsight_compute
nsight_nvtx
nsight_systems
nsight_vse
nvidia_driver
nvidia_fs
release_date
release_label
release_product
visual_studio_integration

rwgk avatar Apr 16 '25 22:04 rwgk

Visual overview of shared library dependencies (GraphViz)


rwgk avatar Apr 16 '25 22:04 rwgk

Tracking a key insight for easy future reference:

I've verified that all CUDA libraries in version 12.8.1 (x86_64) have their SONAME set (see tiny script below).

Assuming this holds for all 12.x releases and future releases, we can reliably check whether a shared library is already loaded by using the known SONAMEs, e.g.:

import ctypes
import os

try:
    # RTLD_NOLOAD makes dlopen succeed only if the library is already
    # resident; dlopen(3) expects a binding mode, so combine with RTLD_NOW.
    ctypes.CDLL("libnvvm.so.4", mode=os.RTLD_NOW | os.RTLD_NOLOAD)
    print("Library is already loaded.")
except OSError:
    print("Library is not loaded yet.")

According to ChatGPT, "this method is effective for standard system libraries and well-maintained third-party libraries that follow proper versioning practices."


Script used to inspect SONAMES under /usr/local/cuda:

find_sonames.sh:

#!/bin/bash
find . -type f -name '*.so*' -print0 | while IFS= read -r -d '' f; do
  type=$(test -L "$f" && echo SYMLINK || echo FILE)
  soname=$(readelf -d "$f" 2>/dev/null | awk '/SONAME/ {gsub(/[][]/, "", $5); print $5; exit}')
  echo "$f $type ${soname:-SONAME_NOT_SET}"
done
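The readelf/awk extraction above can equivalently be done from Python; a small sketch (parsing the same `readelf -d` output format) that could slot into a cross-checking script:

```python
import re
import shutil
import subprocess

# Matches e.g. " 0x...0e (SONAME)  Library soname: [libnvvm.so.4]"
_SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname:\s+\[([^\]]+)\]")


def parse_soname(readelf_dynamic_output):
    """Extract the SONAME from `readelf -d` output, or None if unset."""
    m = _SONAME_RE.search(readelf_dynamic_output)
    return m.group(1) if m else None


def soname_of(path):
    """Run readelf on one file; None for non-ELF files or missing SONAME."""
    if shutil.which("readelf") is None:
        raise RuntimeError("readelf not found on PATH")
    proc = subprocess.run(["readelf", "-d", path],
                          capture_output=True, text=True)
    return parse_soname(proc.stdout)
```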

rwgk avatar Apr 23 '25 15:04 rwgk

Summary of .so files that do NOT have SONAME set, in these releases:

cuda_11.0.3_450.51.06_linux.run
cuda_11.1.1_455.32.00_linux.run
cuda_11.2.2_460.32.03_linux.run
cuda_11.3.1_465.19.01_linux.run
cuda_11.4.4_470.82.01_linux.run
cuda_11.5.1_495.29.05_linux.run
cuda_11.6.2_510.47.03_linux.run
cuda_11.7.1_515.65.01_linux.run
cuda_11.8.0_520.61.05_linux.run
cuda_12.0.1_525.85.12_linux.run
cuda_12.1.1_530.30.02_linux.run
cuda_12.2.2_535.104.05_linux.run
cuda_12.3.2_545.23.08_linux.run
cuda_12.4.1_550.54.15_linux.run
cuda_12.5.1_555.42.06_linux.run
cuda_12.6.2_560.35.03_linux.run
cuda_12.8.0_570.86.10_linux.run

The first number is the count across all releases:

$ cat soname_not_set_110_through_128.txt
     17 eclipse_1605.so
     21 libbradient.so
      4 libdmabuf-server.so
      4 libdrm-egl-server.so
     21 libfullscreen-shell-v1.so
     30 libGL.so.1.5.0
     21 libivi-shell.so
     19 libqcertonlybackend.so
     34 libqgif.so
     34 libqico.so
     34 libqjpeg.so
     25 libqoffscreen.so
     19 libqopensslbackend.so
     34 libqsvg.so
     34 libqtga.so
     34 libqtiff.so
     21 libqt-plugin-wayland-egl.so
     21 libqwayland-egl.so
     21 libqwayland-generic.so
      4 libqwayland-xcomposite-egl.so
      4 libqwayland-xcomposite-glx.so
     34 libqwbmp.so
     34 libqxcb-glx-integration.so
     34 libqxcb.so
     21 libshm-emulation-server.so
     21 libvulkan-server.so
     17 libwl-shell-plugin.so
      4 libwl-shell.so
      4 libxcomposite-egl.so
      4 libxcomposite-glx.so
     21 libxdg-shell.so
      4 libxdg-shell-v5.so
      4 libxdg-shell-v6.so
      5 _ncu_report.so
     10 _sqlite3.cpython-310-x86_64-linux-gnu.so
      4 _sqlite3.cpython-312-x86_64-linux-gnu.so

Commands used:

cd extracted
find_sonames.sh > ../all_SONAME.txt
grep 'FILE SONAME_NOT_SET' all_SONAME.txt | grep -v /cuda_documentation/ | rev | cut -d/ -f1 | rev | sed 's/ FILE SONAME_NOT_SET$//' | sort | uniq -c

NOTE: The extracted CTK directories have no symlinks (unlike "installed" CTK directories).

rwgk avatar Apr 23 '25 16:04 rwgk

The bulk of the work is largely done now. Let me close this issue; the remaining tasks can be tracked individually under the cuda.pathfinder label.

leofang avatar Sep 26 '25 02:09 leofang