cuda-python icon indicating copy to clipboard operation
cuda-python copied to clipboard

Stop redefining structs and types for Graphics APIs

Open vzhurba01 opened this issue 10 months ago • 5 comments

When the Driver and Runtime bindings are generated, we redefine the types from headers:

  • cudaEGL.h
  • cudaGL.h
  • cudaVDPAU.h
  • cuda_egl_interop.h
  • cuda_gl_interop.h
  • cuda_vdpau_interop.h

We do this so that source builds don't have to depend on system headers for "vdpau/vdpau.h", "OpenGL/gl.h", "EGL/egl.h" and "EGL/eglext.h". The consequence of depending on them are the build systems and users will need to install pyopengl and libvdpau-dev before proceeding with a source build.

This issue proposes that we stop redefining these types, require users to fulfill this source build dependency and use the types by importing them from the headers. This is a breaking change.

This issue is motivated from the triage of changing Runtime to statically link to libcudart_static for issue https://github.com/NVIDIA/cuda-python/issues/100. The goal for that issue is to completely remove all re-implementations of CUDA Runtime in cuda_bindings/cuda/bindings/_lib/cyruntime. The problem is that redefined types are not type-compatible with the graphics APIs in the static library, therefore a subset of these APIs will continue to be re-implemented and we won't be able to fully clean this up. The silver-lining though is that these APIs should be stable, so no maintenance cost is expected in future releases.

Making this change has another benefit. Historically, we once had all Driver types redefined but that did not allow Cython users to interoperate between the types that they import and the CUDA Python Driver types. Therefore by having graphics types get defined by the header would extend that use-case to here as well. However, no know user is utilizing this use-case.

Final note is for Windows, the VDPAU are the only headers not distributed as part of a CTK will therefore need to be removed as part of a Tempita platform check. This change is trivial and would remove APIs that would have never worked on Windows anyways (VDPAU is a Unix only library).

P.S. For future reference, one of the types needs to change to:

cdef extern from "cuda_vdpau_interop.h":
    ctypedef enum VdpStatus:
        VDP_STATUS_OK = 0
        VDP_STATUS_ERROR

    ctypedef VdpStatus (*VdpGetProcAddress)(unsigned int, unsigned int, void**)

but there were still errors to be resolved in the Python layer so this might not be the final form.

vzhurba01 avatar Mar 03 '25 23:03 vzhurba01

The problem is that redefined types are not type-compatible with the graphics APIs in the static library

I am curious, why aren't they type-compatible (and in what sense)? If the redefinition is not compatible, wouldn't it mean that there's no interoperability between static cudart and graphics APIs?

leofang avatar Mar 04 '25 00:03 leofang

It's the same as issue https://github.com/NVIDIA/cuda-python/issues/22 where Cython mangles our redefined names. Back then we did a breaking change to align all of those types with extern, so a similar situation would have to occur here.

vzhurba01 avatar Mar 04 '25 04:03 vzhurba01

Ah, I see! I had forgotten about #22 exists completely...

I wonder if it would help if we auto-generate the (unmangled) "C names" supported by Cython:

As for functions, C names can be specified for variables, structs, unions, enums, struct and union members, and enum values.

Here's an example what it could do (using cuBLASLt as example, as nvJitLink/NVVM have no interesting structs...): https://github.com/NVIDIA/nvmath-python/blob/073b168ac0688fa3b84caaa8bb65948bf7db7eae/nvmath/bindings/cycublasLt.pxd#L933-L945 This allows us to suppress Cython mangling in the cython layer. (The nvmath example is half-done, because the struct members still get mangled 😅)

leofang avatar Mar 04 '25 04:03 leofang

The problem then is that we hit a redefinition error. Our auto-generated unmangled C names redefine what is found from importing header via cdef extern from "cuda_egl_interop.h":. This import is needed so that we can use the APIs found in the static lib, and I'm not sure I see a way around that.

vzhurba01 avatar Mar 04 '25 18:03 vzhurba01

In an offline discussion we had concluded that this breaking change can only be done as part of a major release (i.e. 13.0) and the fix will not be backported to 12.x. Not even as a patch when the final minor release has been decided. Allowing for this difference in 12.x avoids causing a similar issue as https://github.com/conda-forge/cuda-python-feedstock/issues/15.

vzhurba01 avatar Mar 04 '25 20:03 vzhurba01