cuda-python issues

Support programmatic dependent launch (PDL)

- [ ] Users can set `cudaLaunchAttributeProgrammaticStreamSerialization` to do PDL.
- [ ] PDL launches are graph-compatible, and this use case should be tested and showcased.

leofang

enhancement

P1

cuda.core

Support batched memory movement

P0:
- [cudaMemcpyBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g6126baf5d881835091c59e48890d6854)

P1:
- [cudaMemDiscardBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g5acb7cea41bb9115f10568cc8176f51f)
- [cudaMemPrefetchBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge4fa23c9a26c6e5e702cbe35d001d589)
- [cudaMemDiscardAndPrefetchBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g0f6d2e27d8f00ee78c5d814f45500605)

https://docs.nvidia.com/cuda/cuda-programming-guide/03-advanced/advanced-host-programming.html#batched-memory-transfers

leofang

feature

cuda.core

CUDA graph phase N - graph updates

This task should cover updating both whole graphs and individual graph nodes.

leofang

triage

feature

cuda.core

CUDA graph phase N - Support child graphs

leofang

triage

feature

cuda.core

CUDA graph phase N - CPU callbacks & user objects

The fun part would be: How to keep a generic Python object alive? https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cuda-graphs.html#cuda-user-objects
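One possible direction for the keep-alive question, sketched in pure Python: hold a strong reference in a registry keyed by an integer handle, and drop it when the owner signals release. `KeepAliveRegistry`, `retain`, `release`, and `Payload` are hypothetical names for illustration, not `cuda.core` APIs, and the link to a CUDA user-object destructor callback is an assumption that is not implemented here.

```python
import gc
import itertools
import weakref


class KeepAliveRegistry:
    """Hypothetical helper: keeps objects alive under integer handles."""

    def __init__(self):
        self._objects = {}
        self._next_handle = itertools.count(1)

    def retain(self, obj):
        # Store a strong reference and return an opaque integer handle.
        # (Assumption: such a handle could be passed along as user data.)
        handle = next(self._next_handle)
        self._objects[handle] = obj
        return handle

    def release(self, handle):
        # Drop the strong reference; unknown handles are ignored.
        self._objects.pop(handle, None)

    def __len__(self):
        return len(self._objects)


class Payload:
    """Stand-in for a generic Python object (e.g. a callback's state)."""


registry = KeepAliveRegistry()
payload = Payload()
probe = weakref.ref(payload)  # lets us observe liveness without owning it

handle = registry.retain(payload)
del payload  # the registry now holds the only strong reference
gc.collect()
alive_while_retained = probe() is not None

registry.release(handle)  # e.g. triggered when the owning graph is destroyed
gc.collect()
alive_after_release = probe() is not None
```

In this sketch the object's lifetime follows the registry entry rather than the caller's local variables. Whether `release` can safely be driven from a driver-side destructor callback is left open.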

leofang

triage

feature

cuda.core

CI: The release workflow should check if the versioned release note is missing

For example, we released `cuda-bindings` and `cuda-python` 13.1.0 yesterday, but we did not add `13.1.0-notes.rst` to https://github.com/NVIDIA/cuda-python/tree/main/cuda_python/docs/source/release.
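A minimal sketch of the check the release workflow could run, assuming the notes files follow the `<version>-notes.rst` naming seen above. The function name `missing_release_notes` is hypothetical, and the temporary directory stands in for the real docs folder.

```python
import tempfile
from pathlib import Path


def missing_release_notes(release_dir, version):
    """Return True if `<version>-notes.rst` is absent from release_dir."""
    notes = Path(release_dir) / f"{version}-notes.rst"
    return not notes.is_file()


# Demonstration against a throwaway directory (not the real repository).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "13.0.0-notes.rst").write_text("placeholder\n")
    missing_13_0_0 = missing_release_notes(tmp, "13.0.0")
    missing_13_1_0 = missing_release_notes(tmp, "13.1.0")
```

A workflow step could call such a function with the version being released and fail the job when it returns `True`.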

leofang

bug

triage

P0

CI/CD

`cuda.core.Program`: Support Tile IR

Currently this is low priority because there is no such thing as "libtile"; there is only `tileiras`, which is an executable. We prefer in-process compilation through compiler libraries over subprocess calls to...

leofang

P1

feature

cuda.core

blocked

`test_vmm_allocator_policy_configuration` failure: Windows / A6000 / WDDM

8

Tracking the failure below.

xref: https://github.com/NVIDIA/cuda-python/pull/1242#issuecomment-3545628920

All details are in the full logs:
- [qa_bindings_windows_2025-11-18+102913_build_log.txt](https://github.com/user-attachments/files/23611948/qa_bindings_windows_2025-11-18%2B102913_build_log.txt)
- [qa_bindings_windows_2025-11-18+102913_tests_log.txt](https://github.com/user-attachments/files/23611951/qa_bindings_windows_2025-11-18%2B102913_tests_log.txt)

The only non-obvious detail: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0` was installed from `cuda_13.0.1_windows.exe`

**EDIT:** The...

rwgk

bug

triage

cuda.core

Update `pyproject.toml` for `uv`

1

Capturing feedback provided by @xiakun-lu offline. The NCCL team noticed that `uv sync` complains that `nccl4py[cu12]` and `nccl4py[cu13]` are incompatible (`uv venv && uv pip install -e .` works out of...

leofang

support

triage

cuda.bindings

cuda.core

CUDA graph phase N - explicit graph construction

Instead of relying on stream capturing, which is considered an implementation detail (that in the future we could allow users to opt in or out), our graph builder APIs were...

leofang

enhancement

triage

cuda.core
