CUDA Python: Performance meets Productivity

Results 261 cuda-python issues

- [ ] Users can set `cudaLaunchAttributeProgrammaticStreamSerialization` to do PDL.
- [ ] PDL launches are graph-compatible and this use case should be tested and showcased

enhancement
P1
cuda.core

P0:
- [cudaMemcpyBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g6126baf5d881835091c59e48890d6854)
P1:
- [cudaMemDiscardBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g5acb7cea41bb9115f10568cc8176f51f)
- [cudaMemPrefetchBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge4fa23c9a26c6e5e702cbe35d001d589)
- [cudaMemDiscardAndPrefetchBatchAsync](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g0f6d2e27d8f00ee78c5d814f45500605)
https://docs.nvidia.com/cuda/cuda-programming-guide/03-advanced/advanced-host-programming.html#batched-memory-transfers

feature
cuda.core

This task should cover updating both whole graphs and individual graph nodes.

triage
feature
cuda.core

The fun part would be: How to keep a generic Python object alive? https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cuda-graphs.html#cuda-user-objects

triage
feature
cuda.core
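
One way to think about the keep-alive question above is a registry that holds a strong reference under an integer handle until a release callback fires. The sketch below is pure Python and calls no CUDA API: `KeepAliveRegistry`, `pin`, `release`, `Payload`, and `demo` are hypothetical names, and `release` only stands in for whatever destructor callback a CUDA user object would eventually invoke.

```python
import gc
import itertools
import threading
import weakref


class KeepAliveRegistry:
    """Hypothetical helper: pins Python objects under integer handles."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}
        self._next_handle = itertools.count(1)

    def pin(self, obj):
        # Store a strong reference so the object outlives the caller's scope.
        with self._lock:
            handle = next(self._next_handle)
            self._objects[handle] = obj
        return handle

    def release(self, handle):
        # Assumption: this would be called from the user object's
        # destructor callback; here it is called by hand.
        with self._lock:
            self._objects.pop(handle, None)


class Payload:
    """Placeholder for any Python object a graph might depend on."""


def demo():
    registry = KeepAliveRegistry()
    payload = Payload()
    probe = weakref.ref(payload)

    handle = registry.pin(payload)
    del payload          # the caller drops its own reference
    gc.collect()
    alive_while_pinned = probe() is not None

    registry.release(handle)
    gc.collect()
    alive_after_release = probe() is not None
    return alive_while_pinned, alive_after_release
```

An integer handle is used because it can be passed through a `void*`-style user-data slot without exposing a raw `PyObject*`. Whether the real callback may safely acquire the GIL is a separate question that this sketch does not address.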

For example, we released `cuda-bindings` and `cuda-python` 13.1.0 yesterday, but we did not add `13.1.0-notes.rst` to https://github.com/NVIDIA/cuda-python/tree/main/cuda_python/docs/source/release.

bug
triage
P0
CI/CD
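
A check of this kind could run in CI before a release is tagged. The following is a minimal sketch, assuming release notes follow the `<version>-notes.rst` naming mentioned above; `missing_release_notes` is a hypothetical helper, not part of the repository's tooling.

```python
from pathlib import Path


def missing_release_notes(versions, notes_dir):
    """Return the versions that have no `<version>-notes.rst` in notes_dir."""
    notes_dir = Path(notes_dir)
    return [
        version
        for version in versions
        if not (notes_dir / f"{version}-notes.rst").is_file()
    ]
```

A CI step could fail when the returned list is non-empty for the versions about to be published.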

Currently this is low priority because there is no such thing as "libtile", only `tileiras`, which is an executable. We prefer in-process compilation through compiler libraries over subprocess calls to...

P1
feature
cuda.core
blocked

Tracking the failure below. xref: https://github.com/NVIDIA/cuda-python/pull/1242#issuecomment-3545628920
All details are in the full logs:
- [qa_bindings_windows_2025-11-18+102913_build_log.txt](https://github.com/user-attachments/files/23611948/qa_bindings_windows_2025-11-18%2B102913_build_log.txt)
- [qa_bindings_windows_2025-11-18+102913_tests_log.txt](https://github.com/user-attachments/files/23611951/qa_bindings_windows_2025-11-18%2B102913_tests_log.txt)
The only non-obvious detail: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0` was installed from `cuda_13.0.1_windows.exe`.
**EDIT:** The...

bug
triage
cuda.core

Capturing feedback provided by @xiakun-lu offline. The NCCL team noticed that `uv sync` complains that `nccl4py[cu12]` and `nccl4py[cu13]` are incompatible (`uv venv && uv pip install -e .` works out of...

support
triage
cuda.bindings
cuda.core

Instead of relying on stream capturing, which is considered an implementation detail (one that we could let users opt in to or out of in the future), our graph builder APIs were...

enhancement
triage
cuda.core