cuda-python issues

Switch to use CUDA driver APIs in `Device` constructor

3

~~Blocked by #459 & https://github.com/NVIDIA/cuda-python/issues/439#issuecomment-2673234572.~~ Before this PR: ```python In [7]: %timeit Device() 622 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)...

leofang

enhancement

P1

cuda.core

Consolidate `DESCRIPTION.rst` and `README.md`

3

> Indeed we're duplicating some of the information. It might take a few iterations to consolidate the two files and have something presentable on both PyPI and GitHub. Let's address...

leofang

documentation

P1

cuda.bindings

cuda.core

NVVM bindings not working on Windows + CUDA conda packages

2

As discussed offline, the search logic here https://github.com/NVIDIA/cuda-python/blob/19563d59dd3f349e365fb57f3d95f1e7af649ad9/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx#L57-L87 does not look for conda installations, which use the following path: ``` %CONDA_PREFIX%\Library\nvvm\lib\x64\nvvm.lib ``` One way to fix it is to...
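The conda layout quoted in this issue can be probed with a few lines of Python. This is only a minimal sketch, not the search logic in `cuda.bindings`: the helper name `conda_nvvm_candidate` is hypothetical, and the only path it knows is the one quoted above. It builds the path without checking that the file exists.

```python
import os
from pathlib import PureWindowsPath


def conda_nvvm_candidate(environ=None):
    """Return the conda-style NVVM path quoted in the issue, or None.

    Hypothetical helper for illustration only. It reads CONDA_PREFIX from
    the given mapping (os.environ by default) and joins the path segments;
    it does not touch the filesystem.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        # Not running inside an activated conda environment.
        return None
    return PureWindowsPath(prefix, "Library", "nvvm", "lib", "x64", "nvvm.lib")
```

A real search routine would presumably try this candidate alongside the locations it already checks, and fall through when `CONDA_PREFIX` is unset.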

leofang

bug

P0

cuda.bindings

Add descriptive exception handling to linker and program options which are not supported on a CTK version basis + handle in tests

2

`no-cache`, for example, is not supported in an environment with CUDA 12.4. Find a compatibility matrix and apply it to supported options when users construct a LinkerOptions instance....
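One way to picture the compatibility-matrix idea is a table that maps each option to the minimum CUDA version that accepts it, checked before the options are used. The sketch below is an assumption-laden illustration: `UnsupportedOptionError`, `check_options`, and the table contents are hypothetical names and placeholder values, not part of `cuda.core`.

```python
class UnsupportedOptionError(ValueError):
    """Raised when an option is requested on a CUDA version that lacks it."""


def check_options(requested, cuda_version, min_versions):
    """Raise a descriptive error for an option the CUDA version cannot honor.

    requested:    dict of option name -> value; None or False means "not set".
    cuda_version: (major, minor) tuple of the detected toolkit.
    min_versions: dict of option name -> (major, minor) minimum version.
                  The caller supplies this table; nothing here is verified data.
    """
    for name, value in requested.items():
        if value is None or value is False:
            continue  # unset options never trigger an error
        required = min_versions.get(name)
        if required is not None and cuda_version < required:
            raise UnsupportedOptionError(
                f"option {name!r} requires CUDA {required[0]}.{required[1]} "
                f"or newer, but the detected version is "
                f"{cuda_version[0]}.{cuda_version[1]}"
            )


# Placeholder table for illustration only; the real minimum versions
# would have to come from the compatibility matrix the issue asks for.
EXAMPLE_TABLE = {"no_cache": (12, 5)}
```

Keeping the table as data rather than scattered `if` statements also lets tests reuse it to skip cases that the installed toolkit cannot run.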

keenan-simpson

enhancement

triage

P1

cuda.core

Convert all markdown files under `cuda_*/docs/` to Sphinx ReST files

2

Markdown files are harder to cross-reference, and they cannot reference Sphinx objects.

leofang

documentation

P1

cuda.bindings

cuda.core

Add tests to cover `cuda.core.experimental.launch()`

We support

- Python scalars
- NumPy scalars
- ctypes scalars

These should all be tested in `test_launcher.py`.

leofang

triage

P1

test

cuda.core

Implement `from_handle()` for all `cuda.core` objects

We wanted to do this, but so far it seems we've only covered `Stream.from_handle`. We also want to cover these objects:

- `Program`
- `ObjectCode`
- `Kernel`
- `Buffer`
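The general shape of a `from_handle()` alternate constructor can be sketched as a classmethod that bypasses the normal constructor and stores an existing handle. The class below is a hypothetical stand-in, not any `cuda.core` type, and it assumes handles are plain integers.

```python
class HandleWrapper:
    """Hypothetical stand-in for an object that wraps a native handle."""

    __slots__ = ("_handle",)

    def __init__(self):
        # Direct construction is disabled in this sketch so that every
        # instance comes from an explicit factory method.
        raise NotImplementedError("use HandleWrapper.from_handle() instead")

    @classmethod
    def from_handle(cls, handle):
        """Wrap an existing handle (assumed here to be an integer)."""
        if isinstance(handle, bool) or not isinstance(handle, int):
            raise TypeError(f"handle must be an int, got {type(handle).__name__}")
        self = cls.__new__(cls)  # skip __init__ on purpose
        self._handle = handle
        return self

    @property
    def handle(self):
        return self._handle
```

For example, `HandleWrapper.from_handle(0x1234).handle` gives back the same integer. Whether a wrapped object should take ownership of the handle is a separate design question that this sketch leaves open.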

leofang

triage

P1

feature

cuda.core

`cuda.core.launch()` improvements

1

- Support ctypes/numpy structs
- make sure ctypes is deprioritized
- Support converting arbitrary objects to `StridedMemoryView`
- Benchmarking
- measure `launch()` overhead
- reimplement type dispatcher via dict lookup...
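The "type dispatcher via dict lookup" item can be illustrated with a small sketch that maps the exact Python type of an argument to a converter and only then falls back to ctypes values, so ctypes is checked last. This is not the `launch()` implementation: `prepare_arg`, the converter table, and the chosen ctypes widths are all assumptions made for illustration.

```python
import ctypes

# Hypothetical converter table keyed by exact type. The ctypes widths
# chosen here are illustrative assumptions, not what cuda.core uses.
_CONVERTERS = {
    bool: ctypes.c_bool,
    int: ctypes.c_int64,
    float: ctypes.c_double,
}


def prepare_arg(arg):
    """Convert a scalar argument to a ctypes value via one dict lookup."""
    converter = _CONVERTERS.get(type(arg))
    if converter is not None:
        return converter(arg)
    # Deprioritized path: ctypes scalars are passed through unchanged.
    # ctypes._SimpleCData is a private base class of c_int, c_float, etc.
    if isinstance(arg, ctypes._SimpleCData):
        return arg
    raise TypeError(f"unsupported argument type: {type(arg).__name__}")
```

Because the lookup uses `type(arg)` rather than `isinstance`, `True` maps to `c_bool` instead of being treated as an `int`, and the common case costs a single hash lookup.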

leofang

enhancement

triage

P1

cuda.core

Refactor for better handling of `cuModuleLoadDataEx`/`cuLibraryLoadData`

> This is a bit trickier than I thought, because we also need the dict key "old"/"new" as a proxy to prepare for `args` (which is different for `cuModuleLoadDataEx`/`cuLibraryLoadData`). Let's...

leofang

enhancement

P2

cuda.core

Perf: Reduce `StridedMemoryView` construction time

1

Currently it takes 3.4 to 3.45 µs (depending on whether stream ordering is used) to create a memory view object: ```python In [4]: x = cp.empty((23, 4)) In [7]: %timeit s =...

leofang

enhancement

triage

P1

cuda.core
