cuda-python
cuda-python copied to clipboard
CUDA Python: Performance meets Productivity
~~Blocked by #459 & https://github.com/NVIDIA/cuda-python/issues/439#issuecomment-2673234572.~~ Before this PR: ```python In [7]: %timeit Device() 622 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)...
> Indeed we're duplicating some of the information. It might take a few iterations to consolidate the two files and have something presentable on both PyPI and GitHub. Let's address...
As discussed offline, the search logic here https://github.com/NVIDIA/cuda-python/blob/19563d59dd3f349e365fb57f3d95f1e7af649ad9/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx#L57-L87 does not look for conda installations , which has the following path: ``` %CONDA_PREFIX%\Library\nvvm\lib\x64\nvvm.lib ``` One way to fix it is to...
no-cache, for example, is not supported when using an environment with cuda 12.4 . Find a compatibility matrix and apply it to supported options when users construct a LinkerOptions instance....
Markdown files are harder to cross reference, and not possible to reference Sphinx objects
We support - Python scalars - NumPy scalars - ctypes scalars These should all be tested in `test_launcher.py`.
We wanted to do this but it seems so far we've only covered `Stream.from_handle`. We want to also cover these objects: - `Program` - `ObjectCode` - `Kernel` - `Buffer`
- Support ctypes/numpy structs - make sure ctypes is deprioritized - Support converting arbitrary objects to `StridedMemoryView` - Benchmarking - measure `launch()` overhead - reimplement type dispatcher via dict lookup...
> This is a bit trickier than I thought, because we also need the dict key "old"/"new" as a proxy to prepare for `args` (which is different for `cuModuleLoadDataEx`/`cuLibraryLoadData`). Let's...
Currently it takes 3.4 - 3.45 us (depending on stream-ordering or not) to create a memory view object: ```python In [4]: x = cp.empty((23, 4)) In [7]: %timeit s =...