cuda-python
cuda-python copied to clipboard
`cuda.core.launch()` improvements
- Support ctypes/numpy structs
- make sure ctypes is deprioritized
- Support converting arbitrary objects to
StridedMemoryView - Benchmarking
- measure
launch()overhead - reimplement type dispatcher via dict lookup instead of conditional branching
- measure
- Support converting arbitrary objects to
StridedMemoryView
If we make this a two-step approach, where the 1st step is to simply accept passing a StridedMemoryView object to launch() as a kernel launch argument, then this would be doable once https://github.com/NVIDIA/cuda-python/issues/180#issuecomment-2546403002 is done.