cuda-python
CUDA graph phase N - Device-side graph launch
This task should be simple:
- Allow users to opt in, and internally pass `cudaGraphInstantiateFlagDeviceLaunch` to `cudaGraphInstantiate()`
- Add code samples to showcase how this can be done
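
To make the opt-in concrete, here is a rough sketch of how a user-facing option could be translated into the flag value. `GraphInstantiateOptions`, `InstantiateFlag`, and `to_instantiate_flags` are hypothetical names made up for illustration, not existing cuda-core API. The enum is a pure-Python stand-in that mirrors the numeric values of CUDA's `cudaGraphInstantiateFlags`, so the snippet runs without a GPU.

```python
from dataclasses import dataclass
from enum import IntFlag


class InstantiateFlag(IntFlag):
    """Stand-in mirroring values from CUDA's cudaGraphInstantiateFlags enum."""
    NONE = 0
    AUTO_FREE_ON_LAUNCH = 1  # cudaGraphInstantiateFlagAutoFreeOnLaunch
    UPLOAD = 2               # cudaGraphInstantiateFlagUpload
    DEVICE_LAUNCH = 4        # cudaGraphInstantiateFlagDeviceLaunch


@dataclass(frozen=True)
class GraphInstantiateOptions:
    """Hypothetical user-facing options; device launch stays off unless requested."""
    device_launch: bool = False


def to_instantiate_flags(options: GraphInstantiateOptions) -> int:
    """Translate the options into the integer flags for cudaGraphInstantiate()."""
    flags = InstantiateFlag.NONE
    if options.device_launch:
        flags |= InstantiateFlag.DEVICE_LAUNCH
    return int(flags)


# Default: no flags are set, so existing behavior is unchanged.
default_flags = to_instantiate_flags(GraphInstantiateOptions())

# Opt-in: the device-launch bit is set.
device_flags = to_instantiate_flags(GraphInstantiateOptions(device_launch=True))
```

Internally, the resulting integer would then be handed to the `cudaGraphInstantiate()` binding as its flags argument. Keeping the default at zero means users who do not opt in see no change in behavior.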
The majority of work would happen on the JIT compiler side, aka numba-cuda; cuda-core will just set up the infrastructure.