cuda-python
Design of graph support - Phase 1
A few pointers to consider when we design this:
- https://github.com/pytorch/pytorch/pull/130386
- https://github.com/pytorch/pytorch/pull/137318
- https://github.com/cupy/cupy/pull/8615
- https://github.com/numba/numba/pull/4182
Discussed internally. All things considered, we will take a multi-phase approach to iteratively enhance CUDA graph coverage. Below are the phase-1 design considerations:
- Only cover stream capture (no explicit graph construction)
- Exclude memory allocation/deallocation steps, and assume that all needed memory has already been allocated by the user before entering the capturing context
- Exclude host callback operations
- Basic coverage for conditional nodes
- The resulting graph should be replay-able, meaning:
  - user objects' lifetimes are properly managed
  - ...
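
To make the capture and lifetime requirements above concrete, here is a minimal pure-Python toy model. It is not the cuda-python API and does not touch a GPU: `ToyStream`, `ToyGraph`, `capture`, `launch`, and `scale` are all hypothetical names invented for illustration. It only sketches three ideas from the list: work launched during capture is recorded rather than executed, memory is allocated before capture begins, and the graph holds references to the objects it uses so that it can be replayed after the user drops their own.

```python
from contextlib import contextmanager


class ToyGraph:
    """Hypothetical stand-in for a captured graph (not a cuda-python class)."""

    def __init__(self):
        # Each node stores the callable and its arguments. Holding the
        # arguments here is what keeps user objects alive for replay.
        self._nodes = []

    def add_node(self, fn, *args):
        self._nodes.append((fn, args))

    def retained_objects(self):
        # Every object referenced by a recorded node, in capture order.
        return [arg for _, args in self._nodes for arg in args]

    def launch(self):
        # Replay every recorded node in capture order.
        for fn, args in self._nodes:
            fn(*args)


class ToyStream:
    """Hypothetical stand-in for a stream that supports capture."""

    def __init__(self):
        self._graph = None

    @contextmanager
    def capture(self):
        graph = ToyGraph()
        self._graph = graph
        try:
            yield graph
        finally:
            self._graph = None  # leave capture mode even on error

    def launch(self, fn, *args):
        if self._graph is not None:
            self._graph.add_node(fn, *args)  # capturing: record only
        else:
            fn(*args)  # not capturing: run immediately


def scale(buf, factor):
    # Stand-in for a kernel: scales the buffer in place.
    for i in range(len(buf)):
        buf[i] *= factor


def demo():
    stream = ToyStream()
    buf = [1.0, 2.0, 3.0]  # "allocated" by the user before capture
    with stream.capture() as graph:
        stream.launch(scale, buf, 2.0)
    after_capture = list(buf)  # snapshot: capture did not run the work
    del buf  # the user drops their reference; the graph still holds one
    graph.launch()
    graph.launch()
    return after_capture, graph.retained_objects()[0]
```

In this toy, `demo()` returns the buffer contents right after capture (unchanged, since nothing ran) together with the buffer the graph retained, which has been scaled once per replay. A real design would have to tie such references to the lifetime of the native graph object; how that is done is not specified here.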
Design is being wrapped up with a prototype (#455). Moving this to beta 4.