cuda-python
Cover occupancy calculator APIs
The occupancy calculator APIs are already available in the bindings:
- driver: https://nvidia.github.io/cuda-python/cuda-bindings/latest/module/driver.html#occupancy
- runtime: https://nvidia.github.io/cuda-python/cuda-bindings/latest/module/runtime.html#occupancy
This issue is about exposing them in cuda.core.
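For context, here is a minimal sketch of how the existing driver binding can be used today, and of the arithmetic a cuda.core wrapper might hide. The helper names (`theoretical_occupancy`, `query_active_blocks`) and the `kernel_handle` parameter are hypothetical and not part of cuda-python; the sketch assumes `kernel_handle` is a `CUfunction` obtained elsewhere and that a CUDA context is current when the driver is queried.

```python
def theoretical_occupancy(active_blocks_per_sm, block_size,
                          max_threads_per_sm, warp_size=32):
    """Fraction of an SM's warp slots occupied by resident blocks."""
    # A partially filled warp still occupies a full warp slot, so round up.
    warps_per_block = -(-block_size // warp_size)
    max_warps_per_sm = max_threads_per_sm // warp_size
    return active_blocks_per_sm * warps_per_block / max_warps_per_sm


def query_active_blocks(kernel_handle, block_size, dynamic_smem_bytes=0):
    """Ask the driver how many blocks of this kernel fit on one SM.

    Hypothetical helper: needs a GPU, a current context, and a CUfunction.
    """
    # Imported lazily so the arithmetic above stays usable without a GPU.
    from cuda.bindings import driver

    err, num_blocks = driver.cuOccupancyMaxActiveBlocksPerMultiprocessor(
        kernel_handle, block_size, dynamic_smem_bytes
    )
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"occupancy query failed: {err}")
    return num_blocks
```

For example, 4 resident blocks of 256 threads on an SM that supports 2048 threads gives `theoretical_occupancy(4, 256, 2048) == 0.5`. A cuda.core API could plausibly take the block size and shared memory size from its own launch configuration object instead of requiring them as separate arguments.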
Based on what I learned from the CUTLASS team, we should also be able to support cuOccupancyMaxActiveClusters easily, by passing a Kernel and a LaunchConfig.
It would be great if cuda-python could help simplify this process!