cuda-python
[FEA]: Add improved latency test for cuda.bindings benchmarks, add C++ comparison
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
cuda.bindings
Is your feature request related to a problem? Please describe.
With the new release of cuda.bench, which uses nvbench as its backend, we want to use the statistical models it employs to measure the runtime, latency, and overhead of our binding calls more accurately. Adding C++ comparisons will allow quick comparison of cuda.bindings' overall performance against the raw C++ calls.
Describe the solution you'd like
A /benchmarks/ folder containing identical benchmarks of CUDA APIs, written both in Python through the bindings and as raw C++ function calls. NVBench would wrap both and generate JSON files that can be compared to view latency differences.
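To make the comparison step concrete, here is a minimal sketch of what diffing the two result sets could look like. It assumes each JSON file has already been reduced to a flat mapping of benchmark name to mean latency in nanoseconds; that flat layout, the `compare_latency` helper, the benchmark names, and the numbers are all hypothetical placeholders, not NVBench's actual output schema or real measurements.

```python
import json


def compare_latency(python_results, cpp_results):
    """Compare mean latencies for benchmarks present in both result sets.

    Both arguments are assumed (hypothetically) to map a benchmark name to a
    mean latency in nanoseconds. Benchmarks missing from either side are skipped.
    """
    comparison = {}
    for name in sorted(python_results.keys() & cpp_results.keys()):
        py_ns = python_results[name]
        cpp_ns = cpp_results[name]
        comparison[name] = {
            "python_ns": py_ns,
            "cpp_ns": cpp_ns,
            # Absolute cost added by going through the Python bindings.
            "overhead_ns": py_ns - cpp_ns,
            # Relative slowdown; guard against a zero C++ baseline.
            "ratio": py_ns / cpp_ns if cpp_ns else float("inf"),
        }
    return comparison


# Placeholder inputs standing in for the two generated JSON files.
python_results = json.loads(
    '{"bench_a": 250.0, "bench_b": 900.0, "only_in_python": 10.0}'
)
cpp_results = json.loads('{"bench_a": 50.0, "bench_b": 600.0}')

report = compare_latency(python_results, cpp_results)
for name, row in report.items():
    print(
        f"{name}: python={row['python_ns']:.1f} ns, cpp={row['cpp_ns']:.1f} ns, "
        f"overhead={row['overhead_ns']:.1f} ns, ratio={row['ratio']:.2f}x"
    )
```

Reporting both the absolute overhead and the ratio seems useful here: for very short API calls the ratio can look large even when the added latency is a fixed, small number of nanoseconds.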
Describe alternatives you've considered
The current implementation uses pytest, which does not offer the same granularity as nvbench.
Additional context
No response