
Add Python CUDA reduction benchmark with cuda.cccl

Open AnshSinghSonkhia opened this issue 6 months ago • 2 comments

Introduces reduce_bench.py, a Python script to benchmark parallel reductions on NVIDIA GPUs using the cuda.cccl library, and updates the README with usage instructions and example output. This allows users to compare naive CuPy reductions with optimized CUDA JIT reductions from Python.


This solves #9
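
For readers who want a feel for the comparison described above, here is a minimal timing-harness sketch. It is not the contents of reduce_bench.py: the names `time_reduction`, `compare`, and `manual_sum` are hypothetical, and the reducers shown are plain CPU stand-ins so the snippet runs without a GPU. In a real GPU benchmark, each callable would presumably wrap a CuPy or cuda.cccl reduction and synchronize the device before returning.

```python
import time
from typing import Callable, Dict, Sequence


def time_reduction(reduce_fn: Callable[[Sequence[float]], float],
                   data: Sequence[float],
                   repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # For GPU code, reduce_fn must block until the kernel has finished
        # (e.g. by synchronizing the stream), or the timing is meaningless.
        reduce_fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare(reducers: Dict[str, Callable[[Sequence[float]], float]],
            data: Sequence[float],
            repeats: int = 5) -> Dict[str, float]:
    """Time every named reducer on the same input."""
    return {name: time_reduction(fn, data, repeats)
            for name, fn in reducers.items()}


def manual_sum(values: Sequence[float]) -> float:
    """Hypothetical 'naive' baseline: an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    data = [1.0] * 1_000_000
    timings = compare({"builtin sum": sum, "manual loop": manual_sum}, data)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs, rather than the mean, is one common way to reduce noise from warm-up and JIT compilation; the actual script may report results differently.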

AnshSinghSonkhia, Jul 22 '25 19:07