ParallelReductionsBenchmark

Add Python benchmarks for the new CUDA DSL/JIT

Open • ashvardanian opened this issue 6 months ago • 2 comments

Now that CCCL v3 can be used for efficient parallel reductions in Python, it would be great to add a benchmark file, reduce_bench.py, with Pythonic JIT-compiled kernels for parallel reductions, showcasing the impact of different hyperparameters on the results.
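A rough sketch of what the sweep harness in such a file might look like, assuming the hyperparameters of interest are a block size and an items-per-thread count (both are guesses; the issue does not name them). The names `chunked_sum` and `sweep` are hypothetical, and `chunked_sum` is only a CPU stand-in that mimics a two-stage tile reduction. A real benchmark would swap in a JIT-compiled GPU kernel with the same call signature.

```python
import time
from itertools import product


def chunked_sum(data, block_size, items_per_thread):
    """CPU stand-in for a GPU reduction: sum each tile, then sum the partials."""
    tile = block_size * items_per_thread  # elements one hypothetical block covers
    partials = [sum(data[i:i + tile]) for i in range(0, len(data), tile)]
    return sum(partials)


def sweep(reduce_fn, data, block_sizes, items_per_thread_options, repeats=3):
    """Time reduce_fn for every hyperparameter combination, keeping the best run."""
    rows = []
    for block_size, items in product(block_sizes, items_per_thread_options):
        best = float("inf")
        result = None
        for _ in range(repeats):
            start = time.perf_counter()
            result = reduce_fn(data, block_size, items)
            best = min(best, time.perf_counter() - start)
        rows.append({
            "block_size": block_size,
            "items_per_thread": items,
            "seconds": best,
            "result": result,
        })
    return rows


if __name__ == "__main__":
    data = list(range(1 << 16))
    for row in sweep(chunked_sum, data, [64, 128, 256], [1, 4]):
        print(row)
```

Keeping the reduction behind a plain callable means the same sweep could time several kernel variants side by side, and the `result` column allows a correctness check against a reference sum before any timings are compared.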

ashvardanian, Jul 22 '25 14:07