cub
Performance regression in 8-bit HistogramRange between 1.3.2B and 1.8.0
FYI, HistogramRange runs at about half the speed on 8-bit data in 1.8.0 that it did in 1.3.2B, while everything else is about twice as fast.
Tested on a V100 with CUDA 9.1 and an updated driver that supports Volta.
Marking as unverified, since we'll need to check this again after #208.