[ENHANCEMENT]: Get rid of custom atomic operations once CCCL 2.4 is ready

Open PointKernel opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

The current cuco implementations use custom atomic functions, e.g. https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L904-L936, to work around a performance regression in `cuda::atomic_ref` (https://github.com/NVIDIA/cccl/issues/1008). Now that the fix has been merged into the main branch, we can get rid of those custom functions once rapids-cmake fetches CCCL 2.4.

Describe the solution you'd like

Replace the custom atomics at https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L905, https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh#L947, and https://github.com/NVIDIA/cuCollections/blob/1c8b92074d9a0d07ff9288626c22ab4f5fb9d6ad/include/cuco/detail/hyperloglog/hyperloglog_ref.cuh#L525 with the corresponding `cuda::atomic_ref` operations.
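
For illustration only, here is a rough sketch of what the `cuda::atomic_ref` versions might look like. It assumes the linked code boils down to a compare-and-swap on a slot and an atomic max on a register; the issue itself does not spell that out. The helper names `try_claim_slot` and `update_max_sketch`, the `demo` kernel, and the relaxed memory ordering are hypothetical choices for this example, not cuco's actual code.

```cuda
#include <cuda/atomic>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper (not cuco's code): try to claim a slot by swapping
// `expected` for `desired` through cuda::atomic_ref instead of a hand-written
// atomicCAS wrapper. Returns true if this thread performed the swap.
template <cuda::thread_scope Scope, typename T>
__device__ bool try_claim_slot(T* slot, T expected, T desired)
{
  cuda::atomic_ref<T, Scope> ref{*slot};
  return ref.compare_exchange_strong(expected, desired, cuda::memory_order_relaxed);
}

// Hypothetical helper (not cuco's code): atomic max on a register.
template <cuda::thread_scope Scope, typename T>
__device__ void update_max_sketch(T* reg, T value)
{
  cuda::atomic_ref<T, Scope> ref{*reg};
  ref.fetch_max(value, cuda::memory_order_relaxed);
}

__global__ void demo(int* slot, int* reg, int* winners)
{
  int const tid = blockIdx.x * blockDim.x + threadIdx.x;
  // Every thread races to replace the empty sentinel (-1) with its own id.
  if (try_claim_slot<cuda::thread_scope_device>(slot, -1, tid)) {
    cuda::atomic_ref<int, cuda::thread_scope_device> count{*winners};
    count.fetch_add(1, cuda::memory_order_relaxed);
  }
  update_max_sketch<cuda::thread_scope_device>(reg, tid);
}

int main()
{
  int* slot;
  int* reg;
  int* winners;
  cudaMallocManaged(&slot, sizeof(int));
  cudaMallocManaged(&reg, sizeof(int));
  cudaMallocManaged(&winners, sizeof(int));
  *slot    = -1;  // empty sentinel
  *reg     = 0;
  *winners = 0;

  demo<<<4, 64>>>(slot, reg, winners);
  cudaDeviceSynchronize();

  std::printf("winners=%d max=%d\n", *winners, *reg);
  // Exactly one thread should win the CAS; the max thread id is 4 * 64 - 1.
  bool const ok = (*winners == 1) && (*reg == 4 * 64 - 1);

  cudaFree(slot);
  cudaFree(reg);
  cudaFree(winners);
  return ok ? 0 : 1;
}
```

The thread scope is a template parameter here so that call sites can pick the scope they need; whether cuco would thread it through the same way is a guess.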

Apr 23 '24 19:04 PointKernel