cuda : add half2 __shfl_xor() for ROCm 5.5

Open Engininja2 opened this issue 1 year ago • 0 comments

__shfl_xor() for half2 was added in ROCm 5.6. This PR implements it for HIP versions older than 5.6. Fixes #7242.
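
For illustration only, here is a host-side C++ sketch of one common way to build such a fallback: reinterpret the 32-bit half2 as an integer, shuffle it with the existing 32-bit __shfl_xor(), and reinterpret the result back. This is not the code from this PR. Half2Bits, kWarpSize, shfl_xor_b32 and shfl_xor_half2 are hypothetical names, the warp is modeled as a plain array, and a width of 32 lanes is assumed (AMD wavefronts can be 64 wide).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for half2: two 16-bit halves, 32 bits in total.
struct Half2Bits {
    std::uint16_t x;
    std::uint16_t y;
};
static_assert(sizeof(Half2Bits) == sizeof(std::uint32_t),
              "the trick relies on half2 fitting in one 32-bit register");

// Assumption for this sketch: 32 lanes per warp.
constexpr int kWarpSize = 32;

// Host model of the 32-bit shuffle: lane i reads the value held by lane (i ^ lane_mask).
std::array<std::uint32_t, kWarpSize>
shfl_xor_b32(const std::array<std::uint32_t, kWarpSize>& values, int lane_mask) {
    assert(lane_mask >= 0 && lane_mask < kWarpSize);
    std::array<std::uint32_t, kWarpSize> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        out[lane] = values[lane ^ lane_mask];
    }
    return out;
}

// half2 shuffle expressed through the 32-bit shuffle.
std::array<Half2Bits, kWarpSize>
shfl_xor_half2(const std::array<Half2Bits, kWarpSize>& values, int lane_mask) {
    // Pack each lane's half2 into a 32-bit integer without changing any bits.
    std::array<std::uint32_t, kWarpSize> packed{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::memcpy(&packed[lane], &values[lane], sizeof(std::uint32_t));
    }

    const std::array<std::uint32_t, kWarpSize> shuffled = shfl_xor_b32(packed, lane_mask);

    // Unpack the shuffled 32-bit values back into half2 form.
    std::array<Half2Bits, kWarpSize> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::memcpy(&out[lane], &shuffled[lane], sizeof(std::uint32_t));
    }
    return out;
}
```

In device code the same idea would presumably be a small __device__ overload compiled only when the HIP version is below 5.6, with the array loops replaced by a single call to the built-in 32-bit __shfl_xor().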

May 13 '24 18:05 Engininja2

Nvidia GPU

review complexity : medium