llama.cpp
cuda : add half2 __shfl_xor() for ROCm 5.5
The half2 overload of __shfl_xor() was added in ROCm 5.6. This PR provides an equivalent implementation for HIP versions older than 5.6.
Fixes #7242
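
For reviewers unfamiliar with the trick, here is a minimal host-side sketch of one way such a fallback can work: a half2 occupies 32 bits, so it can be reinterpreted as a 32-bit integer, shuffled with the integer shuffle that older toolchains already provide, and reinterpreted back. This is a plain C++ simulation, not the code in this PR. `Half2Bits`, `kWarpSize`, `shfl_xor_b32` and `shfl_xor_half2` are hypothetical names invented for the illustration, and the warp is modeled as an array of 32 lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for half2: two 16-bit values, 32 bits total.
// The real half2 type comes from the HIP/CUDA fp16 headers.
struct Half2Bits {
    std::uint16_t x;
    std::uint16_t y;
};
static_assert(sizeof(Half2Bits) == sizeof(std::uint32_t),
              "the value must fit in a single 32-bit register");

// Assumed lane count for this simulation only.
constexpr int kWarpSize = 32;

// Simulated 32-bit xor shuffle: lane i receives the value held by lane (i ^ lane_mask).
std::array<std::uint32_t, kWarpSize> shfl_xor_b32(
        const std::array<std::uint32_t, kWarpSize>& lanes, int lane_mask) {
    assert(lane_mask >= 0 && lane_mask < kWarpSize);
    std::array<std::uint32_t, kWarpSize> out{};
    for (int i = 0; i < kWarpSize; ++i) {
        out[i] = lanes[i ^ lane_mask];
    }
    return out;
}

// half2 shuffle built on top of the 32-bit shuffle by copying the raw bits
// into an integer, shuffling the integer, and copying the bits back.
std::array<Half2Bits, kWarpSize> shfl_xor_half2(
        const std::array<Half2Bits, kWarpSize>& lanes, int lane_mask) {
    std::array<std::uint32_t, kWarpSize> raw{};
    for (int i = 0; i < kWarpSize; ++i) {
        std::memcpy(&raw[i], &lanes[i], sizeof(std::uint32_t));
    }
    const auto shuffled = shfl_xor_b32(raw, lane_mask);
    std::array<Half2Bits, kWarpSize> out{};
    for (int i = 0; i < kWarpSize; ++i) {
        std::memcpy(&out[i], &shuffled[i], sizeof(std::uint32_t));
    }
    return out;
}
```

On the device, the same idea would presumably wrap the existing 32-bit integer `__shfl_xor()` overload behind a HIP version check, so that newer ROCm releases keep using the built-in half2 overload.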