gtensor
Enable gt half precision types for HIP
Uses the 16-bit floating-point types from the HIP headers (<hip/hip_fp16.h>, <hip/hip_bf16.h>) when the CUDA headers are not available.
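For readers unfamiliar with the pattern, the header selection might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the GTENSOR_DEVICE_CUDA / GTENSOR_DEVICE_HIP guard macros, the sketch:: namespace, and the alias names are all hypothetical, and the host-only fallback exists only so the snippet compiles without either toolkit installed. The vendor type names (__half, __nv_bfloat16, __hip_bfloat16) are the ones those headers provide, to the best of my knowledge.

```cpp
// Sketch only: guard macros and sketch:: aliases are assumptions for
// illustration, not necessarily what gtensor itself uses.
#include <cstdint>

#if defined(GTENSOR_DEVICE_CUDA)
// CUDA toolkit available: take the 16-bit types from the CUDA headers.
#include <cuda_fp16.h>
#include <cuda_bf16.h>
namespace sketch {
using fp16_storage = __half;
using bf16_storage = __nv_bfloat16;
constexpr const char* backend = "cuda";
} // namespace sketch
#elif defined(GTENSOR_DEVICE_HIP)
// No CUDA headers: fall back to the HIP equivalents.
#include <hip/hip_fp16.h>
#include <hip/hip_bf16.h>
namespace sketch {
using fp16_storage = __half;
using bf16_storage = __hip_bfloat16;
constexpr const char* backend = "hip";
} // namespace sketch
#else
// Host-only placeholders (hypothetical) so the sketch builds anywhere.
namespace sketch {
struct fp16_storage { std::uint16_t bits; };
struct bf16_storage { std::uint16_t bits; };
constexpr const char* backend = "host";
} // namespace sketch
#endif

// Whichever branch is taken, both storage types should occupy 16 bits.
static_assert(sizeof(sketch::fp16_storage) == 2, "fp16 storage must be 2 bytes");
static_assert(sizeof(sketch::bf16_storage) == 2, "bf16 storage must be 2 bytes");
```

Downstream code can then refer to sketch::fp16_storage and sketch::bf16_storage without caring which vendor header supplied them.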
BF16 tests pass on an AMD MI300A when built, with the rocm/6.3 module loaded, via:
cmake -S . -B build-hip -DCMAKE_INSTALL_PREFIX=build-hip -DGTENSOR_DEVICE=hip -DBUILD_TESTING=ON -DGTENSOR_ENABLE_BF16=ON -DCMAKE_CXX_COMPILER=$(which hipcc)
cmake --build build-hip --target install
(FP16 tests pass analogously when built with -DGTENSOR_ENABLE_FP16=ON.)