[Feature]: Better preprocessor macros to detect RDNA/CDNA family at compile time
Suggestion Description
As new instructions/features are added with each new arch, it is useful to know the target architecture at compile time to employ separate code paths. For example: FP64 MFMA was added in CDNA2, so CDNA2 and later can use one code path while CDNA1 uses a different code path.
Today this is tedious: every arch has to be enumerated explicitly, and the code has to be updated whenever a new arch becomes available:
#if __gfx940__ || __gfx941__ || __gfx942__
// Code path for CDNA3
#elif __gfx90a__
// Code path for CDNA2
#elif __gfx908__
// Code path for CDNA1
#endif
It would be nice if we had something like:
#if CDNA_VERSION >= 3
// Code path for CDNA3 and later
#elif CDNA_VERSION >= 2
// Code path for CDNA2
#else
// Code path for CDNA1
#endif
This would mirror the way it is done in CUDA:
__device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Code path for compute capability 8.x and later
#elif __CUDA_ARCH__ >= 700
// Code path for compute capability 7.x
#else
// Code path for compute capability < 7.0
#endif
}
There are some concerns about introducing a macro for the CDNA version.
Checking for the feature itself with #if __has_builtin(...) may be a better way to determine whether it is available (https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros). It works for all GPUs, including future generations.