Christian Sigg
I think [cpp11.cu](https://github.com/NVIDIA/cutlass/blob/6e60b9b17c5e6734488dbb7401b5c55ccb37feba/test/unit/core/cpp11.cu#L76) should be comparing against `201103L` (see https://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html). That said, I vaguely remember that it can be difficult to test compatibility with an older standard when using a newer compiler. So maybe...
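For illustration, here is a minimal sketch of a check against `201103L`, assuming the intent is to confirm the translation unit is compiled as C++11 or newer. The names `kCpp11` and `is_cpp11_or_newer` are hypothetical and do not come from CUTLASS.

```cpp
#include <cassert>

// Value of __cplusplus for C++11, per the GCC predefined-macros documentation.
// Hypothetical constant name, used only for this sketch.
constexpr long kCpp11 = 201103L;

// Hypothetical helper: true when the compiler reports C++11 or a later standard.
constexpr bool is_cpp11_or_newer() {
  return __cplusplus >= kCpp11;
}

// Fails at compile time if the code is built against an older standard.
static_assert(is_cpp11_or_newer(), "expected C++11 or newer");
```

Note that a `>=` comparison only shows the code builds under the selected standard or later; an exact `==` check would pass only when the compiler is explicitly invoked with `-std=c++11`. MSVC also reports `199711L` for `__cplusplus` unless `/Zc:__cplusplus` is passed, so this check would need adjusting there.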
https://github.com/llvm/llvm-project/commit/4ad696231bc7d398c0f4430e60cfc6ab4c7e880e
NVIDIA is implementing an optimization to pass the LHS operand of WGMMA ops in registers. This allows element-wise prologues to pass the intermediate result directly to WGMMA without writing it...