ltqin

Results 8 issues of ltqin

Bare minimum batched multihead attention backward kernel. Many missing functionalities:

- ~alpha(QK) scaling~ **implemented**
- ~masking~ **implemented**
- ~dropout~ **implemented**

Some quirks that need to be ironed out too. Eg:...

WIP

When compiling all tests on branch fp16_transfer_to_bf16, the compiler reports an error: `fatal error: error in backend: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for...

bug

Compiling test_gemm_fp64 on branch add_mfma_f64 with ROCm 5.1 produces an incorrect result, but compiling with ROCm 9110 gives the correct result. I recorded this issue in ticket: https://ontrack-internal.amd.com/browse/SWDEV-335738

bug

1. K padding
2. N padding