[Question] Are there any plans to support fp8 batched matmul (`_scaled_bmm`)?
The current fp8 kernel only supports a standard matmul (A ~ [M, K], B ~ [K, N]). However, MoE is usually implemented as a batched matmul (A ~ [B, M, K], B ~ [B, K, N]), where B is the number of experts, and the current fp8 linear does not support this batched case. So I am wondering: are there any plans to support an fp8 batched matmul?
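
For context, here is a rough sketch of the workaround I have in mind today: looping over the expert dimension and calling `torch._scaled_mm` per expert with per-tensor scales. Treat this as an assumption rather than a recommendation; the exact `_scaled_mm` signature and layout requirements have changed across PyTorch versions, and the loop obviously loses the fusion/throughput benefits a real batched kernel would have.

```python
import torch

def scaled_bmm_loop(a_fp8, b_fp8, scale_a, scale_b, out_dtype=torch.bfloat16):
    """Hypothetical workaround: emulate an fp8 batched matmul by looping over
    the expert dimension and calling torch._scaled_mm once per expert.

    a_fp8:   [B, M, K] in torch.float8_e4m3fn (row-major)
    b_fp8:   [B, K, N] in torch.float8_e4m3fn
    scale_a, scale_b: per-expert fp32 scales, shape [B]
    """
    B, M, K = a_fp8.shape
    N = b_fp8.shape[-1]
    out = torch.empty(B, M, N, dtype=out_dtype, device=a_fp8.device)
    for i in range(B):
        # _scaled_mm expects the second operand in column-major layout;
        # transposing a contiguous [N, K] tensor yields a column-major [K, N] view.
        b_col_major = b_fp8[i].t().contiguous().t()
        out[i] = torch._scaled_mm(
            a_fp8[i],
            b_col_major,
            scale_a=scale_a[i],
            scale_b=scale_b[i],
            out_dtype=out_dtype,
        )
    return out
```

A native `_scaled_bmm` (or torchao support for batched fp8 linear) would avoid this per-expert loop and the extra layout shuffling.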