[Question] Are there any plans to support fp8 batched matmul (`_scaled_bmm`)?
The current fp8 kernel only supports a standard matmul (A ~ [M, K], B ~ [K, N]). However, MoE is usually implemented as a batched matmul (A ~ [B, M, K], B ~ [B, K, N]), where B is the number of experts, and the current fp8 linear does not support this batched case. So I am wondering: are there any plans to support an fp8 batched matmul?
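
For context, here is a rough sketch of the workaround I have in mind today: looping over the expert dimension and calling `torch._scaled_mm` per expert with per-tensor scales. Treat this as an assumption rather than a recommendation; the exact `_scaled_mm` signature and layout requirements have changed across PyTorch versions, and the loop obviously loses the fusion/throughput benefits a real batched kernel would have.

```python
import torch

def scaled_bmm_loop(a_fp8, b_fp8, scale_a, scale_b, out_dtype=torch.bfloat16):
    """Hypothetical workaround: emulate an fp8 batched matmul by looping over
    the expert dimension and calling torch._scaled_mm once per expert.

    a_fp8:   [B, M, K] in torch.float8_e4m3fn (row-major)
    b_fp8:   [B, K, N] in torch.float8_e4m3fn
    scale_a, scale_b: per-expert fp32 scales, shape [B]
    """
    B, M, K = a_fp8.shape
    N = b_fp8.shape[-1]
    out = torch.empty(B, M, N, dtype=out_dtype, device=a_fp8.device)
    for i in range(B):
        # _scaled_mm expects the second operand in column-major layout;
        # transposing a contiguous [N, K] tensor yields a column-major [K, N] view.
        b_col_major = b_fp8[i].t().contiguous().t()
        out[i] = torch._scaled_mm(
            a_fp8[i],
            b_col_major,
            scale_a=scale_a[i],
            scale_b=scale_b[i],
            out_dtype=out_dtype,
        )
    return out
```

A native `_scaled_bmm` (or torchao support for batched fp8 linear) would avoid this per-expert loop and the extra layout shuffling.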