[WIP] qqmm

Open nastya236 opened this issue 5 months ago • 0 comments

This is still a draft. It adds a new op, mx::qqmm, and a new primitive, mx::DualQuantizedMatmul (the naming is open to discussion).

At the moment, the implementation only supports the configuration where both inputs are quantized in the same way (this is also the only configuration supported by cuBLAS). The output type is fixed to bf16.
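To make "both inputs quantized in the same way" concrete, here is a minimal pure-Python sketch of what such a matmul computes. It is not the code in this PR and does not call MLX or cuBLAS. The function names, the symmetric integer format, the group size, and the (N, K) layout of w are all assumptions made for illustration, and the sketch returns Python floats rather than modelling the bf16 output.

```python
# Illustrative reference semantics only; NOT the MLX or cuBLAS implementation.
# Assumptions (hypothetical): symmetric integer quantization with one scale
# per group of `group_size` elements along the reduction axis, and w stored
# as (N, K) so the result is x @ w.T.

def quantize_rows(mat, group_size=4, bits=8):
    """Quantize each row group-wise; return (integer codes, per-group scales)."""
    qmax = 2 ** (bits - 1) - 1
    codes, scales = [], []
    for row in mat:
        row_codes, row_scales = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            amax = max(abs(v) for v in group)
            scale = amax / qmax if amax > 0 else 1.0
            row_scales.append(scale)
            row_codes.extend(round(v / scale) for v in group)
        codes.append(row_codes)
        scales.append(row_scales)
    return codes, scales


def qqmm_reference(x, w, group_size=4, bits=8):
    """Quantize x (M, K) and w (N, K) with the same scheme, then compute x @ w.T."""
    xq, xs = quantize_rows(x, group_size, bits)
    wq, ws = quantize_rows(w, group_size, bits)
    k = len(x[0])
    out = []
    for i in range(len(x)):
        out_row = []
        for j in range(len(w)):
            acc = 0.0
            for start in range(0, k, group_size):
                g = start // group_size
                stop = min(start + group_size, k)
                # Integer dot product within the group, rescaled by both scales.
                dot = sum(xq[i][t] * wq[j][t] for t in range(start, stop))
                acc += dot * xs[i][g] * ws[j][g]
            out_row.append(acc)
        out.append(out_row)
    return out
```

The point of the sketch is that the same quantization parameters apply to both operands, so each group contributes an integer dot product scaled by one factor from each side. A real kernel would use the hardware's block-scaled formats instead of Python lists.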

There is also some restructuring of the ops and the cuBLAS utils.

Todo:

- batching logic in CublasQQMM
- bias, and the case where c is not nullptr
- possibly have mx::qqmm return a quantized output (not sure yet)
- clean up CublasQQMM
- jvp, vjp, vmap

Nov 18 '25 18:11 nastya236