[WIP] qqmm

Open nastya236 opened this issue 5 months ago • 0 comments

This is still a draft, but it adds a new op, mx::qqmm, and a new primitive, mx::DualQuantizedMatmul (the naming is open to discussion).

At the moment, the implementation only supports the configuration where both inputs are quantized in the same way (this is also the only configuration supported by cuBLAS). The output type is fixed to bf16.
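To make the "both inputs quantized in the same way, bf16 output" configuration concrete, here is a minimal pure-Python reference sketch. It is not the mx::qqmm implementation or API: the function names (`quantize_rows`, `to_bf16`, `qqmm_reference`), the symmetric int8 group-wise scheme, and the group size are all assumptions chosen for illustration, and the quantization formats this PR actually targets may differ.

```python
import struct


def quantize_rows(mat, group_size=4, bits=8):
    """Symmetric per-group quantization along the last (K) axis.

    Hypothetical stand-in scheme: one float scale per group of
    `group_size` values, integers in [-qmax, qmax].
    """
    qmax = 2 ** (bits - 1) - 1
    q_rows, s_rows = [], []
    for row in mat:
        q_row, s_row = [], []
        for g in range(0, len(row), group_size):
            group = row[g:g + group_size]
            amax = max(abs(v) for v in group)
            scale = amax / qmax if amax > 0 else 1.0
            q_row.extend(round(v / scale) for v in group)
            s_row.append(scale)
        q_rows.append(q_row)
        s_rows.append(s_row)
    return q_rows, s_rows


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (returned as a float)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that bf16 drops, then truncate.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def qqmm_reference(a, b_t, group_size=4):
    """Compute a @ b_t^T with BOTH operands quantized the same way.

    a is M x K and b_t is N x K (B stored transposed), so both are
    quantized along K with identical group boundaries.
    """
    qa, sa = quantize_rows(a, group_size)
    qb, sb = quantize_rows(b_t, group_size)
    out = []
    for qa_row, sa_row in zip(qa, sa):
        out_row = []
        for qb_row, sb_row in zip(qb, sb):
            acc = 0.0
            for gi in range(len(sa_row)):
                lo = gi * group_size
                hi = lo + group_size
                # Integer dot product within the group, then rescale by
                # the product of the two group scales.
                dot = sum(x * y for x, y in zip(qa_row[lo:hi], qb_row[lo:hi]))
                acc += dot * sa_row[gi] * sb_row[gi]
            out_row.append(to_bf16(acc))  # output type fixed to bf16
        out.append(out_row)
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0, 4.0]]
    b_t = [[1.0, 0.0, 0.0, 0.0]]
    print(qqmm_reference(a, b_t))
```

Because both operands share the same group boundaries along K, each group contributes a single integer dot product scaled by the product of two scales. This is the property that makes the "same quantization on both sides" restriction convenient for a fused kernel.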

There is also some restructuring of the ops and the cuBLAS utils.

Todo:

  • batching logic in CublasQQMM
  • bias, and the case where c is not nullptr
  • not sure, but mx::qqmm should probably return a quantized output
  • CublasQQMM also needs to be cleaned up
  • jvp, vjp, vmap

nastya236 · Nov 18 '25 18:11