feat: implement quantized_matmul with a typed CPU implementation supporting dispatch to different precisions

Elubrazione opened this issue 3 months ago.

Add a complete quantized_matmul_impl_typed template function for the CPU backend, covering float16, float32, and bfloat16.
Add fp32 test cases for quantized_matmul.
Relax float32 tolerance in test utils.
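Below is a minimal sketch of how a typed template with runtime precision dispatch could be structured. Everything in it is hypothetical and not taken from this PR: the `Dtype` enum, the `quantized_matmul` dispatcher, the hand-rolled `bfloat16` struct, and the quantization layout (4-bit affine values, two per byte, one scale and bias per group of `group_size` elements along K). float16 is omitted because standard C++ before C++23 has no portable half-precision type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical minimal bfloat16: the upper 16 bits of an IEEE-754 float32.
struct bfloat16 {
  uint16_t bits;
  bfloat16() : bits(0) {}
  bfloat16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    u += 0x7FFFu + ((u >> 16) & 1u);  // round to nearest even (NaN handling omitted)
    bits = static_cast<uint16_t>(u >> 16);
  }
  operator float() const {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
  }
};

// Hypothetical dtype tag used for runtime dispatch (float16 left out).
enum class Dtype { float32, bfloat16 };

// out[m, n] = sum_k x[m, k] * (scale[g] * q[n, k] + bias[g]), where q is a
// 4-bit value (low nibble first) and g is the group of k within row n.
template <typename T>
void quantized_matmul_impl_typed(const T* x, const uint8_t* w_packed,
                                 const T* scales, const T* biases, T* out,
                                 int M, int N, int K, int group_size) {
  assert(K % group_size == 0 && K % 2 == 0);
  const int groups_per_row = K / group_size;
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;  // accumulate in float32 regardless of T
      for (int k = 0; k < K; ++k) {
        const int idx = n * K + k;
        const uint8_t byte = w_packed[idx / 2];
        const int q = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        const int g = n * groups_per_row + k / group_size;
        const float w = static_cast<float>(scales[g]) * q +
                        static_cast<float>(biases[g]);
        acc += static_cast<float>(x[m * K + k]) * w;
      }
      out[m * N + n] = static_cast<T>(acc);  // round once on store
    }
  }
}

// Runtime dispatch: map the dtype tag to the matching template instantiation.
void quantized_matmul(Dtype dtype, const void* x, const uint8_t* w_packed,
                      const void* scales, const void* biases, void* out,
                      int M, int N, int K, int group_size) {
  switch (dtype) {
    case Dtype::float32:
      quantized_matmul_impl_typed(static_cast<const float*>(x), w_packed,
                                  static_cast<const float*>(scales),
                                  static_cast<const float*>(biases),
                                  static_cast<float*>(out), M, N, K, group_size);
      break;
    case Dtype::bfloat16:
      quantized_matmul_impl_typed(static_cast<const bfloat16*>(x), w_packed,
                                  static_cast<const bfloat16*>(scales),
                                  static_cast<const bfloat16*>(biases),
                                  static_cast<bfloat16*>(out), M, N, K, group_size);
      break;
  }
}
```

Keeping one templated kernel and a thin switch means each new precision only needs a type with float conversions and one extra `case`. Accumulating in float32 and rounding once when writing `out` is a common choice for reduced-precision inputs.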

Oct 18 '25 12:10 Elubrazione