tiny-llm
feat: implement quantized_matmul with a typed CPU implementation that dispatches across precisions
- Add a complete quantized_matmul_impl_typed template function for the CPU backend, covering float16, float32, and bfloat16.
- Add fp32 test cases for quantized_matmul.
- Relax float32 tolerance in test utils.
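The typed dispatch described in the bullets above can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the code from this commit. Only the name quantized_matmul_impl_typed comes from the commit. The 4-bit packing (eight values per uint32, lowest bits first), the per-group affine scale/bias layout, and the names DType, bf16 and quantized_matmul_impl are hypothetical. float16 is left out because standard C++ has no portable half-precision type before C++23. It would be one more case in the switch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical minimal bfloat16: keeps the upper 16 bits of a float32.
// Simplification: conversion truncates instead of rounding to nearest even.
struct bf16 {
  uint16_t bits = 0;
  bf16() = default;
  explicit bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    bits = static_cast<uint16_t>(u >> 16);
  }
  explicit operator float() const {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
  }
};

// Hypothetical dtype tag used only by this sketch.
enum class DType { Float32, BFloat16 };

// out[M,N] = x[M,K] @ dequant(w)[N,K]^T, accumulating in float32.
// Assumed layout: w packs eight 4-bit values per uint32 (lowest bits first);
// scales/biases hold one entry per group of group_size weights along K.
template <typename T>
void quantized_matmul_impl_typed(const T* x, const uint32_t* w, const T* scales,
                                 const T* biases, T* out, int M, int N, int K,
                                 int group_size) {
  constexpr int kBits = 4;
  constexpr int kPerWord = 32 / kBits;
  assert(K % kPerWord == 0 && K % group_size == 0);
  const int words_per_row = K / kPerWord;
  const int groups_per_row = K / group_size;
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        const uint32_t word = w[n * words_per_row + k / kPerWord];
        const uint32_t q = (word >> ((k % kPerWord) * kBits)) & 0xFu;
        const int g = n * groups_per_row + k / group_size;
        // Affine dequantization (assumed): w = scale * q + bias.
        const float wv = static_cast<float>(scales[g]) * static_cast<float>(q) +
                         static_cast<float>(biases[g]);
        acc += static_cast<float>(x[m * K + k]) * wv;
      }
      out[m * N + n] = static_cast<T>(acc);
    }
  }
}

// Runtime dispatch: pick the typed instantiation from a dtype tag.
void quantized_matmul_impl(DType dtype, const void* x, const uint32_t* w,
                           const void* scales, const void* biases, void* out,
                           int M, int N, int K, int group_size) {
  switch (dtype) {
    case DType::Float32:
      quantized_matmul_impl_typed<float>(
          static_cast<const float*>(x), w, static_cast<const float*>(scales),
          static_cast<const float*>(biases), static_cast<float*>(out), M, N, K,
          group_size);
      return;
    case DType::BFloat16:
      quantized_matmul_impl_typed<bf16>(
          static_cast<const bf16*>(x), w, static_cast<const bf16*>(scales),
          static_cast<const bf16*>(biases), static_cast<bf16*>(out), M, N, K,
          group_size);
      return;
  }
  throw std::invalid_argument("unsupported dtype");
}
```

Accumulating in float32 regardless of T is one common choice for low-precision inputs. The commit does not say whether the actual implementation does this.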