sparseml
Add variable bit width support to ONNXToDeepsparse
This PR adds support for variable-bit weight quantization in the ONNXToDeepsparse exporter. This affects two steps:
- Conversion of initializers to uint8
- Clipping in quantization of weight arrays
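
For illustration, a minimal sketch of how the two steps above could interact when the bit width is variable. The function name `quantize_weights` and its parameters are hypothetical and are not taken from the sparseml codebase. The sketch assumes an unsigned affine scheme in which values are stored as uint8 but clipped to the narrower range `[0, 2**bit_width - 1]`:

```python
import numpy as np


def quantize_weights(weights, scale, zero_point, bit_width=8):
    """Hypothetical helper: affine-quantize a float array into uint8 storage.

    Assumes an unsigned range [0, 2**bit_width - 1]; the real exporter
    may use a different range or rounding convention.
    """
    if not 1 <= bit_width <= 8:
        raise ValueError("bit_width must be between 1 and 8 to fit in uint8")

    qmin = 0
    qmax = 2 ** bit_width - 1  # e.g. 15 for 4-bit, 255 for 8-bit

    # Scale, shift by the zero point, then clip to the bit-width range
    quantized = np.round(np.asarray(weights) / scale) + zero_point
    quantized = np.clip(quantized, qmin, qmax)

    # Storage stays uint8 even when fewer than 8 bits are used
    return quantized.astype(np.uint8)


example = quantize_weights(
    np.array([-8.0, 0.0, 2.5, 100.0]), scale=0.5, zero_point=8, bit_width=4
)
print(example)  # [ 0  8 13 15]
```

With a fixed 8-bit clip, the last value would land at 208 instead of 15, which is the kind of out-of-range value the `bit_width` argument is meant to prevent.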
Test Plan: Local sparsify runs with int-4 weight quantization.
@bfineran good callout. Updated the array quantization routine and propagated the bit_width args to address this.