Add variable bit width support to ONNXToDeepsparse

Open KSGulin opened this issue 2 years ago • 1 comments

This PR adds support for variable-bit weight quantization in the ONNXToDeepsparse exporter. This affects two steps:

  • Conversion of initializers to uint8
  • Clipping in quantization of weight arrays
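
For readers unfamiliar with the two steps, here is a minimal NumPy sketch of what bit-width-aware clipping and uint8 conversion could look like. It is not the exporter's code: the function names `quantize_weights` and `to_uint8`, the `num_bits` parameter, the symmetric signed range, and the `2 ** (num_bits - 1)` offset are all assumptions made for illustration.

```python
import numpy as np


def quantize_weights(weights, scale, zero_point, num_bits=8):
    """Quantize a float array to a signed num_bits range (hypothetical helper)."""
    # Assumed signed range for the given bit width, e.g. [-8, 7] for 4 bits.
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    quantized = np.round(weights / scale) + zero_point
    # Clip to the num_bits range rather than a hard-coded 8-bit range.
    return np.clip(quantized, qmin, qmax).astype(np.int8)


def to_uint8(quantized, num_bits=8):
    """Shift signed quantized values into an unsigned range stored as uint8."""
    # Assumed offset: half of the num_bits range (128 for 8 bits, 8 for 4 bits).
    offset = 2 ** (num_bits - 1)
    # Widen before adding so the shift cannot overflow int8.
    return (quantized.astype(np.int16) + offset).astype(np.uint8)


weights = np.array([-1.0, -0.1, 0.0, 0.33, 1.0], dtype=np.float64)
signed = quantize_weights(weights, scale=0.1, zero_point=0, num_bits=4)
unsigned = to_uint8(signed, num_bits=4)
print(signed, unsigned)
```

The point of the sketch is that both the clipping bounds and the unsigned offset depend on the bit width, so a routine that hard-codes the 8-bit values would leave 4-bit weights unclipped.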

Test Plan: local sparsify runs with int-4 weight quantization

KSGulin, Jul 03 '23 14:07

@bfineran good callout. Updated the array quantization routine and propagated the bit_width args to address this.

KSGulin, Jul 03 '23 16:07