Justin Stoecker
Hello, The [awq_quantize](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L703) function [collects the names of input tensors to each MatMul node](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L758-L764), and later [looks up the parent node that produces the named tensor](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L783). This assumes the tensors...
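
For context, a minimal sketch of that lookup pattern (not the actual neural-compressor implementation; the function name below is hypothetical) could look like this:

```python
import onnx

def find_matmul_parents(model_path):
    """Collect the input tensor names of every MatMul node, then look up
    the node that produces each of those tensors."""
    model = onnx.load(model_path)
    graph = model.graph

    # Map each tensor name to the node that produces it.
    output_name_to_node = {}
    for node in graph.node:
        for out in node.output:
            output_name_to_node[out] = node

    # For each MatMul, resolve its activation input back to a parent node.
    # Graph inputs and initializers have no producing node, so the lookup
    # returns None for them.
    parents = {}
    for node in graph.node:
        if node.op_type == "MatMul":
            act_name = node.input[0]
            parents[node.name] = output_name_to_node.get(act_name)
    return parents
```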
When implementing a quantized GEMM/convolution with INT8 activations and weights, it's common to also have the bias as INT32. The usual trick for adding a bias seems to be initializing...
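
For illustration, here is a minimal sketch of one common scheme, assuming the bias is pre-quantized with scale = input_scale * weight_scale so it can be added directly to the INT32 accumulator (the function and parameter names are hypothetical, not taken from the post above):

```python
import numpy as np

def quantized_gemm_with_bias(x_q, w_q, bias_fp32, x_scale, w_scale, out_scale):
    """INT8 GEMM with an INT32 bias added to the accumulator."""
    # INT32 accumulation of INT8 x INT8 products.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)

    # Quantize the bias to INT32 using the accumulator's effective scale,
    # so it can be summed into the accumulator without rescaling.
    bias_q = np.round(bias_fp32 / (x_scale * w_scale)).astype(np.int32)
    acc += bias_q

    # Requantize the accumulator to the output scale and clamp to INT8.
    out = np.round(acc * (x_scale * w_scale) / out_scale)
    return np.clip(out, -128, 127).astype(np.int8)
```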