Remove `qlinear_reused` matcher and instead fuse MLIR `quant_dot` with base pointwise operators

Open CharlieL7 opened this issue 1 year ago • 1 comments

There's an accuracy error resulting from the `qlinear_reused` matcher in `simplify_qdq`.
- Note that the other half of the quantized resnet50 accuracy issue came from a disconnect between rocMLIR and MIGX in how they handle the precision of the zero-point subtraction.
The intent of the `qlinear_reused` matcher was to allow more operations to be merged by making sure an intermediate result is not used multiple times.
The accuracy problem comes from the matcher immediately dequantizing a quantized result to get around that reuse.
If we can instead fuse input pointwise operators into `quant_conv`, we should be able to avoid the issue entirely.
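
To illustrate the kind of error described above, here is a minimal sketch in plain Python, not MIGraphX code. The `quantize` and `dequantize` helpers, the scale, and the sample values are all hypothetical. It compares a consumer that reads the original floating-point intermediate with one that reads the same intermediate after a quantize/dequantize round trip.

```python
def quantize(x, scale, zero_point=0):
    """Map a float to int8 using round-to-nearest and saturation."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point=0):
    """Map an int8 value back to a float."""
    return (q - zero_point) * scale


scale = 0.05  # hypothetical quantization scale
values = [0.012, 0.33, -1.27, 2.499, -0.024]  # hypothetical intermediate results

# What a consumer would see if it read the original floating-point values.
exact = list(values)

# What a consumer sees if the intermediate is quantized and then
# immediately dequantized before being reused.
roundtrip = [dequantize(quantize(v, scale), scale) for v in values]

errors = [abs(a - b) for a, b in zip(exact, roundtrip)]
max_error = max(errors)
print(f"max round-trip error: {max_error:.4f} (bound: {scale / 2:.4f})")
```

For values inside the representable range, the round-trip error is bounded by half the scale, so any consumer that previously read the unquantized value picks up up to that much extra error.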

Jul 11 '24 21:07 CharlieL7

The problem is that we would now output fp16 instead of int8. We should try to re-enable this matcher. Of course, there is accuracy loss from quantization, but we would have the same issue if we quantized the bias. Perhaps there is a better choice of scales in order to improve the accuracy for these cases.
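
As a rough illustration of why the choice of scale matters, the sketch below compares the round-trip error for two scales on the same data. This is plain Python with made-up values, and the `roundtrip_error` helper is hypothetical; it does not reflect how MIGraphX or any calibration tool actually picks scales.

```python
def roundtrip_error(values, scale):
    """Worst-case |x - dequantize(quantize(x))| for symmetric int8 quantization."""
    worst = 0.0
    for v in values:
        q = max(-128, min(127, round(v / scale)))  # quantize with saturation
        worst = max(worst, abs(v - q * scale))     # compare with dequantized value
    return worst


values = [0.02, -0.4, 0.75, -1.1, 0.9]  # hypothetical tensor values

# Scale fitted to the observed range of the data.
tight_scale = max(abs(v) for v in values) / 127
# Hypothetical scale sized for a much wider range than the data uses.
loose_scale = 8.0 / 127

tight_error = roundtrip_error(values, tight_scale)
loose_error = roundtrip_error(values, loose_scale)
print(f"tight scale error: {tight_error:.5f}")
print(f"loose scale error: {loose_error:.5f}")
```

With these values the scale fitted to the data gives the smaller worst-case error, since the quantization step is finer over the range that is actually used.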

Jul 23 '24 18:07 pfultz2

Closing. We now pass accuracy verification with MLIR's update, tested with the program mentioned in #2949 and a couple of different random seeds I tried.

Aug 05 '24 20:08 CharlieL7