
Remove `qlinear_reused` matcher and instead fuse MLIR `quant_dot` with base pointwise operators

Open CharlieL7 opened this issue 1 year ago • 1 comments

  • There's an accuracy error resulting from the qlinear_reused matcher in simplify_qdq.
    • Note that the other half of the quantized resnet50 accuracy issue was from a disconnect between rocMLIR and MIGX on handling the zero-point subtraction precision.
  • The intent of the qlinear_reused matcher was to merge more operations by ensuring that an intermediate result is not used multiple times.
  • The accuracy problem arises because the matcher immediately dequantizes a quantized result to get around that reuse.
  • If we can instead fuse input pointwise operations into quant_conv, we should be able to avoid the issue entirely.
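
To illustrate why an immediate dequantize of a quantized result can cost accuracy, here is a minimal sketch in plain Python. It is not MIGraphX code: `quantize`, `dequantize`, the scale of 0.05, and the sample values are all hypothetical, and symmetric int8 quantization with round-to-nearest is assumed.

```python
def quantize(x, scale, zero_point=0):
    """Map a float to int8 with round-to-nearest and saturation (assumed scheme)."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point=0):
    """Map an int8 value back to float."""
    return (q - zero_point) * scale


# Hypothetical intermediate results and scale, chosen only for illustration.
values = [0.013, 0.42, -1.27, 2.5]
scale = 0.05

# A consumer that reads the dequantized value sees the rounded number,
# not the original float result.
round_trip = [dequantize(quantize(v, scale), scale) for v in values]
errors = [abs(v - r) for v, r in zip(values, round_trip)]

for v, r, e in zip(values, round_trip, errors):
    print(f"original={v:+.3f}  after q/dq={r:+.3f}  abs error={e:.3f}")
```

For values that do not saturate, the round-trip error is bounded by half the scale, so any consumer fed the dequantized value inherits that error instead of seeing the unquantized result.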

CharlieL7, Jul 11 '24 21:07

The problem is that we would now output fp16 instead of int8. We should try to re-enable this matcher. Of course, there is accuracy loss from quantization, but we would have the same issue if we quantized the bias. Perhaps there is a better choice of scales in order to improve the accuracy for these cases.
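
As a rough illustration of how the choice of scale affects quantization error, the sketch below compares two symmetric int8 scales on the same data. Everything here is an assumption for illustration: `fake_quantize`, `mean_abs_error`, the sample values, and both scales are hypothetical and do not reflect how MIGraphX or rocMLIR pick scales.

```python
def fake_quantize(x, scale):
    """Quantize to int8 (symmetric, zero point 0) and dequantize right away."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale


def mean_abs_error(values, scale):
    """Average absolute difference between each value and its q/dq round trip."""
    return sum(abs(v - fake_quantize(v, scale)) for v in values) / len(values)


# Hypothetical data with one large outlier (6.0).
values = [0.02, -0.31, 0.77, 1.9, -2.4, 6.0]

# Scale that covers the full range: no clipping, but coarser steps.
max_abs_scale = max(abs(v) for v in values) / 127

# Smaller scale: finer steps for small values, but the outlier saturates at 127.
clipped_scale = 2.4 / 127

err_max_abs = mean_abs_error(values, max_abs_scale)
err_clipped = mean_abs_error(values, clipped_scale)

print(f"max-abs scale {max_abs_scale:.5f}: mean abs error {err_max_abs:.5f}")
print(f"clipped scale {clipped_scale:.5f}: mean abs error {err_clipped:.5f}")
```

On this particular data the clipped scale is worse because the outlier dominates the error; with a different distribution the trade-off between resolution and clipping can go the other way.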

pfultz2, Jul 23 '24 18:07

Closing. With MLIR's update, we now pass accuracy verification when testing with the program mentioned in #2949 and a couple of different random seeds.

CharlieL7, Aug 05 '24 20:08