AMDMIGraphX
Remove `qlinear_reused` matcher and instead fuse MLIR `quant_dot` with base pointwise operators
- There's an accuracy error resulting from the `qlinear_reused` matcher in `simplify_qdq`.
  - Note that the other half of the quantized resnet50 accuracy issue was from a disconnect between rocMLIR and MIGX on handling the zero-point subtraction precision.
- The intent of the `qlinear_reused` matcher was to merge more operations by ensuring that an intermediate result is not used multiple times.
  - The accuracy problem came from the fact that the matcher immediately dequantizes a quantized result to get around the previous reuse.
- If we're instead able to do input pointwise fusions to `quant_conv`, we should be able to get around the issue entirely.
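To illustrate why immediately dequantizing a quantized result costs accuracy, here is a minimal sketch in plain Python. It is not MIGraphX code: the helper names (`quantize`, `dequantize`, `roundtrip`), the symmetric int8 range, and the scale value are all assumptions chosen for illustration. It shows that a quantize-then-dequantize round trip perturbs each in-range value by up to half a scale step, and clips anything outside the representable range.

```python
# Hypothetical helpers for illustration only; these are not MIGraphX APIs.

def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a float to an int8 value: round(x / scale) + zero_point, then clamp."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point=0):
    """Map an int8 value back to a float."""
    return (q - zero_point) * scale

def roundtrip(x, scale, zero_point=0):
    """Quantize and immediately dequantize, as a Q/DQ pair on a reused result would."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)

scale = 0.05  # assumed scale, chosen arbitrarily for the example
values = [0.012, -0.49, 1.234, 3.3333]

# Rounding error introduced by the round trip for in-range values.
errors = [abs(roundtrip(v, scale) - v) for v in values]
max_error = max(errors)

# A value outside the representable range is clipped to qmax * scale.
clipped = roundtrip(10.0, scale)

for v, e in zip(values, errors):
    print(f"{v:>8} -> {roundtrip(v, scale):.4f} (error {e:.4f})")
print("clipped 10.0 ->", clipped)
```

Any consumer that reads the dequantized value instead of the original floating-point intermediate inherits this error, which is consistent with the accuracy problem described above.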
The problem is that we would now output fp16 instead of int8. We should try to re-enable this matcher. Of course, there is accuracy loss from quantization, but we would have the same issue if we quantized the bias. Perhaps there is a better choice of scales in order to improve the accuracy for these cases.
Closing: with MLIR's update we now pass accuracy verification, tested with the program mentioned in #2949 and a couple of different random seeds I tried.