XiaobingZhang
@jerryzh168, looking at https://github.com/pytorch/pytorch/blob/caa6ef15a294c96fad3bf67a10a8b4fa605080bb/torch/ao/quantization/fx/_equalize.py#L59-L70, I am confused by this code: if the scheme is per_tensor, why is PerChannelMinMaxObserver used?
@pytorchbot rebase
@pytorchbot merge
The following is the FP32 performance data for conv+add, tested on SKX-6148 (test script: https://github.com/XiaobingSuper/op_bench/blob/main/conv_add.py): 1. BS=1, thread=1. input size | output channels | kernel | stride |...
@ZolotukhinM, I removed the code that is not related to this PR, so it should be easier for you to review. Thanks!
@pytorchbot merge -g
Yes, this is expected: there is an issue with the norm reduction, which doesn't use the accumulate type.
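To illustrate why the accumulate type matters for a norm reduction, here is a minimal standard-library sketch. It is not PyTorch's kernel: `to_fp16`, `l2_norm_narrow_acc`, and `l2_norm_wide_acc` are hypothetical helpers, and half precision is only simulated by rounding through `struct`'s `"e"` format. The idea is that a running sum kept in the narrow input type stops growing once each addend falls below half the accumulator's spacing, while a wider accumulator keeps the contribution.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def l2_norm_narrow_acc(values):
    """L2 norm with the running sum rounded to fp16 after every add
    (assumption: this mimics a reduction without an accumulate type)."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v * v))
    return to_fp16(math.sqrt(acc))


def l2_norm_wide_acc(values):
    """L2 norm with the running sum kept in a wider type (Python float),
    rounded to fp16 only once at the end."""
    acc = 0.0
    for v in values:
        acc += v * v
    return to_fp16(math.sqrt(acc))


# 4096 copies of the fp16 value closest to 0.1.
values = [to_fp16(0.1)] * 4096
narrow = l2_norm_narrow_acc(values)
wide = l2_norm_wide_acc(values)
print("narrow accumulator:", narrow)
print("wide accumulator:  ", wide)
```

With this input the narrow accumulator should come out noticeably smaller than the wide one, which is the kind of mismatch an accumulate type is meant to avoid.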
Yes, it was fixed by https://github.com/pytorch/pytorch/pull/95166.
@jansel @desertfire, please help review this code again. Thanks!
@pytorchbot merge