mrbean
@ChainYo I would like to get started on the ONNX Config for DeBERTaV2!
@philschmid @JingyaHuang should we put this in? Feels like a bug
@philschmid mind re-reviewing?
Setting the format to QDQ works around this, but the QOperator format seems broken.
@philschmid I have raised the concerns in the issue above
@philschmid I am also tracking this in https://github.com/microsoft/onnxruntime/issues/12133 and https://github.com/microsoft/onnxruntime/issues/12173, but it is becoming unclear whether the issue truly lies there or whether Optimum is creating a quantized model that can...
@philschmid I am also seeing this weird behavior: https://github.com/NVIDIA/TensorRT/issues/2146. I thought this was an oddity of TensorRT, but it seems like the same thing is happening when I use your...
@lewtun @michaelbenayoun @JingyaHuang what do you think?
pinging @echarlaix and @JingyaHuang once again. This is a blocker for quantizing very large models so would love to see this go in!
@lewtun I saw there were some tests around static quantization in `tests/test_optimization.py` so I put a unit test in there. That work for you?