mrbean

Results: 55 comments by mrbean

@ChainYo I would like to get started on the ONNX Config for DeBERTaV2!

@philschmid @JingyaHuang should we put this in? This feels like a bug.

By setting the format to QDQ you can get around this, but the QOperator format seems broken.
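
For anyone who wants to reproduce the workaround, here is a minimal sketch that calls ONNX Runtime's quantization API directly rather than Optimum's wrapper; the model path, input name, and calibration data are placeholders, not the exact setup from this thread:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)


class DummyReader(CalibrationDataReader):
    """Feeds a few random samples for calibration (placeholder data only)."""

    def __init__(self, n=8):
        self._data = iter(
            [{"input_ids": np.random.randint(0, 100, (1, 128), dtype=np.int64)}
             for _ in range(n)]
        )

    def get_next(self):
        # Return one input feed per call, then None when exhausted.
        return next(self._data, None)


# QDQ inserts QuantizeLinear/DequantizeLinear pairs around ops (the workaround);
# QuantFormat.QOperator emits fused QLinear* ops, which is the path that misbehaves.
quantize_static(
    model_input="model.onnx",            # placeholder path
    model_output="model-quantized.onnx", # placeholder path
    calibration_data_reader=DummyReader(),
    quant_format=QuantFormat.QDQ,        # switch to QuantFormat.QOperator to reproduce
    weight_type=QuantType.QInt8,
)
```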

@philschmid I have raised these concerns in the issue above.

@philschmid I am also tracking this in https://github.com/microsoft/onnxruntime/issues/12133 and https://github.com/microsoft/onnxruntime/issues/12173, but it is becoming unclear whether the issue is truly there or whether Optimum is creating a quantized model that can...

@philschmid I am also seeing this weird behavior: https://github.com/NVIDIA/TensorRT/issues/2146. I thought it was an oddity of TensorRT, but it seems like the same thing is happening when I use your...

@lewtun @michaelbenayoun @JingyaHuang what do you think?

Pinging @echarlaix and @JingyaHuang once again. This is a blocker for quantizing very large models, so I would love to see this go in!

@lewtun I saw there were already some tests around static quantization in `tests/test_optimization.py`, so I put a unit test in there. Does that work for you?