How to quantize specific layers?
I have a trained ONNX model that needs to be quantized to INT8, but I want my last fully connected layers to stay in FP32 or FP16. How can I choose specific layers to quantize (or not to quantize)?
P.S. When I was working with NNCF, I just used the ignored_scopes parameter. Is there something similar here in DL Workbench?
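For reference, this is roughly what I mean, a minimal sketch of the config-driven NNCF quantization I used (the model and the regex scope are placeholders, not my actual layer names):

```python
# Minimal sketch of NNCF's config-based quantization with ignored_scopes;
# the regex scope below is a placeholder, not a real layer name.
import torchvision
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

model = torchvision.models.resnet18(pretrained=True)

nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "quantization",
        # Layers matching these scopes are left in floating point.
        "ignored_scopes": ["{re}.*fc.*"],
    },
})

compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
```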
Hi,
Thank you for getting in contact.
DL Workbench uses the Post-Training Optimization Tool (POT) for quantization. It seems that POT does not expose a layer-ignoring option the way NNCF does.
You can use the Jupyter Notebooks in DL Workbench for more fine-grained quantization and inference. In the notebooks, you can pass your own parameters to the OpenVINO tools (in your case, for quantization). Additionally, you can install any required tools (in your case, NNCF) and use them alongside the rest of the OpenVINO packages; a sketch of what that could look like follows below.
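For example, once NNCF is installed in the notebook environment, a cell along these lines could quantize the ONNX model directly. This is a hedged sketch using NNCF's post-training nncf.quantize API with nncf.IgnoredScope, which is available in recent NNCF releases; the file names, input tensor name, and layer-name patterns are placeholders you would adapt to your model.

```python
# Hedged sketch of NNCF post-training quantization of an ONNX model;
# file names, the input name, and the name patterns are placeholders.
import numpy as np
import onnx
import nncf

model = onnx.load("model.onnx")  # placeholder path

# Dummy calibration data; in practice, feed real preprocessed samples.
# "input" is an assumed input tensor name - check your model's actual inputs.
calibration_data = [
    {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    for _ in range(8)
]
calibration_dataset = nncf.Dataset(calibration_data)

quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    # Keep nodes matching these name patterns in floating point,
    # e.g. the final fully connected (Gemm) layers.
    ignored_scope=nncf.IgnoredScope(patterns=[".*fc.*", ".*Gemm.*"]),
)

onnx.save(quantized_model, "model_int8.onnx")
```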
Should you have any more questions, please do contact us.