Excluding layers from quantization
I have a question about excluding some layers from quantization. I would like to keep the operations from one module, called warp, unquantized. I collected their names with the following code:
import onnx

# Load the ONNX model
onnx_model = onnx.load('Model.onnx')

# Collect the names of all nodes (layers) whose name contains 'warp',
# assuming the layers of the module in question are named that way
layers = [node.name for node in onnx_model.graph.node if 'warp' in node.name]
print(layers)

which prints:

['/warp/Constant', '/warp/Add', '/warp/Gather', '/warp/Constant_1', '/warp/Mul', '/warp/Constant_2', '/warp/Div', '/warp/Constant_3', '/warp/Sub', '/warp/Gather_1', '/warp/Constant_4', '/warp/Mul_1', '/warp/Constant_5', '/warp/Div_1', '/warp/Constant_6', '/warp/Sub_1', '/warp/Constant_7', '/warp/Unsqueeze', '/warp/Constant_8', '/warp/Unsqueeze_1', '/warp/Concat', '/warp/Transpose', '/warp/GridSample']
Based on these names I created the JSON file exclude_layers.json:
{
    "activation_encodings": {
        "/warp/Constant": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Add": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Gather": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Mul": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_2": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Div": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_3": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Sub": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Gather_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_4": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Mul_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_5": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Div_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_6": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Sub_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_7": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Unsqueeze": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_8": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Unsqueeze_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Concat": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Transpose": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/GridSample": [{ "bitwidth": 32, "dtype": "float" }]
    },
    "param_encodings": {}
}
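Since every entry is identical, the file can also be generated from the layers list collected above instead of written by hand; a minimal sketch (exclude_layers.json is simply the file name the commands below expect):

import json

# Build one 32-bit float activation override per collected layer name
# (`layers` is the list gathered in the snippet above)
overrides = {
    "activation_encodings": {
        name: [{"bitwidth": 32, "dtype": "float"}] for name in layers
    },
    "param_encodings": {}
}

with open("exclude_layers.json", "w") as f:
    json.dump(overrides, f, indent=4)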
Finally I ran the SNPE commands:
snpe-onnx-to-dlc -i Model.onnx -o Model.dlc --quantization_overrides exclude_layers.json
and then
snpe-dlc-quantize --input_dlc Model.dlc --input_list Inputlist_train_short.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc Quant_Model.dlc --enable_htp --htp_socs sm8550 --override_params
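(For context, Inputlist_train_short.txt is the calibration input list consumed by snpe-dlc-quantize; to my understanding it is a plain-text file with one line per calibration sample, each line pointing to a raw tensor file. The paths here are placeholders:

data/sample_000.raw
data/sample_001.raw
data/sample_002.raw
)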
After running snpe-onnx-to-dlc, the output contains the line: 2024-10-03 13:42:48,459 - 235 - INFO - Processed 0 quantization encodings. I'm not sure whether that is expected.
Additionally, the visualization of Quant_Model.dlc generated with snpe-dlc-viewer shows that the layers from the warp module have a bitwidth of 8, and the outputs saved with snpe-net-run confirm that this module is quantized, even though it shouldn't be.
Could you help me and tell me what I should change to exclude the chosen layers from quantization?
Refer to Model-Accuracy-Mixed-Precision. This should help you with the process.
For anyone who hits a similar problem: the issue was that I was using the layer names from the ONNX graph; instead, I had to use the names from the quantized .dlc model (the output tensor names, e.g. /warp/Constant_output_0). They can be checked in the .html file produced by the snpe-dlc-viewer command, for example:

"/warp/Constant_output_0": [
    {
        "bitwidth": 16
    }
],
"/warp/Add_output_0": [
    {
        "bitwidth": 16
    }
],
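A minimal sketch of the corrected name collection, assuming (as the viewer output above suggests) that the overrides must be keyed by each node's output tensor names rather than the ONNX node names:

import onnx

onnx_model = onnx.load('Model.onnx')

# Activation overrides are keyed by tensor names, so collect the
# output tensors of every 'warp' node instead of the node names
tensor_names = [
    output
    for node in onnx_model.graph.node
    if 'warp' in node.name
    for output in node.output
]
print(tensor_names)  # e.g. ['/warp/Constant_output_0', '/warp/Add_output_0', ...]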
@Piotr94, may I know if the issue has been resolved?