Excluding layers from quantization
I have a question about excluding some layers from quantization. I would like to keep the operations from one module, called warp, unquantized. I collected their names with the following code:
import onnx

# Load the ONNX model
onnx_model = onnx.load('Model.onnx')

# Collect the names of all nodes (layers) whose name contains 'warp',
# assuming the layers of the module in question are named that way
layers = [node.name for node in onnx_model.graph.node if 'warp' in node.name]
print(layers)

which prints:

['/warp/Constant', '/warp/Add', '/warp/Gather', '/warp/Constant_1', '/warp/Mul', '/warp/Constant_2', '/warp/Div', '/warp/Constant_3', '/warp/Sub', '/warp/Gather_1', '/warp/Constant_4', '/warp/Mul_1', '/warp/Constant_5', '/warp/Div_1', '/warp/Constant_6', '/warp/Sub_1', '/warp/Constant_7', '/warp/Unsqueeze', '/warp/Constant_8', '/warp/Unsqueeze_1', '/warp/Concat', '/warp/Transpose', '/warp/GridSample']
Based on these names I created the JSON file exclude_layers.json:
{
    "activation_encodings": {
        "/warp/Constant": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Add": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Gather": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Mul": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_2": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Div": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_3": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Sub": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Gather_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_4": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Mul_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_5": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Div_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_6": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Sub_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_7": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Unsqueeze": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Constant_8": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Unsqueeze_1": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Concat": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/Transpose": [{ "bitwidth": 32, "dtype": "float" }],
        "/warp/GridSample": [{ "bitwidth": 32, "dtype": "float" }]
    },
    "param_encodings": {}
}
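Since every entry is identical, the file can also be generated from the layers list collected above instead of written by hand; a minimal sketch (exclude_layers.json is simply the file name the commands below expect):

import json

# Build one 32-bit float activation override per collected layer name
# (`layers` is the list gathered in the snippet above)
overrides = {
    "activation_encodings": {
        name: [{"bitwidth": 32, "dtype": "float"}] for name in layers
    },
    "param_encodings": {}
}

with open("exclude_layers.json", "w") as f:
    json.dump(overrides, f, indent=4)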
Finally I ran the SNPE commands:
snpe-onnx-to-dlc -i Model.onnx -o Model.dlc --quantization_overrides exclude_layers.json
and then
snpe-dlc-quantize --input_dlc Model.dlc --input_list Inputlist_train_short.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc Quant_Model.dlc --enable_htp --htp_socs sm8550 --override_params
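(For context, Inputlist_train_short.txt is the calibration input list consumed by snpe-dlc-quantize; to my understanding it is a plain-text file with one line per calibration sample, each line pointing to a raw tensor file. The paths here are placeholders:

data/sample_000.raw
data/sample_001.raw
data/sample_002.raw
)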
After running snpe-onnx-to-dlc, the output contains the line: 2024-10-03 13:42:48,459 - 235 - INFO - Processed 0 quantization encodings. I'm not sure whether that is expected.
Additionally, the visualization of Quant_Model.dlc generated with snpe-dlc-viewer shows that the layers from the warp module have a bitwidth of 8, and the outputs saved with snpe-net-run confirm that this module is quantized, even though it shouldn't be.
Could you help me and tell me what I should change to exclude the chosen layers from quantization?
Refer to Model-Accuracy-Mixed-Precision. This should help you with the process.
For anyone who hits a similar problem: the issue was that I was using the layer names from the ONNX graph; instead, I had to use the names from the quantized .dlc model (the output tensor names, e.g. /warp/Constant_output_0). They can be checked in the .html file produced by the snpe-dlc-viewer command, for example:

"/warp/Constant_output_0": [
    {
        "bitwidth": 16
    }
],
"/warp/Add_output_0": [
    {
        "bitwidth": 16
    }
],
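A minimal sketch of the corrected name collection, assuming (as the viewer output above suggests) that the overrides must be keyed by each node's output tensor names rather than the ONNX node names:

import onnx

onnx_model = onnx.load('Model.onnx')

# Activation overrides are keyed by tensor names, so collect the
# output tensors of every 'warp' node instead of the node names
tensor_names = [
    output
    for node in onnx_model.graph.node
    if 'warp' in node.name
    for output in node.output
]
print(tensor_names)  # e.g. ['/warp/Constant_output_0', '/warp/Add_output_0', ...]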
@Piotr94, may I know if the issue has been resolved?