Feedback about Real Time Inference on Raspberry Pi 4 and 5 (40 fps!)
There is the following issue on this page: https://docs.pytorch.org/tutorials/intermediate/realtime_rpi.html
I have a YOLO-like model that I trained with QAT, but the quantized model is very slow: about 4 times slower than the floating-point model, even though I fused the layers.
Any help?
Also, trying to JIT the YOLO model failed in torch.
Can you share:
- What size YOLO model you're using (YOLOv5s, YOLOv8n, etc.)?
- Are you running on Raspberry Pi or another ARM device?
- Did you verify the qnnpack backend is set?
- What does your quantization code look like?
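On the backend point above, a minimal sketch of how to check and set the quantized engine (qnnpack is the ARM backend used on Raspberry Pi; the fallback name here assumes an x86 build):

```python
import torch

# List the quantized engines this PyTorch build supports.
print(torch.backends.quantized.supported_engines)

# Select qnnpack (the ARM backend used on Raspberry Pi) when available;
# fall back to the default engine otherwise.
if "qnnpack" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "qnnpack"

print("active engine:", torch.backends.quantized.engine)
```

If the active engine does not match the qconfig used for quantization, the quantized kernels can fall back to slow paths, which is one common cause of a quantized model running slower than float.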
cc: @d4l3k
@shimon-c can you share the yolo model you're using?
QAT (Quantization-Aware Training) is a training technique, and it's expected that enabling QAT (which doesn't actually quantize the model) will reduce performance, since it runs the full floating-point model plus the fake-quantization observers.
To actually quantize the model, it's best to hand-tune it: fuse modules, lower to int8 precision, and then convert. The quantized torchvision models used in that tutorial are optimized for performance.
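To illustrate the distinction, here is a minimal eager-mode QAT sketch on a hypothetical conv block standing in for a YOLO-style layer (the module names and shapes are assumptions, not the poster's model). The key point is that the model is only int8 after `convert` is called:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Hypothetical conv-bn-relu block standing in for one YOLO-style layer.
class ConvBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks float -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        return self.dequant(x)

model = ConvBlock().train()

# Pick the backend: qnnpack on ARM (Raspberry Pi), fbgemm on x86.
engine = "qnnpack" if "qnnpack" in torch.backends.quantized.supported_engines else "fbgemm"
torch.backends.quantized.engine = engine
model.qconfig = tq.get_default_qat_qconfig(engine)

# Fuse conv+bn+relu so they run as a single quantized kernel later.
tq.fuse_modules_qat(model, [["conv", "bn", "relu"]], inplace=True)

# Insert fake-quant observers; the model is still float here and SLOWER
# than the original -- this is the QAT training phase.
tq.prepare_qat(model, inplace=True)
# ... fine-tune the model here ...

# Only convert() produces the actual int8 model that should be fast.
model.eval()
quantized = tq.convert(model)
out = quantized(torch.randn(1, 3, 32, 32))
print(out.shape)
```

If inference is run on the prepared (pre-convert) model, the observed 4x slowdown is exactly what one would expect.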