
Feedback about Real Time Inference on Raspberry Pi 4 and 5 (40 fps!)

Open shimon-c opened this issue 2 months ago • 4 comments

There is the following issue on this page: https://docs.pytorch.org/tutorials/intermediate/realtime_rpi.html

I have a YOLO-like model that I trained with QAT, and the quantized model is very slow: about 4 times slower than the floating-point model, even though I used fused layers.

Any help?

shimon-c avatar Dec 04 '25 13:12 shimon-c

I also tried to JIT the YOLO model; it failed in torch.

shimon-c avatar Dec 04 '25 13:12 shimon-c

Can you share:

  • What size YOLO model you're using (YOLOv5s, YOLOv8n, etc.)?
  • Are you running on Raspberry Pi or another ARM device?
  • Did you verify the qnnpack backend is set?
  • What does your quantization code look like?
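On the backend point: the tutorial selects the QNNPACK engine, which provides the int8 kernels for ARM CPUs like the Raspberry Pi's. A minimal check (a sketch, not your code) looks like:

```python
import torch

# QNNPACK supplies the quantized (int8) kernels for ARM CPUs such as the
# Raspberry Pi. If the engine isn't set, quantized ops may fall back to a
# backend that is missing or slow on ARM.
torch.backends.quantized.engine = "qnnpack"

print(torch.backends.quantized.engine)                  # active engine
print(torch.backends.quantized.supported_engines)      # engines in this build
```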

patrocinio avatar Dec 04 '25 16:12 patrocinio

cc: @d4l3k

svekars avatar Dec 04 '25 17:12 svekars

@shimon-c can you share the yolo model you're using?

QAT (Quantization-Aware Training) is a training technique, and it's expected that enabling QAT (which doesn't actually quantize the model) reduces performance: the model still runs in full floating point while additionally computing the fake-quantization losses.

To actually quantize the model, you need to convert it after training, and it's best to hand-tune the model to fuse layers and lower them to int8 precision. The quantized torchvision models used in that tutorial are optimized for performance.
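To illustrate the point above: in the eager-mode API, QAT only inserts fake-quant observers, and it's the final `convert` step that swaps in real int8 kernels. A minimal sketch of the full flow, using a hypothetical conv block standing in for one YOLO backbone stage (not shimon-c's actual model):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, fuse_modules,
    get_default_qat_qconfig, prepare_qat, convert,
)

# Hypothetical stand-in for a single conv block of a YOLO-style backbone.
class ConvBNReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where float -> int8 happens
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # marks where int8 -> float happens

    def forward(self, x):
        return self.dequant(self.relu(self.bn(self.conv(self.quant(x)))))

torch.backends.quantized.engine = "qnnpack"  # ARM kernels (Raspberry Pi)

model = ConvBNReLU().eval()
# Fuse conv+bn+relu into a single module BEFORE preparing for QAT.
model = fuse_modules(model, [["conv", "bn", "relu"]])

model.train()
model.qconfig = get_default_qat_qconfig("qnnpack")
model = prepare_qat(model)

# Stand-in for the QAT fine-tuning loop: run data through the model so the
# fake-quant observers record activation ranges.
model(torch.randn(1, 3, 32, 32))

# convert() is the step that actually swaps in int8 kernels; stopping after
# QAT leaves the slower fake-quantized float model.
model.eval()
int8_model = convert(model)
out = int8_model(torch.randn(1, 3, 32, 32))
```

If the model is benchmarked before `convert()`, it is expected to be slower than the float baseline, which matches the 4x slowdown reported above.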

d4l3k avatar Dec 04 '25 20:12 d4l3k