ncnn inference speed on a pad

I am running my code with ncnn on a pad. Here is the pad's hardware configuration:

Processor: Snapdragon 685
CPU: 4 x A73 (2.8GHz) + 4 x A53 (1.9GHz)

Inference takes 3.9s on the pad, compared with about 500ms on a phone. The ncnn settings are the same as on the phone:

    ncnn::set_cpu_powersave(4);
    model.retina_opt.use_bf16_storage = true;

    model.opt.lightmode = true;
    model.opt.num_threads = 4;
    model.opt.blob_allocator = &retina_g_blob_pool_allocator;
    model.opt.workspace_allocator = &retina_g_workspace_pool_allocator;
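
Since the settings above pick a thread count and a power-save mode on a 4 + 4 big.LITTLE chip, it may help to check which core indices are actually the fast ones. The following is a minimal sketch, not part of ncnn: `read_max_freqs_khz` and `big_core_indices` are hypothetical helper names, and it assumes a Linux/Android device that exposes `cpuinfo_max_freq` under `/sys/devices/system/cpu`. As far as I know, the ncnn headers document `set_cpu_powersave` with the values 0 (all cores), 1 (little cores) and 2 (big cores), so it is worth checking how the ncnn version in use treats the value 4.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the maximum frequency (in kHz) of cpu0, cpu1, ... from sysfs.
// Stops at the first core whose cpufreq entry cannot be read, so offline
// cores or restricted sysfs access may shorten the result.
std::vector<long> read_max_freqs_khz(int max_cpus = 64)
{
    std::vector<long> freqs;
    for (int i = 0; i < max_cpus; i++)
    {
        std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(i)
                         + "/cpufreq/cpuinfo_max_freq");
        long khz = 0;
        if (!(in >> khz))
            break;
        freqs.push_back(khz);
    }
    return freqs;
}

// Returns the indices of the cores that share the highest maximum frequency.
// Assumption: "big" means "highest max frequency". On a chip with three
// clusters this returns only the fastest cluster.
std::vector<int> big_core_indices(const std::vector<long>& freqs)
{
    std::vector<int> big;
    if (freqs.empty())
        return big;

    const long top = *std::max_element(freqs.begin(), freqs.end());
    for (std::size_t i = 0; i < freqs.size(); i++)
    {
        if (freqs[i] == top)
            big.push_back(static_cast<int>(i));
    }
    return big;
}
```

Calling `big_core_indices(read_max_freqs_khz())` on the device and comparing the size of the result with `num_threads` shows whether the four threads can all land on the A73 cluster.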

Any help would be appreciated. Thanks in advance.

Apr 10 '25 03:04 zengjie617789

Use the latest ncnn release and delete all the settings code you presented.

Apr 10 '25 06:04 nihui

Thank you for your quick response. My ncnn version is based on 202412, and the code runs fast on other phones. I wonder whether the CPU type makes a difference to the results. I found a comment of yours saying that the A53 CPU does not support fp16 computation, so those layers fall back to fp32. Is that right?

Apr 10 '25 07:04 zengjie617789

yep, A53 is a slow cpu and has no fp16 capability, its micro architecture is very old :]
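
One way to confirm this on a given device is to look at the CPU feature flags. The sketch below is an illustration, not ncnn code: the helper names are hypothetical, and it assumes an AArch64 Linux/Android kernel, where fp16 arithmetic is reported in `/proc/cpuinfo` as `fphp` (scalar) and `asimdhp` (NEON). ncnn also appears to expose a runtime query, `ncnn::cpu_support_arm_asimdhp()`, which would be the more direct check inside an ncnn application.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole word in a space-separated
// feature list such as the "Features" line of /proc/cpuinfo.
bool has_cpu_feature(const std::string& features_line, const std::string& flag)
{
    std::istringstream iss(features_line);
    std::string token;
    while (iss >> token)
    {
        if (token == flag)
            return true;
    }
    return false;
}

// fp16 arithmetic needs both the scalar (fphp) and the NEON (asimdhp) flag.
bool has_fp16_arithmetic(const std::string& features_line)
{
    return has_cpu_feature(features_line, "fphp")
           && has_cpu_feature(features_line, "asimdhp");
}

// Returns the text after the colon of the first "Features" line, or an
// empty string if the file cannot be read. On big.LITTLE chips each core
// has its own entry, and only the first one is inspected here.
std::string read_cpu_features(const std::string& path = "/proc/cpuinfo")
{
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
    {
        if (line.compare(0, 8, "Features") == 0)
        {
            const std::string::size_type colon = line.find(':');
            if (colon != std::string::npos)
                return line.substr(colon + 1);
        }
    }
    return std::string();
}
```

If `has_fp16_arithmetic(read_cpu_features())` is false on the pad, the fp16 code paths cannot be used on that core and the fp32 fallback described above applies.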

Apr 11 '25 07:04 nihui

Thank you for your frank response. If inference is slow on the A53 because it lacks fp16 support, why is it still just as slow when I run the int8 model? Its inference time is no different from that of the fp16 model.

Apr 11 '25 15:04 zengjie617789