
Inference time is surprisingly long

Open huilongan opened this issue 4 years ago • 6 comments

Tested on a V100 GPU card with a high-end CPU. For litehrnet_18_coco_256x192, the inference latency is over 200 ms.
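
For reference, a minimal latency-measurement sketch, assuming a plain PyTorch model object and a 256x192 input (the shape implied by the config name); the function name and defaults are illustrative, not from this repo. Warm-up runs and CUDA events are needed to get a meaningful number, since GPU launches are asynchronous.

```python
import torch

# Hypothetical timing helper, not part of this repo: measure the average
# forward latency of a model on GPU. Warm-up iterations and CUDA events
# keep lazy initialization and asynchronous launches out of the number.
def measure_latency(model, device="cuda", runs=100, warmup=20):
    model = model.to(device).eval()
    x = torch.randn(1, 3, 256, 192, device=device)  # 256x192 input, per the config name

    with torch.no_grad():
        # Warm-up so cuDNN autotuning and lazy CUDA init are excluded.
        for _ in range(warmup):
            model(x)
        torch.cuda.synchronize()

        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(runs):
            model(x)
        end.record()
        torch.cuda.synchronize()

    return start.elapsed_time(end) / runs  # average latency in milliseconds
```

Without the synchronize calls, the measured time would mostly reflect kernel launch, not execution.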

huilongan avatar May 19 '21 01:05 huilongan

Can you speak Chinese? I ran into the same problem as you!

gentlehuijiaasddsa123 avatar May 19 '21 09:05 gentlehuijiaasddsa123

Any update on this issue? I tested on an A10 card and the inference latency is pretty long as well...

WingsOfPanda avatar Jun 07 '21 04:06 WingsOfPanda

No, I think the main reason is the communication bottleneck.


gentlehuijiaasddsa123 avatar Jun 07 '21 09:06 gentlehuijiaasddsa123

Do you use inference_top_down_pose_model()? Inside it, _inference_single_pose_model (from mmpose.apis.inference) moves the tensor to the CUDA device, which slows things down a bit, and some additional slowdown occurs in the test_pipeline(data) part of the same method.
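
A rough way to check which stage dominates is sketched below. `pipeline`, `model`, and `sample` are placeholders, not mmpose's actual internals; the idea is simply to time the preprocessing, the host-to-device copy, and the network forward separately.

```python
import time
import torch

# Rough sketch, not mmpose's actual code: time the three stages mentioned
# above -- preprocessing (test_pipeline), the CPU->GPU copy, and the
# network forward -- to see which one dominates.
# `pipeline`, `model`, and `sample` are placeholders.
def profile_stages(pipeline, model, sample, device="cuda"):
    timings = {}

    t0 = time.perf_counter()
    data = pipeline(sample)                       # image transforms / normalization
    timings["test_pipeline"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    img = data["img"].unsqueeze(0).to(device)     # host-to-device transfer
    torch.cuda.synchronize()
    timings["to_device"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    with torch.no_grad():
        model(img)
    torch.cuda.synchronize()                      # wait for async kernels to finish
    timings["forward"] = time.perf_counter() - t0

    return {k: round(v * 1000, 2) for k, v in timings.items()}  # milliseconds
```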

IanUJo avatar Jun 09 '21 02:06 IanUJo

This is a good point. PyTorch does not support the multi-branch structure well, so the inference time is a little long. With a careful CPU implementation, the runtime speedup matches the theoretical speedup in FLOPs: in our product, the theoretical speedup is 3.7x and the measured runtime speedup is ~3.5x.
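
As a toy illustration of the multi-branch overhead (not Lite-HRNet itself): the same FLOPs executed as several small per-branch convolutions versus a single grouped convolution. The fused form issues one kernel launch instead of N, which is one reason runtime speedup can lag the theoretical FLOP reduction; in the real network the branches also differ in resolution, so this particular fusion does not apply directly.

```python
import torch
import torch.nn as nn

# Toy comparison, not Lite-HRNet itself: N small per-branch convolutions
# versus one grouped convolution with identical theoretical FLOPs.
# The loop issues N separate kernel launches; the fused version issues one.
n_branches, c, h, w = 4, 64, 64, 48

small_convs = nn.ModuleList(
    nn.Conv2d(c, c, 3, padding=1) for _ in range(n_branches)
).cuda().eval()

# The same compute expressed as a single grouped convolution.
fused_conv = nn.Conv2d(
    n_branches * c, n_branches * c, 3, padding=1, groups=n_branches
).cuda().eval()

xs = [torch.randn(1, c, h, w, device="cuda") for _ in range(n_branches)]
x_cat = torch.cat(xs, dim=1)

with torch.no_grad():
    branch_out = [conv(x) for conv, x in zip(small_convs, xs)]  # N launches
    fused_out = fused_conv(x_cat)                               # 1 launch
```

Timing the two variants (for example with the helper above) typically shows the fused form running faster despite identical theoretical FLOPs, which is the kind of gap a "careful implementation" tries to close.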

welleast avatar Jun 10 '21 08:06 welleast

@welleast Could you tell me the main points of the "careful implementation"? Also, if the multi-branch structure is the problem, would HRNet be just as fast with a similar implementation?

iiou16 avatar Jul 29 '21 04:07 iiou16