caffe_rtpose

Options to increase fps

Open carstenschwede opened this issue 9 years ago • 22 comments

Are there any options to increase fps besides reducing resolution or adding GPUs? Is it possible to restrict detection to certain joints (e.g. Heads) in order to speed up processing?

carstenschwede avatar Jan 12 '17 14:01 carstenschwede

(1) Using the MPI model instead of the COCO model and (2) using a single scale for testing can both reduce processing time. Restricting the detections does not help, because the CNN still needs to run the same trained model, so the forward-pass time is unchanged.

ZheC avatar Jan 12 '17 14:01 ZheC

Another option is to modify the text.prototxt and reduce the stage number from 6 to 3.

ZheC avatar Jan 13 '17 02:01 ZheC

Thanks, I will try both. Any idea of what kind of speedup I could expect?

carstenschwede avatar Jan 13 '17 15:01 carstenschwede

Hi, I have a question about fps: I ran the rtpose demo on an AWS p2.large instance (with one K80 GPU, 24 GB), but it takes 1.1 s to process a frame. Could this be because the K80 has a compute capability of 3.7, lower than the 6.1 of the GTX 1080?

Warden7 avatar Feb 13 '17 08:02 Warden7

This is a preliminary benchmark we made with the new version we are working on (it will be released in about 1 month). The current version you are using should be around 25-30% slower. Let me know if you are using the same flags. If so, are you using cuDNN 5.1? Older versions of cuDNN might also slow down the program. Thanks!

Current benchmark: https://docs.google.com/spreadsheets/d/1-DynFGvoScvfWDA1P4jDInCkbD4lg0IKOYbXgEq0sK0/edit#gid=0

gineshidalgo99 avatar Feb 14 '17 17:02 gineshidalgo99

@Warden7 Their compute power is similar: K80: 8.73 TFLOPS, 1080: 9 TFLOPS.

The low fps may have other causes.

wangzhangup avatar Feb 16 '17 19:02 wangzhangup

Thanks for your kind analysis. The cuDNN version is 5.0 and CUDA is 7.5. The "volatile gpu util" field in the GPU information always shows 99%, even when nothing is running on the GPU. Maybe some further debugging is needed.

Warden7 avatar Feb 21 '17 06:02 Warden7

@Warden7 kill the processes on the GPU

wangzhangup avatar Feb 21 '17 13:02 wangzhangup
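Before killing anything, it helps to see which processes actually hold the GPU. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and using its CSV query output; the helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical and not part of rtpose. It only lists processes, leaving the decision to kill a PID to the user.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows (CSV, no header) into dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank lines or non-CSV messages
        pid_text, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        try:
            pid = int(pid_text.strip())
        except ValueError:
            continue  # not a process row
        apps.append({"pid": pid, "name": name.strip(), "used_memory": memory.strip()})
    return apps


def list_gpu_processes():
    """Return compute processes reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for app in list_gpu_processes():
        print(app["pid"], app["name"], app["used_memory"])
```

If the list is empty while utilization still reads 99%, stale processes are probably not the cause, and the problem lies elsewhere.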

Another way to speed it up is by using the new version (~25% faster): https://github.com/CMU-Perceptual-Computing-Lab/openpose

gineshidalgo99 avatar May 02 '17 12:05 gineshidalgo99

Reduce the number of feature maps. I changed the output count of the conv layers in stages 3-6 from 128 to 64. The result is as good as the original version, with a 25% speedup!

wangzhangup avatar May 03 '17 04:05 wangzhangup
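As a rough illustration of why fewer feature maps reduce compute, the multiply-accumulate count of a convolution layer scales with the product of its input and output channel counts. The sketch below uses a hypothetical layer shape (46x46 feature map, 7x7 kernel); these numbers are assumptions for illustration and are not taken from the actual prototxt.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulates for one conv layer with stride 1 and 'same' padding."""
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical layer shape, chosen only to illustrate the scaling.
base = conv_macs(46, 46, 128, 128, 7)
half_out = conv_macs(46, 46, 128, 64, 7)   # only the output maps are halved
half_both = conv_macs(46, 46, 64, 64, 7)   # input and output maps are halved

print(half_out / base)   # 0.5
print(half_both / base)  # 0.25
```

The end-to-end gain is smaller than the per-layer reduction when other parts of the network and pipeline are left unchanged, which would be consistent with the 25% figure reported above.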

@wangzhangup Thanks, can you try your modification also on the newer version at https://github.com/CMU-Perceptual-Computing-Lab/openpose? Would be interesting to see what overall speedup you are able to get.

carstenschwede avatar May 03 '17 13:05 carstenschwede

@gineshidalgo99 Thanks for the update!

carstenschwede avatar May 03 '17 13:05 carstenschwede

@wangzhangup Thank you so much for your idea! Could you please email me at [email protected] to discuss how you did it in more detail? We are interested in adding it to our system if that is OK with you!

gineshidalgo99 avatar May 03 '17 14:05 gineshidalgo99

@gineshidalgo99 OK!

wangzhangup avatar May 05 '17 02:05 wangzhangup

@gineshidalgo99 @carstenschwede this is the speedup model https://drive.google.com/open?id=0B-SxboVJxF-WNmtpWGc5emZrRDg

wangzhangup avatar May 08 '17 07:05 wangzhangup

@wangzhangup The speed-up is impressive. The accuracy does decrease a bit, but that is a fine trade-off for the huge speedup. Do you mind if I add it to the new OpenPose? (I went from 14 to 20 fps on my desktop and from 30 to 22 mAP.) Or you can make a pull request with your new prototxt, and I will fix the other details (so you would appear as a contributor to OpenPose). Thanks!

https://github.com/CMU-Perceptual-Computing-Lab/openpose

gineshidalgo99 avatar May 10 '17 22:05 gineshidalgo99
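To put the reported numbers in perspective, "percent faster" can mean either higher throughput or lower time per frame, and the two give different figures. The small sketch below works through the 14 to 20 fps and 30 to 22 mAP values quoted above; the function names are hypothetical.

```python
def throughput_gain(fps_before, fps_after):
    """Relative increase in frames per second."""
    return fps_after / fps_before - 1.0


def frame_time_reduction(fps_before, fps_after):
    """Relative decrease in time spent per frame."""
    return 1.0 - fps_before / fps_after


fps_before, fps_after = 14, 20
map_before, map_after = 30, 22

print(f"throughput gain:      {throughput_gain(fps_before, fps_after):.1%}")       # 42.9%
print(f"frame time reduction: {frame_time_reduction(fps_before, fps_after):.1%}")  # 30.0%
print(f"relative mAP drop:    {(map_before - map_after) / map_before:.1%}")        # 26.7%
```

So the same change reads as roughly 43% more frames per second, 30% less time per frame, and an 8-point (about 27% relative) drop in mAP.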

@wangzhangup thanks for the model, impressive speedup!

@gineshidalgo99 is a similar speedup expected for the upcoming "extended" models at OpenPose (e.g. finger tracking)?

carstenschwede avatar May 11 '17 23:05 carstenschwede

@carstenschwede The speed up applies to the body pose, but finger tracking is made on top of it (you need to know the body location to detect the hand), so it will take advantage of it too if this model is used (I did not measure the accuracy impact yet though, I guess I will add both models: 1 for better accuracy and 1 for speed).

gineshidalgo99 avatar May 11 '17 23:05 gineshidalgo99

I guess I will add both models: 1 for better accuracy and 1 for speed

Sounds perfect. Can't wait to try out the finger detection.

carstenschwede avatar May 11 '17 23:05 carstenschwede

@gineshidalgo99 Could you share your measure code?

wangzhangup avatar May 20 '17 05:05 wangzhangup

It is still quite messy, it uses Matlab and C++, and it is not completely finished. I prefer to wait until I actually finish it properly... sorry!

gineshidalgo99 avatar May 26 '17 14:05 gineshidalgo99

@carstenschwede The speed up applies to the body pose, but finger tracking is made on top of it (you need to know the body location to detect the hand), so it will take advantage of it too if this model is used (I did not measure the accuracy impact yet though, I guess I will add both models: 1 for better accuracy and 1 for speed).

I just tried finger tracking with the 640x480 option and tracking 5, but I only get around 10 fps. Could you give me some advice?

aakendi avatar Nov 20 '18 13:11 aakendi