batch_size idea
Hi,
As of now, the execution times for UHD and 1080p images are as follows:
On UHD images (e.g. images in a folder), the first one is slow (understandable), and the average execution time on the rest is around 0.71 sec. I have a GeForce GTX 1080 and a GeForce GTX 980. I disabled the memory growth flag, though, because of the 980.
The same experiment on 1080p images gives me 0.18 sec. Are these the numbers you would expect?
Is there any way to send the images to the network in batches instead of one by one (as the current code does)? Something like a batch_size of 5-8 for each call in the loop. It might speed up the network if it is doable. Let me know your thoughts. Thanks
Can you tell me the resolution of your input images? A 1024*2048 image takes 0.04 sec on my GTX1080. Yes, you can change the code to feed a batch of images as input, and this might be much faster.
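A minimal sketch of the batching idea, assuming all images in a batch share the same resolution and that the input placeholder `x` accepts a leading batch dimension (neither is confirmed in this thread). `iter_batches` is a hypothetical helper, not part of the repository, and the `sess.run` call is shown only as a comment:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage with the session from this thread (assumption, not run here):
#   for batch in iter_batches(images, 8):
#       preds = sess.run(pred, feed_dict={x: np.stack(batch)})

# Stand-in data: file names instead of decoded images.
paths = ["img_%02d.png" % i for i in range(10)]
for batch in iter_batches(paths, 4):
    print(len(batch), batch[0])
```

The last batch may be smaller than `batch_size`, so the network has to tolerate a variable batch dimension, or the final batch has to be padded.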
On my machine with the GTX 1080, a 1920x1080 image takes 0.18 sec ... hmm, almost 4 times slower. Are there any settings I need to change?
Oh, my graphics card is a GTX 1080 Ti, but I don't think a GTX 1080 would be 4 times slower than it. Can you try a single input image with the following code?
import time

# sess, pred, x and img come from the existing inference script
for i in range(10):
    start_time = time.time()
    preds = sess.run(pred, feed_dict={x: img})
    print(time.time() - start_time)
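Since the first call is reported as slow earlier in the thread, the measurement can be made a bit more robust by discarding warm-up calls and summarising the rest. The following is only a sketch: `time_calls` is a hypothetical helper, and the workload is a stand-in for the actual `sess.run` call:

```python
import statistics
import time


def time_calls(fn, runs=10, warmup=1):
    """Call `fn` repeatedly; return per-call durations in seconds, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in this thread's setting it would be something like
#   lambda: sess.run(pred, feed_dict={x: img})
durations = time_calls(lambda: sum(range(100000)), runs=10, warmup=1)
print("mean %.4f s, median %.4f s"
      % (statistics.mean(durations), statistics.median(durations)))
```

Reporting the median alongside the mean makes a single slow outlier easier to spot.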
Yes, that's the way I measured it as well.
GPU: GTX 1080 (not a Ti), TensorFlow: r1.6 (built from source), CUDA: 8.0, cuDNN: 5.0, GPU device version: 6.1, Python: 2.7. I even played with the Bazel build options to recompile TensorFlow, but I still don't get 0.04 sec as on your machine. It is still around 0.17-0.18 seconds per 1920x1080 frame...
@aliericcantona, when I installed r1.6, it recommended CUDA 9.0; I don't know whether this is a problem or not. However, I use TF 1.4 instead of TF 1.6, so maybe you can try TF 1.4? I think 0.17 sec is really slow for a GTX 1080, which is really strange.
I still can't get below 0.16 sec, even with the new image on my CentOS 7 machine. Is there any OS-level trick that gets you that number? It is 4 times faster than mine.
BTW, I installed CUDA 9.1 and cuDNN 7.0 with TensorFlow r1.6 on a GTX 1080 Ti and still get the same number: 0.16 seconds per 1920x1080 frame. I installed TensorFlow from source. Is there any special trick you know of for ./configure?
Can you list the packages installed on your machine? Mine are as follows:
- protobuf == 3.5.2
- python 2.7.5
- gcc 4.8.5
- NVIDIA CUDA 9.0
- NVIDIA cuDNN 7.0
- OS: CentOS 7
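To make such lists easier to compare, the Python-side versions can be collected with a short script. This is a sketch that assumes Python 3.8 or newer (the thread itself uses Python 2.7, where `importlib.metadata` is not available); `package_versions` is a hypothetical helper, and the package names queried are only examples:

```python
import platform
import sys
from importlib import metadata  # standard library from Python 3.8 on


def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = "not installed"
    return result


print("python", sys.version.split()[0])
print("os", platform.platform())
for name, version in package_versions(["tensorflow", "protobuf", "numpy"]).items():
    print(name, version)
```

This only covers Python packages; the CUDA, cuDNN and gcc versions still have to be checked separately.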