ICNet-tensorflow

batch_size idea

Open aliericcantona opened this issue 7 years ago • 9 comments

Hi,

As of now, the execution times for UHD and 1080p images are as follows:

On UHD images (e.g. images in a folder), the first one is slow (understandable), and the average execution time for the rest is around 0.71 sec. I have a GeForce GTX 1080 and a GeForce GTX 980. I disabled the memory growth flag, though, because of the 980.

The same experiment on 1080p images gives me 0.18 sec. Are these the numbers you would expect?

Is there any way to send the images to the network in batches rather than one by one (as the current code does)? Something like a batch_size of 5-8 for each call in the loop. It might speed up the network, if it is doable. Let me know your thoughts. Thanks.

aliericcantona • Mar 13 '18 20:03

Can you tell me the resolution of your input images? A 1024x2048 image takes 0.04 sec on my GTX 1080. Yes, you can change the code to feed a batch of images as input, and this might be much faster.

hellochick • Mar 14 '18 03:03
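
A minimal sketch of the batching idea discussed above: split the list of images into groups of `batch_size` so that each loop iteration makes one network call per group instead of one per image. The helper name `batched` and the file names are hypothetical. The sketch also assumes that all images in a batch share the same size and that the graph's input accepts a leading batch dimension, which the repository's inference script may not do without changes.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names, for illustration only.
image_paths = ["frame_%03d.png" % i for i in range(12)]

for batch in batched(image_paths, 5):
    # In a real script, each batch would be loaded, stacked into one array of
    # shape (len(batch), height, width, 3), and passed to a single sess.run call.
    print(len(batch), batch[0], batch[-1])
```

The last batch can be smaller than `batch_size`, so the input placeholder would need a flexible batch dimension, or the final batch would need padding.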

On my machine with the GTX 1080, a 1920x1080 image takes 0.18 sec... hmm, about 4 times slower. Are there any settings I need to change?

aliericcantona • Mar 14 '18 04:03
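
To compare the timings quoted so far on equal footing, they can be normalized by pixel count. The sketch below does that arithmetic. The thread does not state the exact UHD resolution, so 3840x2160 is an assumption, and the function name `seconds_per_megapixel` is made up for this example.

```python
# Resolutions and timings quoted in the thread.
# UHD is ASSUMED to be 3840x2160; the thread does not give the exact size.
measurements = {
    "UHD (assumed 3840x2160), aliericcantona": (3840, 2160, 0.71),
    "1920x1080, aliericcantona": (1920, 1080, 0.18),
    "1024x2048, hellochick": (1024, 2048, 0.04),
}


def seconds_per_megapixel(width, height, seconds):
    """Normalize an execution time by the number of megapixels processed."""
    return seconds / (width * height / 1e6)


for label, (w, h, secs) in measurements.items():
    megapixels = w * h / 1e6
    print("%-42s %.2f MP  %.3f s/MP" % (label, megapixels, seconds_per_megapixel(w, h, secs)))
```

Under that assumption, a UHD frame has exactly four times the pixels of a 1080p frame, and 0.71 sec is roughly four times 0.18 sec, so the two slower numbers are consistent with each other. A 1024x2048 image has about the same pixel count as a 1080p frame, so the gap between 0.18 sec and 0.04 sec is not explained by image size.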

Oh, my graphics card is a GTX 1080 Ti, but I don't think a GTX 1080 would be 4 times slower than it. Can you try a single input image with the following code?

import time

for i in range(10):
    start_time = time.time()
    preds = sess.run(pred, feed_dict={x: img})  # sess, pred, x, img come from the inference script
    print(time.time() - start_time)

hellochick • Mar 14 '18 05:03
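
Since the first call is reported to be slow, one way to make such measurements comparable is to discard a warm-up call and summarize the rest. The following is only a sketch: `time_calls` is a hypothetical helper, and the lambda is a stand-in workload that would be replaced by a function wrapping `sess.run(pred, feed_dict={x: img})`.

```python
import statistics
import time


def time_calls(fn, runs=10, warmup=1):
    """Call fn repeatedly and return per-call durations, excluding warm-up calls."""
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the thread's setting fn would wrap the sess.run call.
durations = time_calls(lambda: sum(range(100000)), runs=10, warmup=1)
print("median %.4f s, min %.4f s" % (statistics.median(durations), min(durations)))
```

Reporting the median rather than the mean keeps a single slow call from skewing the result.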

Yes, that's how I measured it as well.

aliericcantona • Mar 14 '18 05:03

GPU: GTX 1080 (not a Ti)
TensorFlow: r1.6 (built from source)
CUDA: 8.0
cuDNN: 5.0
GPU compute capability: 6.1
Python: 2.7

I even played with the Bazel build options to recompile TensorFlow, but I still don't get 0.04 sec as on your machine. It is still around 0.17-0.18 seconds per 1920x1080 frame...

aliericcantona • Mar 14 '18 23:03

@aliericcantona, when I installed r1.6, it recommended CUDA 9.0; I don't know whether this is the problem or not. However, I use TF 1.4 instead of TF 1.6, so maybe you can try TF 1.4? I think 0.17 sec is really slow for a GTX 1080, which is strange.

hellochick • Mar 15 '18 06:03

I still can't get below 0.16 sec, even with the new image on my CentOS 7 machine. Is there any OS-level trick that gets you that number? It is 4 times faster than mine.

aliericcantona • Mar 20 '18 04:03

BTW, I installed CUDA 9.1 and cuDNN 7.0 with TensorFlow r1.6 on a GTX 1080 Ti, and I still get the same number: 0.16 seconds per 1920x1080 frame. I built TensorFlow from source. Is there any special trick you know of for the ./configure step?

aliericcantona • Mar 20 '18 19:03

Can you list the packages installed on your machine? Mine are as follows:

  1. protobuf == 3.5.2
  2. python 2.7.5
  3. gcc 4.8.5
  4. NVIDIA CUDA 9.0
  5. NVIDIA cuDNN 7.0
  6. protobuf
  7. OS: CentOS 7

aliericcantona • Mar 21 '18 16:03
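
When comparing setups like this, it can help to collect version information with a script instead of by hand. The sketch below is one possible approach, not something taken from the repository: `collect_versions` is a hypothetical helper, the package names are examples, and it relies on `importlib.metadata`, which requires Python 3.8 or later (the thread itself uses Python 2.7). It reports Python packages only, so CUDA, cuDNN, and gcc versions still have to be checked separately.

```python
import platform
from importlib import metadata  # available in Python 3.8+


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Example package names; adjust to whatever the two machines should be compared on.
for key, value in collect_versions(["tensorflow", "tensorflow-gpu", "numpy", "protobuf"]).items():
    print("%s: %s" % (key, value))
```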