SSD-Tensorflow: I can train on GPU but can only eval on CPU

Env: Python 2.7, tensorflow-1.10.0. The only change I made is eval_op=flatten(list(names_to_updates.values())).

I want to eval on GPU, but everything I have tried has failed. Training works fine on GPU. What's wrong with eval?

Apr 01 '19 12:04 zhengduoru

Can you test with CPU? And what is the wrong result you are getting?

Apr 02 '19 05:04 cjnjuwhy

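During training, GPU utilization reaches 100%, but during evaluation it stays at 0%. I found that the whole evaluation runs entirely on the CPU, and a single evaluation pass takes more than 7000 seconds.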

Apr 02 '19 06:04 zhengduoru

Without the GPU it really does take that long. When you run the test, does TensorFlow print any GPU information?

Apr 02 '19 06:04 cjnjuwhy

Did you change your code? When I evaluate, I can also only use the CPU.

Apr 02 '19 06:04 EdwinChien

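I changed the code because of version issues, but only to fix some bugs; nothing GPU-related was touched. Check how much memory your GPU has, and whether any GPU information is printed when you run the command. My guess is that the maximum GPU memory allotted to test (--gpu_memory_fraction) is too small: the default is 0.1 for test and 0.8 for train. My test command: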

DATASET_DIR=./tfrecords/
TRAIN_DIR=./logs/ssd_300_vgg_tfs
EVAL_DIR=${TRAIN_DIR}/eval
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${TRAIN_DIR} \
    --wait_for_checkpoints=True \
    --batch_size=2 \
    --gpu_memory_fraction=0.9 \
    --max_num_batches=500

Output:

INFO:tensorflow:Evaluating ./logs/ssd_300_vgg_tfs/model.ckpt-1774
INFO:tensorflow:Starting evaluation at 2019-04-02-06:17:32
INFO:tensorflow:Graph was finalized.
2019-04-02 14:17:33.014338: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-02 14:17:33.781433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:14:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-04-02 14:17:34.187589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:15:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-04-02 14:17:34.188465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1
2019-04-02 14:17:34.889481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-02 14:17:34.889536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 
2019-04-02 14:17:34.889549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y 
2019-04-02 14:17:34.889559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N 
2019-04-02 14:17:34.890464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10295 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:14:00.0, compute capability: 3.7)
2019-04-02 14:17:35.006704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10295 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:15:00.0, compute capability: 3.7)
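
As a rough sanity check of the --gpu_memory_fraction guess, the arithmetic can be sketched as below. It assumes the memory budget is simply the fraction multiplied by the reported totalMemory, which is a simplification; memory_budget_mib is a hypothetical helper, not part of SSD-Tensorflow or TensorFlow. With the 11.17GiB figure from the log (itself a rounded value), a fraction of 0.9 comes out close to the 10295 MB shown in the "Created TensorFlow device" lines.

```python
def memory_budget_mib(total_gib, fraction):
    """Approximate per-process GPU memory budget in MiB.

    Assumption: budget = total memory * fraction. The real allocator
    may reserve or round differently.
    """
    return total_gib * 1024.0 * fraction


# "totalMemory: 11.17GiB" as printed in the log above (a rounded value).
K80_TOTAL_GIB = 11.17

# 0.1 is the eval default mentioned above; 0.9 is the value in the command.
for fraction in (0.1, 0.9):
    budget = memory_budget_mib(K80_TOTAL_GIB, fraction)
    print("fraction=%.1f -> about %.0f MiB" % (fraction, budget))
```

Under the same assumption, the default of 0.1 would leave only a little over 1 GiB on this card.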

Apr 02 '19 06:04 cjnjuwhy

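Machine: CentOS 6, Python 2.7, TF 1.10.0, P40 GPU. For evaluation I tried gpu_memory_fraction values of 0, 0.1, 0.8 and 1, and none of them got the GPU running. After setting log_device_placement=True, every placement reported was on the CPU. Watching with watch -n 1 nvidia-smi, however, I could see that GPU memory was being occupied while the GPU itself was not computing. CPU utilization reached 500%.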

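Later I commented out the with tf.device('/device:CPU:0') at around line 220. Only then did the GPU start computing, and the placement log showed some ops running on the GPU. But evaluation is still slow.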
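
To compare placement before and after such a change, the log_device_placement output can be tallied per device. The following is a minimal sketch that assumes each placement line ends with a suffix like /device:GPU:0; count_placements is a hypothetical helper, and the sample lines and op names are made up for illustration, not taken from this thread.

```python
import re
from collections import Counter

# Matches the device suffix at the end of a placement line,
# e.g. ".../device:GPU:0". The line format is an assumption.
DEVICE_RE = re.compile(r"/device:(CPU|GPU):(\d+)\s*$")


def count_placements(lines):
    """Count placement lines per device, keyed like 'CPU:0' or 'GPU:0'."""
    counts = Counter()
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            counts["{}:{}".format(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical sample lines (illustrative only).
sample = [
    "net/conv1/Conv2D: (Conv2D): /job:localhost/replica:0/task:0/device:GPU:0",
    "net/conv1/Relu: (Relu): /job:localhost/replica:0/task:0/device:GPU:0",
    "batch/fifo_queue: (FIFOQueueV2): /job:localhost/replica:0/task:0/device:CPU:0",
]
print(count_placements(sample))
```

If every count lands under CPU:0, the graph is still pinned to the CPU; a mix of CPU and GPU entries matches the situation described above.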

I repeated the experiment on other setups: Ubuntu 16.04, Python 2.7, TF 1.10.0, 1080 Ti; and Ubuntu 16.04, Python 2.7, TF 1.12.0, 1080 Ti. In both cases evaluation is fast and GPU utilization reaches 20%.

Apr 02 '19 07:04 zhengduoru

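The time is really spent in the slim.evaluation.evaluation_loop part at lines 313 and 332, so commenting out line 220 makes it use the GPU but still does not fix the speed. Did you have to change a lot of code to run on Python 2.7? I use Ubuntu 14.04 + Python 3.6 + TensorFlow 1.10 and have never used 2.7 or CentOS, so I can't tell which version is causing the problem 😂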

Apr 02 '19 07:04 cjnjuwhy