
When will the training stop?

Open · seasonyang opened this issue 8 years ago • 9 comments

I have trained on my own dataset (60 images across 4 classes) as follows.

1. Command:

```bash
DATASET_DIR=./tfrecords/voc2007
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python2 train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=16
```

2. The loss hovers near 0.6 with the step count near 40000, but training does not stop:

```
INFO:tensorflow:Saving checkpoint to path ./logs/model.ckpt
INFO:tensorflow:Recording summary at step 37262.
INFO:tensorflow:global step 37270: loss = 0.6264 (1.872 sec/step)
INFO:tensorflow:global step 37280: loss = 0.6262 (1.894 sec/step)
INFO:tensorflow:global step 37290: loss = 0.6265 (1.807 sec/step)
INFO:tensorflow:Recording summary at step 37295.
INFO:tensorflow:global step 37300: loss = 0.6261 (1.813 sec/step)
INFO:tensorflow:global step 37310: loss = 0.6263 (1.814 sec/step)
INFO:tensorflow:global step 37320: loss = 0.6265 (1.814 sec/step)
```

3. Question: the loss is already quite low, so why doesn't training converge and stop? Can anyone help me?

seasonyang avatar Aug 16 '17 08:08 seasonyang

@seasonyang It only stops when you stop it yourself, with Ctrl+C.

JDanielWu avatar Aug 16 '17 08:08 JDanielWu

@WuDanFly Is there a way to control training, such as setting a maximum number of steps or a loss threshold?

seasonyang avatar Aug 16 '17 08:08 seasonyang

@seasonyang For max steps: see train_ssd_network.py, which has `tf.app.flags.DEFINE_integer('max_number_of_steps', None, 'The maximum number of training steps.')`. Change None to the maximum number of steps you want. By the way, did you get a good result on your own dataset?

JDanielWu avatar Aug 16 '17 08:08 JDanielWu
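Since max_number_of_steps is defined with tf.app.flags, it can also be set on the command line instead of editing the script. A minimal sketch, reusing the training command from the first comment with an assumed cap of 40000 steps (any positive limit works):

```bash
python2 train_ssd_network.py \
    --train_dir=./logs/ \
    --dataset_dir=./tfrecords/voc2007 \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=./checkpoints/ssd_300_vgg.ckpt \
    --max_number_of_steps=40000   # assumed example; training stops at this step
```

With the flag left at its default of None, the training loop runs until interrupted.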

@WuDanFly Thank you for your help. I thought my parameter (match_threshold=0.5) caused this: my loss (0.6) has not reached 0.5, so training does not stop. Anyway, I have another question. When I stopped training with Ctrl+C, I found many files in ./logs/, like this:

```
checkpoint
events.out.tfevents.1502780913.TENCENT64.site
events.out.tfevents.1502782292.TENCENT64.site
events.out.tfevents.1502782426.TENCENT64.site
events.out.tfevents.1502822085.TENCENT64.site
model.ckpt-26189.index
model.ckpt-26189.meta
model.ckpt-28153.data-00000-of-00001
model.ckpt-28153.index
model.ckpt-28153.meta
model.ckpt-37915.meta
model.ckpt-38242.data-00000-of-00001
model.ckpt-38242.index
model.ckpt-38242.meta
model.ckpt-3870.data-00000-of-00001
...
training_config.txt
```
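A note on the listing above (standard TensorFlow checkpoint layout, not specific to this repo): each saved model is the trio of .data-*, .index, and .meta files sharing one model.ckpt-STEP prefix, and the plain `checkpoint` file records which prefix is most recent. A quick way to find the newest one:

```bash
# Print the bookkeeping file that TensorFlow's Saver maintains in the log dir;
# its model_checkpoint_path line names the latest checkpoint prefix.
cat ./logs/checkpoint
# e.g. model_checkpoint_path: "model.ckpt-38242"
```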

The question is: how can I evaluate my model? For context, my model is fine-tuned from ssd_300_vgg. My eval command produces an erroneous result:

```bash
DATASET_DIR=./tfrecords/voc2007/
EVAL_DIR=./logs/
CHECKPOINT_PATH=./logs/model.ckpt
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=model \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1
```

Whether I use "model_name=ssd_300_vgg" or "model_name=model", the result is the same!

How can I evaluate my own model?

seasonyang avatar Aug 16 '17 09:08 seasonyang

Eval result with ssd_300_vgg:

```
2017-08-16 19:09:31.452408: I tensorflow/core/kernels/logging_ops.cc:79] AP_VOC07/mAP[0]
2017-08-16 19:09:31.452709: I tensorflow/core/kernels/logging_ops.cc:79] AP_VOC12/mAP[0]
INFO:tensorflow:Finished evaluation at 2017-08-16-11:09:31
Time spent : 427.356 seconds.
Time spent per BATCH: 0.086 seconds.
```

seasonyang avatar Aug 16 '17 11:08 seasonyang

@seasonyang The purpose of match_threshold is not to stop training; see #71. About eval, I suggest you read README.md; it helped me a lot and should do the same for you.

JDanielWu avatar Aug 17 '17 03:08 JDanielWu
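One concrete way to apply that advice, as a sketch rather than the README verbatim: point --checkpoint_path at a specific fine-tuned checkpoint prefix from logs/ (model.ckpt-38242 below is taken from the listing earlier in this thread) and keep --model_name=ssd_300_vgg, since that is the architecture the weights belong to. The separate ./logs/eval directory is an assumption, to keep eval summaries apart from training ones:

```bash
DATASET_DIR=./tfrecords/voc2007/
EVAL_DIR=./logs/eval
CHECKPOINT_PATH=./logs/model.ckpt-38242   # assumed prefix; pick one from ./logs/checkpoint
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1
```

Note that --checkpoint_path=./logs/model.ckpt in the failing command above names no actual checkpoint (the saved files all carry a -STEP suffix), which would be consistent with the mAP of 0 reported earlier.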

@seasonyang I have the same confusion: I don't know where the model produced by training is. During training, ssd_300_vgg is not changed.

hallochen avatar Apr 22 '18 11:04 hallochen

@seasonyang Can you share your training code? You got a really low loss!

RogerAylagas avatar Sep 13 '18 11:09 RogerAylagas

I can't get an ssd_300_vgg file when running "train_ssd_network.py --dataset_dir=*/tfrecords", and some errors occurred. When I add the arguments from the fine-tuning example it runs, but I still don't get ssd_300_vgg, only some other files in logs/. What should I do to start training?

VolleyballBird avatar Nov 13 '19 09:11 VolleyballBird