TextBoxes_plusplus

Training is very slow with my own dataset

Open qianxuyidian-2018 opened this issue 7 years ago • 8 comments

Hi @MhLiao, I prepared my own dataset of 500 photos at 720*1280 resolution, annotated them, and started training. I am using 6 NVIDIA K80 GPUs, but training is very slow: 10 iterations take an hour. I don't know what the reason is. Can you give me some suggestions?

The annotation format is as follows: image1

The log of the training is as follows: image2

qianxuyidian-2018 avatar Jul 31 '18 06:07 qianxuyidian-2018

Well, if I only use one GPU (gpus="0"), the training speed is normal: 10 iterations take 1 minute. If I set gpus="0,1,2,3,4,5", gpus="2,3", or other combinations, training becomes very slow. I don't know what is wrong with multi-GPU training.
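For reference, the two timings reported above can be turned into a per-iteration cost. This is only a small arithmetic sketch using the numbers quoted in this thread; the function name is made up for illustration.

```python
def seconds_per_iteration(iterations, elapsed_seconds):
    """Average wall-clock time of one training iteration."""
    return elapsed_seconds / iterations


# Figures quoted in this thread.
multi_gpu = seconds_per_iteration(10, 3600)  # 10 iterations in one hour
single_gpu = seconds_per_iteration(10, 60)   # 10 iterations in one minute
slowdown = multi_gpu / single_gpu

print(f"multi-GPU: {multi_gpu:.0f} s/iter")    # 360 s/iter
print(f"single GPU: {single_gpu:.0f} s/iter")  # 6 s/iter
print(f"slowdown: {slowdown:.0f}x")            # 60x
```

A slowdown of this size on more hardware suggests the extra GPUs are waiting on something rather than computing.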

qianxuyidian-2018 avatar Aug 01 '18 01:08 qianxuyidian-2018

The images in ICDAR 2015 are the same resolution as yours. Have you tried it?

MhLiao avatar Aug 13 '18 02:08 MhLiao

Thank you for your reply. I solved this problem. I reinstalled the NVIDIA K80 driver. Now, if I set gpus="0,2" or gpus="0,2,4", the speed is normal; if I set gpus="0,1", gpus="2,3", or similar, the speed becomes very slow. I think it may be because each K80 card has two cores. I have three K80 cards, so I see 6 GPU IDs in nvidia-smi. If I use two GPU IDs on the same K80, like "2,3", the speed is slower.
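A minimal sketch of the workaround described above: build a gpus string that uses only one GPU ID per physical card. It assumes that consecutive IDs (0 and 1, 2 and 3, 4 and 5) sit on the same K80 card, which matches the behaviour reported here but may not hold on every machine, so check the ordering in nvidia-smi first. The function name `one_gpu_per_board` is hypothetical and not part of the repository.

```python
def one_gpu_per_board(num_gpu_ids, gpus_per_board=2):
    """Return a comma-separated GPU ID string with one ID per physical card.

    Assumption (not guaranteed by the driver): IDs are numbered so that each
    group of `gpus_per_board` consecutive IDs belongs to the same card.
    """
    if num_gpu_ids <= 0 or gpus_per_board <= 0:
        raise ValueError("num_gpu_ids and gpus_per_board must be positive")
    # Take the first ID of every card: 0, 2, 4, ... for dual-GPU boards.
    ids = range(0, num_gpu_ids, gpus_per_board)
    return ",".join(str(i) for i in ids)


gpus = one_gpu_per_board(6)  # three K80 cards, six visible GPU IDs
print(gpus)  # 0,2,4
```

The resulting string can be pasted into the gpus setting used above in place of a hand-written list.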

qianxuyidian-2018 avatar Aug 13 '18 06:08 qianxuyidian-2018

Hello @qianxuyidian-2018, could you tell me how to use my own prepared dataset? Please give me the details. Thank you.

image

425183525 avatar Oct 11 '18 08:10 425183525

You can prepare your dataset in PASCAL VOC format. Look up an overview of PASCAL VOC online first, then train on your dataset as described in README.md. You will also need to change the scripts to suit your project, so you should understand them. I work in a closed environment: although I can browse the web, I can't send code or scripts outside.
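As a rough illustration of what a PASCAL VOC style annotation looks like, here is a sketch that writes one XML record with the standard `size`, `object`, and `bndbox` fields. This is generic PASCAL VOC only: the scripts in this repository may expect extra or different fields, so compare against README.md and the conversion scripts before relying on it. The file name, label, and box coordinates are made-up examples.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, boxes, depth=3):
    """Build a generic PASCAL VOC annotation as an XML string.

    `boxes` is a list of (name, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")


# Hypothetical example: one text region in a 1280 x 720 image.
xml_text = make_voc_annotation(
    "example.jpg", 1280, 720, [("text", 100, 200, 300, 240)]
)
print(xml_text)
```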

qianxuyidian-2018 avatar Oct 22 '18 03:10 qianxuyidian-2018

Hello, I trained on 1600 images of size 1280 x 720. With 1 GPU (GTX 1080), 100 iterations took 2 hours. I've tried multiple GPUs in different combinations, and training is still slow. Can you share your thoughts on using multiple GPUs to speed up training? Thank you very much. Also, my GPU-Util is always 0, even though GPU memory is occupied. image
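One way to confirm the "memory occupied but GPU-Util at 0" symptom over time is to poll nvidia-smi and flag such GPUs. The sketch below is only a diagnostic aid, not a fix. It assumes nvidia-smi is on the PATH and returns plain integers for every field; the sample text and the 100 MiB threshold are made up for illustration.

```python
import subprocess

# nvidia-smi query that prints one "index, utilization, memory" row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(csv_text):
    """Parse rows such as '0, 0, 7421' into dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({
            "index": int(index),
            "util_percent": int(util),
            "memory_mib": int(mem),
        })
    return rows


def idle_but_allocated(rows, min_memory_mib=100):
    """Return IDs of GPUs that hold memory but report 0% utilization."""
    return [
        r["index"] for r in rows
        if r["util_percent"] == 0 and r["memory_mib"] >= min_memory_mib
    ]


def read_gpu_status():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_status(result.stdout)


# Hypothetical sample output, used here so the sketch runs without a GPU.
sample = "0, 0, 7421\n1, 97, 7390\n"
print(idle_but_allocated(parse_gpu_status(sample)))  # [0]
```

If a GPU stays in that list for the whole run, it is holding data but not computing, which points to a bottleneck outside the GPU itself.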

425183525 avatar Dec 05 '18 03:12 425183525

I have the same problem, @425183525. Did you solve it?

cainiaojy avatar Mar 17 '19 06:03 cainiaojy

I have the same problem, @425183525. Did you solve it?

I have the same problem. Did you solve it?

mrlihellohorld avatar Jan 06 '20 15:01 mrlihellohorld