
Training with a Single GPU - CUDA out of memory.

Open tserdar opened this issue 4 years ago • 6 comments

Hey,

I have been trying to run the repo on a custom dataset for a while now. After some hassle, I believe I have the custom dataset prepared correctly.

However, I am now stuck running out of CUDA memory. Could you help me find out what to change in the configs so that I can reduce video memory usage during training (e.g. mini-batching, etc.)?

I am currently using 2 samples per GPU on a single GPU, launched with the bash code below (just to get the repo working):

for FOLD in 1;
do
  bash tools/dist_train_partially.sh semi ${FOLD} 5 1
done

Thanks

tserdar avatar Jan 03 '22 14:01 tserdar

Could you tell me the memory size of your GPUs?

MendelXu avatar Jan 03 '22 14:01 MendelXu

> Could you tell me the memory size of your GPUs?

Thanks for the quick reply. I am currently testing the repo on my NVIDIA RTX 2060M with ~5600-5700 MB of VRAM.

tserdar avatar Jan 03 '22 14:01 tserdar

I have tried this myself, and I think the only solution in your case is to decrease the size of the input images.

MendelXu avatar Jan 03 '22 14:01 MendelXu

Alright, thanks.

tserdar avatar Jan 03 '22 15:01 tserdar

Is there a configuration that automatically resizes images on the fly (during training) or do I need to resize all images manually?

tserdar avatar Jan 03 '22 15:01 tserdar

https://github.com/microsoft/SoftTeacher/blob/bef9a256e5c920723280146fc66b82629b3ee9d4/configs/soft_teacher/base.py#L30
https://github.com/microsoft/SoftTeacher/blob/bef9a256e5c920723280146fc66b82629b3ee9d4/configs/soft_teacher/base.py#L80
https://github.com/microsoft/SoftTeacher/blob/bef9a256e5c920723280146fc66b82629b3ee9d4/configs/soft_teacher/base.py#L153

Change these lines.

MendelXu avatar Jan 03 '22 15:01 MendelXu
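
For anyone landing here later, a rough sketch of what "decrease the sizes of input images" can look like in practice. It assumes the three linked lines set `img_scale` inside the data pipelines, as MMDetection-style configs usually do; the thread itself does not show their contents. The helper `shrink_img_scale`, the example pipeline, and the numbers in it are hypothetical and only for illustration, so check the linked lines for the real transform names and values before editing.

```python
def shrink_img_scale(pipeline, factor):
    """Return a copy of an MMDetection-style pipeline (a list of dicts)
    in which every `img_scale` value is multiplied by `factor`."""

    def scale(value):
        # `img_scale` is either one (width, height) tuple or a list of them.
        if isinstance(value, tuple):
            return tuple(int(v * factor) for v in value)
        return [scale(v) for v in value]

    def visit(node):
        # Walk nested dicts/lists so wrapped pipelines are handled as well.
        if isinstance(node, dict):
            return {
                key: scale(val) if key == "img_scale" else visit(val)
                for key, val in node.items()
            }
        if isinstance(node, list):
            return [visit(item) for item in node]
        return node

    return visit(pipeline)


# Hypothetical pipeline; the values are placeholders, not copied from base.py.
example_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(
        type="Resize",
        img_scale=[(1333, 400), (1333, 1200)],
        keep_ratio=True,
    ),
]

smaller = shrink_img_scale(example_pipeline, 0.5)
print(smaller[1]["img_scale"])
```

The resize happens on the fly in the data pipeline, so the images on disk do not need to be resized manually. Editing the `img_scale` values by hand in the config has the same effect; the helper only makes it easier to try several factors until training fits in memory.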