
# Number of GPUs required to reproduce best outputs

yswang1717 opened this issue on Jun 03 '22 · 10 comments

Hello! Thanks for your great contribution to this field.

I'm setting up to follow your work (MagNet) and wonder how many GPUs are required to run your code. In detail, I want to work on the DeepGlobe dataset first, using the command below. Could you tell me the number and memory size of the GPUs you used in these experiments?

Best regards, Yooseung

```sh
python train.py --dataset deepglobe \
    --root data/deepglobe \
    --datalist data/list/deepglobe/train.txt \
    --scales 612-612,1224-1224,2448-2448 \
    --crop_size 612 612 \
    --input_size 508 508 \
    --num_workers 8 \
    --model fpn \
    --pretrained checkpoints/deepglobe_fpn.pth \
    --num_classes 7 \
    --batch_size 8 \
    --task_name deepglobe_refinement \
    --lr 0.001
```

Or, in short, run this script: `sh scripts/deepglobe/train_magnet.sh`

yswang1717 · Jun 03 '22

It would be helpful if you could give this information separately for MagNet and MagNet-Fast. In addition, I would also like to know how much GPU memory a single batch requires.

yswang1717 · Jun 03 '22

To train the backbone, you should run on large GPUs (following previous works). I used a V100 32GB.

To train the refinement module, I used a 2080Ti 11GB. However, you can reduce the batch size to run on GPUs with 8GB; I don't think reducing the batch size makes much difference in the refinement performance.

hmchuong · Jun 05 '22

Thanks for your quick response. I'm running and testing your code and have some questions.

- How can I reproduce your paper results on the DeepGlobe dataset? In the paper, Table 8 shows MagNet-Fast: 71.85 and MagNet: 72.96. However, in my environment the results do not match them.

Running MagNet-Fast (`sh scripts/deepglobe/test_magnet_fast.sh`) outputs: coarse IoU 67.23, refinement IoU 68.22.

Running MagNet (`sh scripts/deepglobe/test_magnet.sh`) outputs: coarse IoU 67.23, refinement IoU 72.10.

The refinement IoUs reproduced from your code are lower than the paper results (MagNet-Fast: 71.85 → 68.22; MagNet: 72.96 → 72.10). I used the same `.pth` files downloaded from your GitHub repository, following the README instructions: `--pretrained checkpoints/deepglobe_fpn.pth --pretrained_refinement checkpoints/deepglobe_refinement.pth`

yswang1717 · Jun 07 '22

Hi, the results in Table 8 were obtained with the multi-scale/flipping testing setting, adopted from the GLNet paper for a fair comparison. Unfortunately, that testing script is not included in the current code.

hmchuong · Jun 09 '22

Hello!

  1. If possible, could you please email me the test code ([email protected])? I really want to reproduce your paper results.

  2. Or could you just share the multi-scale ratios and related parameters?

  3. In addition, I cannot find the multi-scale option in the GLNet repository, linked below. Where is it? https://github.com/VITA-Group/GLNet/blob/7b7bdee196e368a1f3a32c54b984915f8e397275/helper.py#L390

yswang1717 · Jun 09 '22

Hi,

Sorry, it's not multi-scale testing; it's flipping/rotating testing. You can check the code here: https://github.com/VITA-Group/GLNet/blob/7b7bdee196e368a1f3a32c54b984915f8e397275/helper.py#L434
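
For reference, flipping/rotating testing of this kind usually looks like the sketch below. This is a minimal illustration assuming a `model` that returns per-class score maps for a BxCxHxW tensor; it is not the exact script used for Table 8.

```python
import torch

def flip_rotate_predict(model, image):
    """Average softmax predictions over 90-degree rotations and flips."""
    preds = []
    for k in range(4):                      # 0/90/180/270-degree rotations
        rotated = torch.rot90(image, k, dims=(2, 3))
        for flip_dim in (None, 2, 3):       # no flip, vertical flip, horizontal flip
            aug = rotated if flip_dim is None else torch.flip(rotated, dims=(flip_dim,))
            with torch.no_grad():
                out = model(aug).softmax(1)
            if flip_dim is not None:        # undo the flip first...
                out = torch.flip(out, dims=(flip_dim,))
            out = torch.rot90(out, -k, dims=(2, 3))   # ...then undo the rotation
            preds.append(out)
    return torch.stack(preds).mean(0)
```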

hmchuong · Jun 10 '22

To reproduce Table 8 ("Segmentation results on the DeepGlobe dataset"), which input size did you use for "patch processing" and "downsampling" (U-Net, FCN, SegNet, ...)? The following two sentences in the paper contradict each other.

  1. "We also used the same input size 508×508 as GLNet."
  2. "... MagNet, and 64 patches of the patch processing approach."

Downsampling does not matter, but the patch-processing experimental setting is unclear. A 2448×2448 image is not evenly divisible into 508×508 patches. Did you produce 64 patches of 306×306 from the 2448×2448 images and upscale them to 508×508?

yswang1717 · Jun 16 '22

Hi, there is overlap between the patches. You can use the code I provided to get the 64 patches.
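
To illustrate the overlap, here is a minimal sketch (the exact coordinates come from the patch-generation code in the repository): an 8×8 grid of 508×508 patches over a 2448×2448 image, with the top-left corners evenly spaced so adjacent patches overlap.

```python
import numpy as np

def patch_coords(image_size=2448, patch_size=508, grid=8):
    """Top-left (y, x) corners of an overlapping grid x grid tiling."""
    # Space the corners evenly so the last patch ends exactly at the image border.
    starts = np.linspace(0, image_size - patch_size, grid).round().astype(int)
    return [(int(y), int(x)) for y in starts for x in starts]

coords = patch_coords()
print(len(coords))    # 64 patches
print(coords[:2])     # [(0, 0), (0, 277)] -> adjacent patches overlap by 508 - 277 = 231 px
```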

hmchuong · Jun 16 '22

Hello,

  1. I think the backbone network is not trained, because of the `with torch.no_grad()` context in the following code. Can I train the backbone network by removing `with torch.no_grad()` for both `coarse_pred` and `fine_pred`? Do I also have to remove the `torch.no_grad()` in the feature-aggregation step? (See the sketch after this list.)

```python
with torch.no_grad():
    coarse_pred = model(coarse_image).softmax(1)
    fine_pred = model(fine_image).softmax(1)
```

  2. Do I have to freeze the refinement module while training the backbone network?

  3. Running the code with `torch.no_grad()` removed requires 13GB and 19GB (for the coarse and fine passes respectively; same training protocol except an HRNet18+OCR backbone, on the DeepGlobe dataset) for a batch size of just 1. Is it correct that 32GB of GPU memory is required to train the backbone network (DeepGlobe dataset, 508×508 input size, same scales, HRNet18+OCR backbone)?
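
Roughly, what I mean by removing `torch.no_grad()` is sketched below. This is a hypothetical, self-contained illustration: the model, images, and loss are placeholders, not the repository's actual training code.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 7, kernel_size=3, padding=1)     # stand-in for the real backbone
coarse_image = torch.randn(1, 3, 508, 508)
fine_image = torch.randn(1, 3, 508, 508)
target = torch.zeros(1, 508, 508, dtype=torch.long)   # dummy labels

# Without torch.no_grad(), both forward passes keep activations for
# backpropagation, which is what drives the per-batch memory up.
coarse_pred = model(coarse_image)
fine_pred = model(fine_image)

criterion = nn.CrossEntropyLoss()
loss = criterion(coarse_pred, target) + criterion(fine_pred, target)
loss.backward()   # gradients flow into the backbone for both scales
```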

yswang1717 · Jun 21 '22

Hi,

There are two separate training sections: one for backbone training and one for refinement-module training. Please check those; currently the two modules cannot be jointly trained.

hmchuong · Jun 27 '22