
Data Augmentation

Open · csmailis opened this issue 8 years ago · 1 comment

I would like to ask how data augmentation is performed in the case of the baseline RCNN that uses only ground-truth ROIs as primary regions.

More specifically, in the paper you mention that:

Rather than limiting training to the ground-truth person locations, we use all regions that overlap more than 0.5 with a ground-truth box. This condition serves as a form of data augmentation. For every primary region, we randomly select N regions from the set of candidate secondary regions. N is a function of the GPU memory limit (we use a Nvidia K40 GPU) and the batch size. We fine-tune our network starting with a model trained on ImageNet-1K for the image classification task. We tie the weights of the fully connected primary and secondary layers (fc6, fc7), but not for the final scoring models. We set the learning rate to 0.0001, the batch size to 30 and consider 2 images per batch. We pick N = 10 and train for 10K iterations. Larger learning rates prevented fine-tuning from converging.
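The secondary-region sampling described in the quoted passage can be sketched as follows. This is only a simplified illustration of "for every primary region, we randomly select N regions from the set of candidate secondary regions"; the function and variable names are my assumptions, not the paper's actual code:

```python
import numpy as np

def select_secondary_regions(primary_rois, candidate_rois, n_secondary=10):
    """For each primary region, randomly draw N candidate secondary regions
    (the paper uses N = 10, bounded by GPU memory and batch size)."""
    selected = []
    for _ in primary_rois:
        # Sample with replacement only if there are fewer candidates than N.
        idx = np.random.choice(
            len(candidate_rois),
            size=n_secondary,
            replace=len(candidate_rois) < n_secondary,
        )
        selected.append(candidate_rois[idx])
    return np.stack(selected)  # shape: (num_primary, N, 4)

# Toy example: 3 primary regions, 50 candidate boxes in (x1, y1, x2, y2) form.
primary = np.zeros((3, 4))
candidates = np.random.rand(50, 4)
print(select_secondary_regions(primary, candidates).shape)  # (3, 10, 4)
```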

Thus for the case of the simple RCNN baseline that uses only primary regions and no secondary regions, this means that each batch contains 2 images and 30 ROIs for the ROI-Pooling layer.

Assuming the above holds, if the two images contain only 1 primary region each, what do you fill the rest of the batch with (since 28 slots would otherwise be left empty)?

Since the number of primary regions is not fixed per image, do you somehow enforce that the number of data-augmentation samples is balanced per class?

Would it be possible to share the results you achieve without using data augmentation?

csmailis — Mar 04 '18

I believe this part of the code answers your questions:

https://github.com/gkioxari/RstarCNN/blob/master/lib/data_layer/minibatch.py#L78
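For readers without the repo at hand, the linked minibatch construction follows the familiar Fast R-CNN-style pattern: each image contributes a fixed ROI quota (30 ROIs / 2 images = 15 per image), filled by sampling from the ground-truth boxes plus all proposals overlapping a ground truth by more than 0.5. A minimal sketch of that idea, assuming this sampling scheme (the function name and signature are illustrative, not the actual RstarCNN API):

```python
import numpy as np

def sample_rois_for_image(gt_boxes, proposals, overlaps,
                          rois_per_image=15, fg_overlap=0.5):
    """Fill one image's ROI quota by sampling from the ground-truth boxes
    plus proposals whose max overlap with a ground truth exceeds fg_overlap."""
    # Candidate primary regions: GT boxes and well-overlapping proposals.
    keep = np.where(overlaps >= fg_overlap)[0]
    candidates = np.vstack([gt_boxes, proposals[keep]])
    # Sample exactly rois_per_image regions; replace=True covers the case
    # where an image has fewer candidates than the quota (e.g. 1 GT box).
    replace = len(candidates) < rois_per_image
    idx = np.random.choice(len(candidates), size=rois_per_image,
                           replace=replace)
    return candidates[idx]

# Toy example: one GT box, two proposals, one overlapping enough to qualify.
gt = np.array([[10., 10., 50., 50.]])
props = np.array([[12., 11., 52., 49.],
                  [200., 200., 240., 240.]])
ovl = np.array([0.9, 0.0])
rois = sample_rois_for_image(gt, props, ovl)
print(rois.shape)  # (15, 4)
```

So a batch never has empty slots: under-populated images are simply oversampled until the per-image quota is met.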

gkioxari — Mar 05 '18