GroupedBatchSampler in detection reference
🐛 Describe the bug
Since we have the multiscale and lsj augmentations, which use fixed sizes, we no longer resize the images in the transforms of GeneralizedRCNNTransform. However, we still use the GroupedBatchSampler, even though, as far as I can tell, the grouping by aspect ratio is no longer mandatory. Am I right?
Even though it may not matter much in practice (if the statement above is true), it is easy to patch.
We can change the following lines in the training reference script: https://github.com/pytorch/vision/blob/bd19fb8ea9b1f67df2a2a1ee116874609ad3ee8c/references/detection/train.py#L191-L195 to:
```python
if args.aspect_ratio_group_factor >= 0 and args.data_augmentation not in ["multiscale", "lsj"]:
    group_ids = create_aspect_ratio_groups(dataset, k=args.aspect_ratio_group_factor)
    train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, args.batch_size)
else:
    train_batch_sampler = torch.utils.data.BatchSampler(train_sampler, args.batch_size, drop_last=True)
```
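For context, either branch just produces a batch sampler that the rest of the script consumes in the same way. A minimal sketch following the reference script's pattern (it assumes the `dataset` and `args` objects already defined in train.py and the `utils.collate_fn` helper from references/detection/utils.py):

```python
# Either branch yields a batch sampler; the DataLoader is built the same way in both cases.
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_sampler=train_batch_sampler,
    num_workers=args.workers,
    collate_fn=utils.collate_fn,  # detection reference collate: returns tuple(zip(*batch))
)
```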
Versions
```
[pip3] numpy==1.21.5
[pip3] torch==1.11.0
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.5 py37he7a7128_1
[conda] numpy-base 1.21.5 py37hf524024_1
[conda] pytorch 1.11.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchvision 0.12.0 py37_cu102 pytorch
```
@rvandeghen Thanks for your feedback! You are technically right; let me elaborate on some of the nuances.
The multiscale and lsj augmentations were introduced to boost the accuracy of the existing detection models and close the gap to SOTA. There are still augmentation strategies that don't use fixed sizes and thus still require the GroupedBatchSampler, which is why it's maintained.

Unfortunately, since GeneralizedRCNNTransform renders the results of the previous augmentations useless, I had to introduce a really ugly and hacky workaround to turn it off. This parameter is private (undocumented, starting with _ and shall remain nameless here 😄) and will be removed in the near future. Ideally we want to review our detection API to pull the transforms outside of the detection models, but this is going to create a bunch of new issues (such as having to undo the resize ops to recover the original image sizes, plus any other image postprocessing that happens in the model).

So yes, you are right that when using these augmentations we don't need the GroupedBatchSampler, but I would be hesitant to make further modifications to the reference script, to avoid exposing these changes in more places. I believe that if you make the change on your side it won't affect your accuracy, but I haven't run the checks myself.
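For reference, a quick way to see the resizing that happens inside the model (illustration only; the values below are the usual Faster R-CNN defaults, and the old 0.12-era constructor flags are used so no weights get downloaded):

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# The transform is baked into the detection model and rescales every input to the
# model's min_size/max_size range, overriding whatever fixed-size augmentation
# (multiscale, lsj, ...) was applied in the dataset pipeline.
model = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, min_size=800, max_size=1333)
print(model.transform)  # GeneralizedRCNNTransform with its Normalize + Resize(min_size=(800,), max_size=1333) ops
```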
One question for you: what is the motivation for removing the use of GroupedBatchSampler? Is it to clean up a part of the code that, as you correctly say, is not mandatory? Or is there an additional reason that I'm currently missing? Thanks!
@datumbox First, I would rather use what I call an InfiniteSampler, defined as:
```python
import torch


class InfiniteSampler(torch.utils.data.Sampler):
    """Wraps a sampler and re-iterates it forever, turning the data into an infinite stream."""

    def __init__(self, sampler):
        self.sampler = sampler

    def __len__(self):
        return len(self.sampler)

    def __iter__(self):
        while True:
            yield from self.sampler
```
which was, at the time, the simplest way for me to treat my data as an infinite stream.
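For illustration, a minimal sketch of how it can be plugged in (the dataset, batch size, and worker count are just placeholders; `utils.collate_fn` is the detection reference helper):

```python
sampler = InfiniteSampler(torch.utils.data.RandomSampler(dataset))
batch_sampler = torch.utils.data.BatchSampler(sampler, batch_size=2, drop_last=True)
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_sampler=batch_sampler,  # yields batches forever, no epoch boundary
    num_workers=4,
    collate_fn=utils.collate_fn,
)
```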
Second, I build another sampler on top of the InfiniteSampler with other constraints (not based on the aspect ratio), which could break the aspect-ratio grouping.
Regarding the first remark, most of the other frameworks (d2, mmcv, ...) offer samplers based either on epochs or on iterations, which is not the case here, if I'm right.
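Continuing the sketch above, an iteration-based schedule then simply draws a fixed number of batches instead of looping over epochs (the iteration budget here is arbitrary):

```python
max_iters = 90_000  # arbitrary iteration budget instead of a number of epochs
data_iter = iter(data_loader)
for it in range(max_iters):
    images, targets = next(data_iter)  # never exhausts thanks to InfiniteSampler
    # forward / backward / optimizer step would go here
```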