david-az
Hi, thank you for the great work. I would like to know if anyone has run experiments pre-training with a larger image size, especially when pre-training for object detection/segmentation tasks?...
Hi, thank you for your great work. The FLOP count and the number of parameters are lower than Swin Transformer's, yet the training time of HRFormer is at least...
In the BLIP-2 paper, it is specified that: "[Q-Former] _extracts a fixed number of output features from the image encoder, independent of input image resolution._". However, when using cross-attention, this...
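The quoted property follows from how cross-attention with learned queries works: the output length is set by the number of queries, not by the number of image tokens attended to. Below is a minimal sketch, not BLIP-2's actual Q-Former; all sizes (32 queries, dim 64, 4 heads) are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch (NOT BLIP-2's real Q-Former): cross-attention where a fixed set of
# learned queries attends to a variable number of image tokens. The output
# always has `num_queries` features, independent of input image resolution.
class FixedQueryCrossAttention(nn.Module):
    def __init__(self, num_queries=32, dim=64, num_heads=4):
        super().__init__()
        # Learned queries: their count alone fixes the output length.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, image_feats):
        # image_feats: (B, N_img, dim), where N_img grows with resolution
        b = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, num_queries, dim)
        out, _ = self.attn(q, image_feats, image_feats)  # keys/values = image tokens
        return out  # (B, num_queries, dim), regardless of N_img

m = FixedQueryCrossAttention()
for n_img in (196, 577):  # e.g. ViT token counts at 224px vs 384px input
    print(m(torch.randn(2, n_img, 64)).shape)
```

Running this prints the same output shape for both token counts, which is the resolution-independence the paper describes.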