visual_prompting

Input Grids and Support pairs

Open KhatreeSuneel opened this issue 7 months ago • 0 comments

Hello. I have a clarification question regarding input grids and support pairs.

It looks like the model always works with a 224×224 input image, which is tokenized into a 14×14 grid of patches. If we want to include more support pairs (more rows), it seems we have to fit them within that same 14×14 grid, which means there is a tradeoff between the number of support pairs and the resolution available to each image. Is that correct? If so, I have a question about the figure in the paper showing that more examples give better results. Were the 5 support pairs in that figure's grid packed into the same 14×14 patches (so lower resolution per image), and did the model still produce better results?
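To make the tradeoff concrete, here is a rough sketch of the arithmetic I have in mind. It assumes a fixed 224×224 canvas split into a 14×14 patch grid, one row per pair (input on the left, output on the right), equal-sized cells, and no padding between cells. The function name `cell_budget` and this layout are my own assumptions for illustration, not necessarily how the repository or the paper arranges the grid.

```python
def cell_budget(num_rows, num_cols=2, image_size=224, grid_size=14):
    """Estimate how much of the canvas each image in the grid gets.

    Assumes equal-sized cells and no padding (hypothetical layout).
    num_rows counts every row on the canvas, i.e. support pairs plus
    the query row.
    """
    patch_px = image_size // grid_size  # pixels per patch side
    return {
        "patch_px": patch_px,
        # Pixel size of one cell (height, width), rounded down.
        "cell_px": (image_size // num_rows, image_size // num_cols),
        # Patches per cell along each axis; fractional values mean the
        # cell boundaries do not line up with patch boundaries.
        "cell_patches": (grid_size / num_rows, grid_size / num_cols),
    }


if __name__ == "__main__":
    # 1 support pair + query, 2 support pairs + query, 5 support pairs + query
    for rows in (2, 3, 6):
        print(rows, cell_budget(rows))
```

Under these assumptions, one support pair plus the query row gives each image a 112×112 cell (7×7 patches), while adding rows shrinks the cell height proportionally.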

Thanks!

KhatreeSuneel, Sep 02 '25 18:09