
Question: Is the SAM-HQ model applicable for predicting segmentation masks for input images without boxes, points, or labels?

Open mzg0108 opened this issue 2 years ago • 2 comments

If I understand correctly, both SAM and SAM-HQ take box prompts, input points, and point labels as input along with the input image. What about input images for which we don't have this information available?

If we want to take the human completely out of the loop and have the model take only the image as input and predict the masks, what changes do we need to make to the model?

mzg0108 avatar Dec 06 '23 22:12 mzg0108

You can use the "everything" mode, as demonstrated here, which feeds uniformly sampled points on the image as prompts.
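To make the "uniformly sampled points" idea concrete, here is a minimal stand-alone sketch of how such a prompt grid could be built. It assumes a 32×32 grid in normalized coordinates (the default density used by the automatic mask generator in the segment-anything codebase); the helper name `build_point_grid` is mine, not necessarily the repo's.

```python
def build_point_grid(points_per_side):
    """Return (x, y) points evenly spaced over [0, 1) x [0, 1),
    offset to cell centers, to be used as point prompts."""
    step = 1.0 / points_per_side
    offset = step / 2  # center each point within its grid cell
    return [
        (offset + col * step, offset + row * step)
        for row in range(points_per_side)
        for col in range(points_per_side)
    ]

grid = build_point_grid(32)  # assumed default grid density
print(len(grid))             # 1024 point prompts per image
```

Each of these points would then be scaled to pixel coordinates and fed to the model as a foreground point prompt, so no human input is needed.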

lkeab avatar Dec 09 '23 09:12 lkeab

If I'm not mistaken, you still need the prompt encoder to produce prompt embeddings before masks can be decoded for an image. The name "automatic mask generator" is actually a bit misleading: it just places a point prompt on a uniform grid, roughly every 20 pixels or so. For each point prompt, candidate masks are decoded and scored by predicted IoU, and the most probable one (or the top 3) is kept as the output mask.
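A toy illustration of the selection step described above (this is not the real SAM code; the scores and the 0.88 threshold are stand-ins, though 0.88 matches the `pred_iou_thresh` default in the segment-anything repo):

```python
def top_masks(candidates, k=3, iou_thresh=0.88):
    """Keep up to k candidate masks whose predicted IoU clears the
    threshold, best first. `candidates` is a list of
    (mask_id, predicted_iou) pairs produced for one point prompt."""
    kept = [c for c in candidates if c[1] >= iou_thresh]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:k]

# Made-up candidate masks for a single point prompt:
cands = [("m0", 0.95), ("m1", 0.62), ("m2", 0.91), ("m3", 0.89)]
print(top_masks(cands))  # [('m0', 0.95), ('m2', 0.91), ('m3', 0.89)]
```

The real generator also applies stability-score filtering and non-maximum suppression across prompts, but the core idea is the same: rank per-prompt candidates by predicted IoU and keep the best.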

jez-moxmo avatar Dec 24 '23 06:12 jez-moxmo