Can a text-guided model focus on the features of a specific area in an image?

Open · lucky-asia opened this issue 9 months ago • 1 comment

For example, if we input a facial image, can the text-guided network focus on the mouth area? Is this achievable?

lucky-asia, May 13 '25 08:05

Is the intention here to describe it? That is, do you want to get labels or a description for the mouth area?

jennyluciav, May 22 '25 14:05