Can a text-guided model focus on the features of a specific area in an image?

Open lucky-asia opened this issue 9 months ago • 1 comment

For example, if we input a facial image, can the text-guided network focus on the mouth area? Is this achievable?

May 13 '25 08:05 lucky-asia

Is the intention here to describe it? As in, do you want to get labels or a description for the mouth area?

May 22 '25 14:05 jennyluciav