[Feature Request]: Image to Description Model
Is there an existing issue for this?
- [X] I have searched the existing issues
Contact Details
Through GitHub
What should this feature add?
Is there a feature planned to automatically create a description of an image (the reverse of the current text-to-image workflow)? Sometimes I have an original image and would like to reproduce its style, and I would also like a description of the image to see what the AI thinks it shows. Would something like this be possible? I am not talking about image-to-image style transfer; I am interested in an image-to-text description that is as accurate as possible.
Alternatives
No response
Additional Content
No response
CLIP Interrogator
https://colab.research.google.com/github/pharmapsychotic/clip-interrogator/blob/main/clip_interrogator.ipynb
BLIP
https://huggingface.co/spaces/Salesforce/BLIP
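For anyone who would rather try BLIP locally than through the Space, here is a minimal sketch, not an official recipe. It assumes the Hugging Face `transformers`, `torch`, and `Pillow` packages are installed and uses the `Salesforce/blip-image-captioning-base` checkpoint. The helper names `load_blip` and `caption_image` and the default values for `max_new_tokens` and `num_beams` are my own illustrative choices.

```python
"""Minimal BLIP captioning sketch (assumes transformers, torch, and Pillow)."""

# Assumption: this checkpoint; other BLIP captioning checkpoints should load the same way.
MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id=MODEL_ID):
    """Load the BLIP processor and model; weights are downloaded on first use."""
    # Imported lazily so caption_image can be used without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_image(image, processor, model, prompt=None,
                  max_new_tokens=60, num_beams=3):
    """Return a caption for a PIL image, optionally continuing a text prompt."""
    if prompt is None:
        # Unconditional captioning: only the image is encoded.
        inputs = processor(images=image, return_tensors="pt")
    else:
        # Conditional captioning: the model continues the given prompt text.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    # max_new_tokens and num_beams are illustrative defaults, not tuned values.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Usage would look roughly like `processor, model = load_blip()` followed by `caption_image(Image.open("photo.jpg").convert("RGB"), processor, model)`, where `Image` comes from `PIL` and `photo.jpg` is a placeholder path. Passing `prompt="a photograph of"` steers the start of the caption.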
@n00mkrad thank you very, very much, I didn't know that such a model already exists... BLIP "image captioning" is exactly what I was looking for :-) I just noticed that the descriptions are extremely short; I was expecting a description of 5-10 sentences.
For example, for a photo of a person: "A photo of a person who is around 62 years old with a slim build. The hair is dark brown and shoulder-length. The person wears red jeans with a black belt and a white shirt with pockets. The background shows he is walking on a stone-paved path near a river where the water is blue, and there is a lot of vegetation such as trees and bushes, probably somewhere in Japan. The sun is setting and he seems to be freezing."
Shouldn't this be integrated into InvokeAI as a feature?
There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.
This would still be really nice to have; the other web UI already offers it as an extension. This should absolutely be a feature.
Adding CLIP Interrogator or BLIP, as a tab or a feature built directly into the web interface, would be an excellent addition.