Captions generated are very short
Hello, I was using BLIP captioning and found that the captions get cut off rather than being complete. Is there a way to extend the token/caption length, like we can in Kohya or CLIP Interrogator?
Example. OneTrainer caption: a woman in a long dress, holding a wand and pointing at the
CLIP Interrogator 2 (Fast mode, https://huggingface.co/spaces/fffiloni/CLIP-Interrogator-2): a woman standing on top of a mountain holding a wand, digital art fantasy art, digital art fantasy, very beautiful fantasy art, greek myth digital painting, fantasy digital art, very beautiful digital art, sky witch, beautiful fantasy art, maya ali as a wind mage, [[fantasy]], detailed cover artwork, fantasy digital painting, dreamlike digital painting
Wanted to follow up on this: I believe the default caption length for BLIP is 20 tokens. Maybe we could add a parameter that lets us adjust this?
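For reference, here is a minimal sketch of what raising that limit looks like with the standard Hugging Face transformers BLIP model. This is not OneTrainer's actual code; the checkpoint name and the value of 75 are just placeholders to show where the cap lives:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Example BLIP captioning checkpoint (not necessarily the one OneTrainer uses).
model_name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_name)
model = BlipForConditionalGeneration.from_pretrained(model_name)

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# The generation length defaults to a small value (around 20 tokens), which is
# what truncates captions mid-sentence; raising max_new_tokens lets the model
# finish the description.
output_ids = model.generate(**inputs, max_new_tokens=75)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

Exposing something like that `max_new_tokens` value in the UI is essentially what the requested parameter would do.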
Colleagues, just use TagGui for captioning. I use it in two stages (a rough sketch of the idea follows the list):
- Caption all images with one of the WD models, with a max length of 75 tokens.
- Run a Mistral 7B based model or CogVLM v2, asking it to read the WD tags as context and produce more sophisticated natural-language captions (adding to the WD tags or replacing them completely, depending on my goal).
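To make the second stage concrete, here is a hypothetical sketch of how WD tags could be fed to a VLM as context. The `.txt` sidecar layout, prompt wording, and tag format are my assumptions for illustration, not how TagGui does it internally:

```python
from pathlib import Path

def build_caption_prompt(image_path: str) -> str:
    # Stage 1 output: WD-style tags assumed to sit in a .txt sidecar next to
    # the image, e.g. "1girl, long dress, wand, mountain, sky".
    tag_file = Path(image_path).with_suffix(".txt")
    tags = tag_file.read_text(encoding="utf-8").strip()

    # Stage 2: hand the tags to a Mistral/CogVLM-style model as context and
    # ask for a natural-language caption instead of a tag list.
    return (
        f"These tags describe the attached image: {tags}. "
        "Write one detailed natural-language caption covering the subject, "
        "clothing, action, and background."
    )

print(build_caption_prompt("example.jpg"))
```

Whether the resulting caption supplements or replaces the WD tags is then just a matter of what you write back to the sidecar file.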
WD / BLIP give short output by design. Use a more sophisticated LLM if you need richer captions.
If someone wants to PR this, then by all means, but the current position of almost every dev and experienced user here is to use dedicated captioning software.