Captions generated are very short

Open Aamir3d opened this issue 2 years ago • 2 comments

Hello, I was using BLIP captioning and found that the captions are getting cut off rather than being complete. Is there a way to increase the maximum caption length (in tokens), as we can in Kohya or CLIP Interrogator?

Example. OneTrainer Caption: a woman in a long dress, holding a wand and pointing at the

CLIP Interrogator 2, Fast mode (https://huggingface.co/spaces/fffiloni/CLIP-Interrogator-2): a woman standing on top of a mountain holding a wand, digital art fantasy art, digital art fantasy, very beautiful fantasy art, greek myth digital painting, fantasy digital art, very beautiful digital art, sky witch, beautiful fantasy art, maya ali as a wind mage, [[fantasy]], detailed cover artwork, fantasy digital painting, dreamlike digital painting

Aamir3d, Dec 10 '23 16:12

Wanted to follow up on this. I think the default maximum length for BLIP is 20 tokens. Maybe we could add a parameter that lets us adjust it?
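For anyone who wants to try this outside OneTrainer, here is a minimal sketch of what such a parameter could look like. It assumes BLIP is run through the Hugging Face `transformers` library, where `generate()` falls back to a default `max_length` of 20 tokens unless a limit is passed explicitly; whether OneTrainer's captioner hits that same default is an assumption, not something confirmed in this thread. The function names `caption_image` and `load_blip` and the checkpoint name are illustrative choices, not OneTrainer code.

```python
def caption_image(image, processor, model, max_new_tokens=75):
    """Caption one image, allowing up to `max_new_tokens` generated tokens.

    `processor` and `model` are expected to behave like transformers'
    BlipProcessor and BlipForConditionalGeneration (see load_blip below).
    """
    # Convert the image (e.g. a PIL.Image) into model inputs (pixel_values).
    inputs = processor(images=image, return_tensors="pt")
    # Passing max_new_tokens overrides the short default generation limit.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (only) sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def load_blip(model_name="Salesforce/blip-image-captioning-base"):
    """Load a BLIP captioning checkpoint (assumed choice; downloads weights)."""
    # Imported lazily so that caption_image can be used without transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    return processor, model
```

Usage would be roughly `processor, model = load_blip()` followed by `caption_image(Image.open("example.png").convert("RGB"), processor, model, max_new_tokens=75)`, with `Image` from Pillow and `example.png` a placeholder path. BLIP captions tend to stay short even with a higher limit, so this mainly prevents mid-sentence truncation.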

aliftadvantage, Jan 26 '24 05:01

Colleagues, just use TagGUI for captioning. I use it in two stages:

  1. Caption all images with one of the WD models, with a max length of 75 tokens.
  2. Using the Mistral 7B based model or the CogVLM2 model, ask it to read the WD tags as context and produce more sophisticated natural-language captions (either added to the WD tags or replacing them completely, depending on my goal).

WD and BLIP produce short output by design. Use a more sophisticated LLM if you need richer captions.
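The two-stage workflow above can be sketched in plain Python. This is only an illustration of the logic, not TagGUI's API: `tagger` and `captioner` are hypothetical callables standing in for a WD tagger and a vision-language model, and the prompt wording is an assumption.

```python
def build_prompt(tags):
    """Build an instruction that feeds the stage-1 tags to the LLM as context."""
    return (
        "Describe the image in natural language. "
        "Use these tags as context: " + ", ".join(tags)
    )


def combine_caption(tags, llm_caption, mode="append"):
    """Either append the LLM caption to the tags or replace the tags with it."""
    llm_caption = llm_caption.strip()
    if mode == "replace":
        return llm_caption
    if mode == "append":
        return ", ".join(list(tags) + [llm_caption])
    raise ValueError(f"unknown mode: {mode!r}")


def two_stage_caption(image, tagger, captioner, mode="append"):
    """Stage 1: tag the image. Stage 2: ask the LLM for a richer caption."""
    tags = tagger(image)                              # hypothetical WD tagger
    caption = captioner(image, build_prompt(tags))    # hypothetical VLM call
    return combine_caption(tags, caption, mode)
```

Keeping the tagger and captioner as injected callables means either stage can be swapped for a different model without touching the combining logic.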

homoluden, Aug 17 '24 07:08

If someone wants to PR this, then by all means, but the current position of almost everyone who is a dev or an experienced user is to use dedicated captioning software.

O-J1, Feb 16 '25 12:02