sd-scripts

Add OFA captioning

Open sheldonxxxx opened this issue 3 years ago • 1 comment

Hi,

I have integrated a new captioning model (OFA) into the repo. Link to the OFA model repo: https://github.com/OFA-Sys/OFA.git

Below is a quick comparison between OFA, BLIP and BLIP2 (image: gundam_1671601028_0):

  1. BLIP: girl blowing a toy in the air as she stands near her
  2. BLIP2 OPT_6.7B: a girl with red hair kissing a robot
  3. OFA caption_huge_best: a girl with red hair holding a small robot to her face

OFA settings: num_beams: 3, max_len: 16, temperature: 0.5

Performance on my 1070 mobile:

  1. ~3GB for batch_size=1
  2. ~7.5GB for batch_size=20, max_data_loader_n_workers=4
  3. With the setting from 2, it runs at 12s/it, which converts to 0.15s per image
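As a rough check of the per-image figure, here is a minimal sketch, assuming one iteration ("it") processes exactly one batch of batch_size images. The helper name `seconds_per_image` is made up for illustration. Under that assumption, 12s/it at batch_size=20 works out to 0.6s per image. The 0.15s figure comes out when dividing by 80 (20 x 4), that is, when the 4 data loader workers are counted as well.

```python
def seconds_per_image(seconds_per_iteration: float, images_per_iteration: int) -> float:
    """Average time per image, assuming each iteration handles a fixed number of images."""
    if images_per_iteration <= 0:
        raise ValueError("images_per_iteration must be positive")
    return seconds_per_iteration / images_per_iteration


if __name__ == "__main__":
    # One iteration = one batch of 20 images (assumption).
    print(seconds_per_image(12.0, 20))
    # Dividing by batch_size * workers = 80 reproduces the 0.15s figure.
    print(seconds_per_image(12.0, 20 * 4))
```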

Install requirements: the OFA code contains a custom version of fairseq, which needs to be built from source. It is best to install build-essential before running pip install.

sheldonxxxx, Feb 08 '23 17:02

Thank you for this! OFA captioning seems to work well!

However, it seems difficult to include all of the OFA code in this repository, because it would complicate future code maintenance. It would be nice if there were a simpler way to do this.

Also, our repository can process caption (text) files regardless of which process created them. So I think one option would be for you to create your own independent repository.
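To illustrate, an external captioning tool only has to leave one text file per image. Below is a minimal sketch, not code from this repository: the function name `write_caption_files` is hypothetical, and the `.caption` extension is an assumption that should be matched to whatever caption extension the training configuration expects.

```python
import tempfile
from pathlib import Path


def write_caption_files(captions, caption_extension=".caption"):
    """Write each caption to a text file next to its image.

    `captions` maps an image path to its caption string. The caption file
    reuses the image's base name with `caption_extension` (an assumption;
    adjust it to the extension your training setup reads).
    """
    written = []
    for image_path, caption in captions.items():
        caption_path = Path(image_path).with_suffix(caption_extension)
        caption_path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(caption_path)
    return written


if __name__ == "__main__":
    # Demo in a temporary directory so no real dataset is touched.
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "gundam_1671601028_0.png"
        image.touch()  # placeholder standing in for a real image
        paths = write_caption_files(
            {image: "a girl with red hair holding a small robot to her face"}
        )
        print(paths[0].name, "->", paths[0].read_text(encoding="utf-8").strip())
```

The captions themselves can come from OFA, BLIP, BLIP2 or any other model, since only the resulting text files matter to the training scripts.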

kohya-ss, Feb 09 '23 12:02