
Batch predictions for the image captioning task

Open MikeMACintosh opened this issue 3 years ago • 3 comments

Hi, glad to see and use this cool project, thank you. I have a question: is it possible to do batch predictions on the image captioning task? I saw https://github.com/salesforce/BLIP/issues/48 but it's not my case.

I do something like:

```python
import torch
from models.blip import blip_decoder

base_model_path = 'path_to_base_model'
model_base = blip_decoder(pretrained=base_model_path, vit='base', image_size=IMAGE_SIZE)
model_base.eval()
model_base.to(device)

img = transform(sample).unsqueeze(0).to(device)
with torch.no_grad():
    caption_bs_base = model_base.generate(img, sample=False, num_beams=7,
                                          max_length=16, min_length=5)
```
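For comparison, the per-image call above can be batched. The helper below is a hypothetical sketch (not part of BLIP), assuming `generate()` accepts a stacked `(B, 3, H, W)` image tensor and returns one caption per image, as in the repo's evaluation code:

```python
import torch

def caption_batch(model, transform, samples, device, **gen_kwargs):
    """Hypothetical helper: caption several images with one generate() call
    instead of looping one image at a time."""
    # Stack per-image tensors into a single (B, 3, H, W) batch.
    imgs = torch.stack([transform(s) for s in samples]).to(device)
    with torch.no_grad():
        # Assumes generate() returns a list with one caption per image.
        return model.generate(imgs, sample=False, **gen_kwargs)
```

Usage would then look like `caption_batch(model_base, transform, pictures, device, num_beams=7, max_length=16, min_length=5)`, amortizing the model's forward pass over the whole batch.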

It works well, but I want to run inference with 4 models (ViT base/large, each with beam search and nucleus sampling), and it takes too long. On my server, captioning 12 pictures with 4 models takes ~34 s (12 × 4 = 48 captions).

Thank you.

MikeMACintosh avatar May 18 '22 08:05 MikeMACintosh

Yes you can do batch inference.

LiJunnan1992 avatar May 23 '22 02:05 LiJunnan1992

@LiJunnan1992 Could you explain how I can do that? Should I write my own DataLoader?

MikeMACintosh avatar May 23 '22 08:05 MikeMACintosh

Yes, you have to write your own data loader; I just did it myself.

poipiii avatar May 26 '22 05:05 poipiii
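A minimal sketch of such a data loader, assuming the images are already loaded in memory and `transform` is the same preprocessing used in the snippet above; the class and function names here are illustrative, not from the BLIP repo:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CaptionImageDataset(Dataset):
    """Illustrative dataset: wraps a list of images and a preprocessing transform."""
    def __init__(self, images, transform):
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # DataLoader collates these per-image tensors into a (B, 3, H, W) batch.
        return self.transform(self.images[idx])

def caption_all(model, loader, device, **gen_kwargs):
    """Run caption generation once per batch the loader yields."""
    captions = []
    with torch.no_grad():
        for batch in loader:
            captions.extend(model.generate(batch.to(device), **gen_kwargs))
    return captions
```

With `loader = DataLoader(CaptionImageDataset(pictures, transform), batch_size=12)`, this makes one `generate()` call per batch instead of per image, and the same loader can be reused across all four model/decoding combinations.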