BLIP: Fine-tuning VQA on my own data

Hi, I'm here again, this time with a new question. I want to fine-tune the VQA checkpoints on my own data, and two things confuse me. The first is which initial checkpoint I should choose: the pre-trained checkpoint or the checkpoint already fine-tuned on VQA? I noticed you provide both for download :) The second is how to create a VQA dataset in the correct format. I have some photos, and for each of them I just want to ask one yes/no question. I have checked the COCO VQA annotations and found that each entry has five keys, three of which seem important to me: 'question_id', 'question', and 'answer'.
If I want to create my own questions, which ID should I use for each question? Would any unique number do? Also, the 'answer' field contains many repeated words, and I don't understand what that means. If I ask a yes/no question, how should I set 'answer'? Looking forward to your reply, and thanks for your help.
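
To make the second question concrete, here is a minimal sketch of the kind of annotation file I have in mind. It is only a guess at the format, not something confirmed by the repo: the 'image' and 'dataset' keys, the helper name `build_annotations`, the photo paths, and the example question are all assumptions for illustration. It also assumes that the repeated words in 'answer' are several annotators' answers to the same question (the original VQA v2 annotations collect ten answers per question), so a single-element list might be enough when there is only one label per photo.

```python
import json


def build_annotations(labels, question, dataset_tag="vqa"):
    """Build one yes/no VQA entry per image.

    labels: dict mapping an image path to "yes" or "no" (hypothetical input).
    question: the single question asked about every image.
    dataset_tag: assumed value for an assumed 'dataset' key.
    """
    annotations = []
    # Sort so that question IDs are reproducible between runs.
    for question_id, (image_path, answer) in enumerate(sorted(labels.items())):
        if answer not in ("yes", "no"):
            raise ValueError(f"unexpected answer {answer!r} for {image_path}")
        annotations.append(
            {
                "question_id": question_id,  # any unique integer
                "image": image_path,         # assumed key name
                "question": question,
                "answer": [answer],          # list, one entry per annotator
                "dataset": dataset_tag,      # assumed key name
            }
        )
    return annotations


# Hypothetical photos and labels.
labels = {"photos/0001.jpg": "yes", "photos/0002.jpg": "no"}
annotations = build_annotations(labels, "Is there a person in the photo?")
print(json.dumps(annotations, indent=2))
```

Is this roughly the structure the training code expects, or does it need something else?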

Aug 24 '22 09:08 SKBL5694

+1

Sep 13 '23 10:09 lyp0413

@SKBL5694 Has your 2nd question been resolved? I have the same problem.

Dec 12 '23 10:12 WKaiH123