Haozhan Shen
I want to know where the contrastive loss is and how to demonstrate the image-text alignment ability of the pretrained model. Thanks in advance~
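One common way to probe the image-text alignment of a contrastively pretrained model is to compare image and text embeddings by cosine similarity and check whether each image scores highest with its matching caption. The sketch below is a minimal, hypothetical illustration: the random arrays stand in for encoder outputs (in practice they would come from the model's image and text encoders, e.g. CLIP-style `encode_image` / `encode_text`), and the function names are my own, not from any specific codebase.

```python
import numpy as np

# Hypothetical embeddings standing in for pretrained encoder outputs:
# 2 image embeddings and 3 caption embeddings, each 512-dimensional.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(2, 512))
text_emb = rng.normal(size=(3, 512))

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine_similarity_matrix(image_emb, text_emb)  # shape (2, 3)
# For each image, the best-aligned caption is the argmax over captions;
# with real embeddings this is the basis of image-text retrieval metrics.
best_caption = sim.argmax(axis=1)
```

With real encoder outputs, ranking captions by these scores gives the standard retrieval-style evidence of alignment (e.g. recall@1 on paired data).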
I ran into the same problem. For example, I only achieve 40.72 accuracy on the GQA test-dev for vicuna7b, and the checkpoint I used is just the weight recorded in lavis/config/models/blip2_instruct_vicuna7b.yaml....
Thanks for your interest~ Some datasets have meta-info files that include the syntactic structure of each text and the part of speech of each word; based on the part of speech, we performed direct...
Thanks for your interest~ The experiments related to VinVL were originally conducted on Colab. Due to the complexity of this model’s dependencies, we encountered numerous environment-related errors during local deployment....
Hello, our method directly uses pre-trained models and does not involve a training process. If you are referring to the fine-tuned model mentioned in Section 4.6 of the paper, you can...
> > Hello! Have you figured it out? I used multi-image data to finetune llava-ov-0.5B, but got gibberish output. Have you encountered the same situation? Thanks!
>
> I had the same...