Haozhan Shen
I want to know where the contrastive loss is and how to demonstrate the image-text alignment ability of the pretrained model. Thanks in advance~
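One common way to probe the image-text alignment of a contrastively pretrained model is to compare image and text embeddings by cosine similarity and check whether each image scores highest with its matching caption. The sketch below is a minimal, hypothetical illustration: the random arrays stand in for encoder outputs (in practice they would come from the model's image and text encoders, e.g. CLIP-style `encode_image` / `encode_text`), and the function names are my own, not from any specific codebase.

```python
import numpy as np

# Hypothetical embeddings standing in for pretrained encoder outputs:
# 2 image embeddings and 3 caption embeddings, each 512-dimensional.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(2, 512))
text_emb = rng.normal(size=(3, 512))

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine_similarity_matrix(image_emb, text_emb)  # shape (2, 3)
# For each image, the best-aligned caption is the argmax over captions;
# with real embeddings this is the basis of image-text retrieval metrics.
best_caption = sim.argmax(axis=1)
```

With real encoder outputs, ranking captions by these scores gives the standard retrieval-style evidence of alignment (e.g. recall@1 on paired data).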
I ran into the same problem. For example, I only achieve 40.72 accuracy on the GQA test-dev for vicuna7b, and the checkpoint I used is just the weight recorded in lavis/config/models/blip2_instruct_vicuna7b.yaml....
Thanks for your interest~ Some datasets have meta-info files that include the syntactic structure of each text and the part of speech of each word; based on the part of speech, we performed direct...
Thanks for your interest~ The experiments related to VinVL were originally conducted on Colab. Due to the complexity of this model’s dependencies, we encountered numerous environment-related errors during local deployment....
Hello, our method directly uses pre-trained models and does not involve a training process. If you are referring to the fine-tuned model mentioned in Section 4.6 of the paper, you can...
> > Hello! Have you figured it out? I used multi-image data to finetune llava-ov-0.5B, but got gibberish output. Have you encountered the same situation? Thanks!
>
> I had the same...