BLIP
I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or the fine-tuned checkpoint weights?
I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or the fine-tuned checkpoint weights? The generated text should be as detailed as possible, with a length of 40-60. Looking forward to your answer. Thank you!