I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or the fine-tuned checkpoint weights?

Open shams2023 opened this issue 1 year ago • 0 comments

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or from the fine-tuned checkpoint weights? The generated text should be as detailed as possible, with a length of 40-60. Looking forward to your answer. Thank you!

Mar 28 '24 02:03 shams2023