CLIP_prefix_caption: A question about applying the model to a new dataset

Hello, thank you very much for your excellent work! I tried to apply the model to a new dataset. I processed the dataset into the required data format, but the generated captions come out wrong.

The output looks like this: "12935.jpg ThereĠareĠtwoĠgroundtrackfieldsĠinĠtheĠimageĠaboveĠtheĠimageĠabove….".

I have checked that there are no redundant characters in my annotations. Could you please explain the cause? Could it be related to GPT2? Thank you very much!

Jul 07 '22 03:07 Waiting-TT

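I think the likely cause is that you used GPT2Tokenizer to tokenize your captions. "Ġ" is how GPT-2's byte-level BPE vocabulary marks a leading space, so raw tokens that are joined without being decoded look like your output. You could try nltk.tokenize.word_tokenize instead.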

Feb 17 '23 11:02 YiranHuangIrene
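To illustrate the point about "Ġ": it is the character U+0120, which GPT-2's byte-level BPE uses in place of a leading space. Below is a minimal sketch of cleaning up text that was saved as raw tokens. The helper name restore_spaces is hypothetical, and the sketch only covers plain ASCII captions; with the Hugging Face tokenizer available, calling tokenizer.decode on the token ids is the more robust route.

```python
# "Ġ" (U+0120) is the marker GPT-2's byte-level BPE uses for a leading space.
GPT2_SPACE_MARKER = "\u0120"


def restore_spaces(token_text: str) -> str:
    """Hypothetical helper: turn raw GPT-2 token text back into plain text.

    Assumption: the text is plain ASCII, so the space marker is the only
    byte-level symbol that needs to be mapped back.
    """
    return token_text.replace(GPT2_SPACE_MARKER, " ").strip()


# Example string taken from the output reported above.
raw = "ThereĠareĠtwoĠgroundtrackfieldsĠinĠtheĠimage"
print(restore_spaces(raw))  # There are two groundtrackfields in the image
```

If the cleaned text still repeats phrases such as "the image above", that is a separate generation issue and not something this cleanup addresses.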

I have a question. I tried to use the COCO JSON downloaded from the Internet, but it did not work because its format differs from the JSON the code expects. What does the JSON file used by the code look like, and why does each image seem to have only one caption? How do you create the JSON file for the dataset you need?

Jul 14 '23 06:07 rongtongxueya
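One possible way to bridge the two formats is sketched below. The official COCO captions file is a dictionary with "images" and "annotations" lists, where each annotation has "image_id", "id" and "caption". The sketch assumes the code expects a flat list with one record per caption; that target layout and the function name flatten_coco_captions are assumptions, so check the field names against the repository's parsing script before relying on it. Under this assumption each record holds a single caption, and an image with several captions simply appears in several records.

```python
import json


def flatten_coco_captions(coco: dict) -> list:
    """Hypothetical helper: flatten COCO-style captions into one record per caption.

    Assumption: the training code reads a flat list of records with
    "image_id", "id" and "caption" fields.
    """
    records = []
    for ann in coco["annotations"]:
        records.append(
            {
                "image_id": ann["image_id"],
                "id": ann["id"],
                "caption": ann["caption"].strip(),
            }
        )
    return records


# Tiny made-up input in the official COCO captions layout.
coco = {
    "images": [{"id": 42, "file_name": "example.jpg"}],
    "annotations": [
        {"image_id": 42, "id": 1, "caption": "A dog runs on grass."},
        {"image_id": 42, "id": 2, "caption": "A brown dog outdoors. "},
    ],
}

flat = flatten_coco_captions(coco)
print(json.dumps(flat, indent=2))
```

For a custom dataset, the same idea applies: build the list of records directly from your own annotations and write it out with json.dump.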