Preprocessing of pretraining data
Hi, thanks for open-sourcing such a wonderful work!
Regarding the preprocessing of the pretraining data: did you apply the template prompts from ULIP to the raw texts and to the BLIP- and MSFT-generated captions, but not to the retrieved texts? As far as I can tell, each *.npy file in the pretraining data contains both "original" and "prompt_avg" versions of text_feat, blip_caption_feat, and msft_caption_feat, while retrieval_text_feat has no "prompt_avg" version.
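For reference, this is how I currently understand the "prompt_avg" feature, just a minimal sketch assuming ULIP/CLIP-style templates and an OpenCLIP text encoder; the backbone name and the short template list below are my own guesses, not taken from your code:

```python
# Sketch: averaging a CLIP text feature over prompt templates ("prompt_avg").
# Assumptions: OpenCLIP ViT-bigG-14 backbone and an illustrative subset of templates.
import torch
import open_clip

templates = [
    "a point cloud model of {}.",
    "There is a {} in the scene.",
    "a 3D model of {}.",
]

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
model.eval()

def prompt_avg_feature(text: str) -> torch.Tensor:
    """Average the normalized text embeddings of `text` over all templates."""
    prompts = [t.format(text) for t in templates]
    with torch.no_grad():
        feats = model.encode_text(tokenizer(prompts))
    feats = feats / feats.norm(dim=-1, keepdim=True)  # normalize each prompt feature
    avg = feats.mean(dim=0)
    return avg / avg.norm()                           # renormalize the averaged feature
```

Is this roughly what you do, with the "original" version being the un-templated text feature?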
Also, could you give me a hint on where to download the thumbnail images you used to extract the thumbnail features? The thumbnail images are not included in the released pretraining data; only the extracted thumbnail embeddings are available.
If you could share your full preprocessing script for extracting the text and image embeddings, that would be a great help!
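In case it helps clarify my question, this is roughly how I would extract a thumbnail image feature on my side, again only a sketch with an assumed OpenCLIP backbone, not your actual pipeline:

```python
# Sketch: encoding one thumbnail image into a normalized CLIP image feature.
# The backbone ("ViT-bigG-14", laion2b_s39b_b160k) is an assumption.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
model.eval()

def thumbnail_feature(image_path: str) -> torch.Tensor:
    """Encode a single thumbnail image and L2-normalize the feature."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = model.encode_image(image)
    return (feat / feat.norm(dim=-1, keepdim=True)).squeeze(0)
```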
I also noticed that for the rendered images there is an extra merged_img.png alongside colors_{0-11}.png. Looking into merged_img.png, it appears to be a stitch of all 12 views. Is this what you used to extract the thumbnail image feature? The paper says the thumbnail image is a single-view image.
The thumbnail images are from the original Objaverse dataset. You should check the meta info of the Objaverse dataset.
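For example, something along these lines should get you the thumbnail URLs from the Objaverse metadata via the `objaverse` package; the exact annotation fields below follow the Sketchfab-style metadata and may need adjusting:

```python
# Sketch: fetching thumbnail URLs from Objaverse annotations.
# Field names ("thumbnails" -> "images" -> "url"/"width") are assumptions.
import objaverse

uids = objaverse.load_uids()[:2]                # a couple of object UIDs as an example
annotations = objaverse.load_annotations(uids)  # dict: uid -> metadata

for uid, ann in annotations.items():
    images = ann.get("thumbnails", {}).get("images", [])
    if images:
        # pick the largest available thumbnail for this object
        best = max(images, key=lambda im: im.get("width", 0))
        print(uid, best.get("url"))
```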
Hello, could you please share the captioning script or the instruction prompt you used? Is it just "please describe this image in detail"?
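On my side I am currently doing something like the following, which is not your script, just a sketch of prompted captioning with BLIP-2 via HuggingFace transformers using the prompt mentioned above; the model name and generation settings are assumptions:

```python
# Sketch: prompted image captioning with BLIP-2 (HuggingFace transformers).
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("thumbnail.png").convert("RGB")
prompt = "Question: please describe this image in detail. Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=64)
caption = processor.decode(out[0], skip_special_tokens=True).strip()
print(caption)
```

Does this match what you did, or did you use a different prompt / captioning model?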