
Preprocessing of pretraining data

seanzhuh opened this issue 10 months ago · 3 comments

Hi, thanks for open-sourcing such wonderful work.

Regarding the preprocessing of the pretraining data: did you apply the template prompts from ULIP to the raw texts and to the BLIP- and msft-generated captions, but not to the retrieved texts? As far as I can tell, in the pretraining data each *.npy file contains both "original" and "prompt_avg" versions of the embeddings for text_feat, blip_caption_feat, and msft_caption_feat; only retrieval_text_feat lacks a "prompt_avg" version.
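
For context, here is a minimal sketch of how such prompt-averaged text features are typically computed, assuming an OpenCLIP text encoder and a small illustrative subset of the ULIP-style templates (the exact checkpoint and template list used by the authors may differ):

```python
# Sketch of "original" vs. "prompt_avg" text features, assuming OpenCLIP;
# the template list below is a small illustrative subset, not the full
# ULIP prompt set.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")

templates = [
    "a point cloud model of {}.",
    "There is a {} in the scene.",
    "a photo of a {}.",
]

@torch.no_grad()
def encode(texts):
    feats = model.encode_text(tokenizer(texts))
    return feats / feats.norm(dim=-1, keepdim=True)

caption = "a wooden chair"
original_feat = encode([caption])[0]                          # "original"
prompt_avg_feat = encode([t.format(caption) for t in templates]).mean(0)
prompt_avg_feat = prompt_avg_feat / prompt_avg_feat.norm()    # "prompt_avg"
```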

And could you please give me a hint on where to download the thumbnail images you used to extract the thumbnail features? The thumbnail images are not included in the released pretraining data; only the extracted thumbnail embeddings are available.

If you could share your full preprocessing script for extracting the text and image embeddings, that would be a great help!

seanzhuh · Mar 10 '25 02:03

I noticed that for the rendered images there is an extra merged_img.png alongside colors_{0-11}.png. I looked into merged_img.png and it is a stitch of all 12 views. Is this what you used to extract the thumbnail image feature? Your paper says the thumbnail image is a single-view image.
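
For reference, a stitch like that could be reproduced along these lines, assuming a 3×4 grid of equally sized views (the actual layout of merged_img.png may differ):

```python
# Sketch of stitching the 12 per-view renders into one contact sheet;
# the 3x4 grid and equal view sizes are assumptions about the layout.
from PIL import Image

views = [Image.open(f"colors_{i}.png") for i in range(12)]
w, h = views[0].size
cols, rows = 4, 3
sheet = Image.new("RGB", (cols * w, rows * h))
for i, im in enumerate(views):
    sheet.paste(im, ((i % cols) * w, (i // cols) * h))
sheet.save("merged_img_reconstructed.png")
```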

seanzhuh · Mar 10 '25 04:03

The thumbnail images come from the original Objaverse dataset. You should check the meta info from the Objaverse dataset.
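
For reference, a minimal sketch of pulling a thumbnail via the objaverse package, assuming the annotations follow the Sketchfab schema (the "thumbnails"/"images"/"url" keys should be verified against the actual metadata):

```python
# Sketch of fetching a thumbnail from Objaverse metadata; the annotation
# field names are assumptions based on the Sketchfab schema.
import urllib.request
import objaverse

uid = "some-objaverse-uid"  # hypothetical placeholder UID
annotations = objaverse.load_annotations([uid])
images = annotations[uid]["thumbnails"]["images"]
best = max(images, key=lambda im: im.get("width", 0))  # largest rendition
urllib.request.urlretrieve(best["url"], f"{uid}.png")
```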

Colin97 · Mar 16 '25 22:03

Hello, could you please provide the captioning script or the instruction prompt you used? Is it just "please describe this image in detail"?
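
For context, a minimal captioning sketch with BLIP via Hugging Face transformers; the checkpoint and decoding settings here are assumptions, not necessarily the authors' actual pipeline:

```python
# Sketch of unconditional BLIP captioning; model choice and generation
# settings are assumptions, not the authors' confirmed setup.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

image = Image.open("colors_0.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))
```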

seanzhuh · Jun 17 '25 07:06