
how to generate lvis_cat_name_pt_feat.npy

Open seanzhuh opened this issue 10 months ago • 4 comments

Hi, I have a question on how to generate lvis_cat_name_pt_feat.npy used for evaluation.

I wrote the following code to mimic your extracted embeddings; the templates are copied from ULIP-templates:

import json
import os.path as osp

import numpy as np
import torch
import tqdm
import open_clip
from open_clip import tokenize

device = "cuda"

meta_path = osp.join(path_to_meta_data, "split", "lvis.json")
with open(meta_path, "r") as file:
    meta = json.load(file)
cats = sorted(np.unique([data['category'] for data in meta]))

# load your extracted embeddings
cat_embs_openshape_big_g_14 = np.load(osp.join(path_to_meta_data, "lvis_cat_name_pt_feat.npy"), allow_pickle=True)

clip, _, _ = open_clip.create_model_and_transforms(model_name="ViT-bigG-14", pretrained='laion2b_s39b_b160k')
clip.to(device).eval()
with torch.no_grad():
    gen_cat_embs = []
    for name in tqdm.tqdm(cats):
        texts = [template.format(name) for template in _TEMPLATES]
        texts = tokenize(texts).to(device=device, non_blocking=True)
        if len(texts.shape) < 2:
            texts = texts[None, ...]
        class_emb = clip.encode_text(texts)
        class_emb = class_emb / class_emb.norm(dim=-1, keepdim=True)
        class_emb = class_emb.mean(dim=0)  # average over templates
        gen_cat_embs.append(class_emb)
    gen_cat_embs = torch.stack(gen_cat_embs, dim=0).cpu().numpy()  # C x D
assert np.allclose(gen_cat_embs, cat_embs_openshape_big_g_14)

The assertion above fails, and I manually checked that no row of gen_cat_embs equals the 1st row of cat_embs_openshape_big_g_14, so the problem is not the order of category names or templates. Could you please give a hint on this?
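One way to rule out a row permutation is to match each generated row to its nearest stored row by cosine similarity. This is a minimal sketch with toy arrays standing in for gen_cat_embs and cat_embs_openshape_big_g_14 (the function name is illustrative):

```python
import numpy as np

def nearest_rows(gen, ref):
    """For each row of gen, return the index of the most similar row of ref
    (by cosine similarity) together with that similarity value."""
    gen = gen / np.linalg.norm(gen, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = gen @ ref.T  # C x C cosine-similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)

# Toy stand-ins: ref is gen with its rows swapped, so a pure
# permutation of categories should be detected as perfect matches.
gen = np.array([[1.0, 0.2], [0.1, 1.0]])
ref = gen[[1, 0]]
idx, score = nearest_rows(gen, ref)  # idx == [1, 0], score close to 1.0
```

If the real embeddings give maximum similarities well below 1.0 for every row, the mismatch is in the encoding itself (model weights, tokenizer, or pooling), not in the ordering.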

seanzhuh avatar Mar 16 '25 09:03 seanzhuh

Hi, please check my newest commit, which includes two versions (with and without templates).

Colin97 avatar Mar 16 '25 22:03 Colin97

Thanks for your fast reply. In the template version, these imports are missing:

from models.ULIP_models import ULIP_WITH_IMAGE
from utils.tokenizer import SimpleTokenizer

and the file /datasets-slow1/Objaverse/lvis-annotations.json also does not exist.

seanzhuh avatar Mar 17 '25 02:03 seanzhuh

And do you use /kaiming-fast-vol/workspace/ULIP_copy/initialize_models/slip_base_100ep.pt to encode the prompted text embeddings, instead of ViT-bigG-14?

seanzhuh avatar Mar 17 '25 02:03 seanzhuh

extract_lvis_feat_ulip.py is used for retraining and evaluating ULIP on the Objaverse dataset. Could you try the other script (without the template) to see if the text feature matches lvis_cat_name_pt_feat.npy?
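For reference, the with/without-template difference can be sketched as a single helper. The function and the dummy encoder below are illustrative only, not the actual scripts in the repo; the template-free path simply encodes the raw category name with no averaging over prompts:

```python
import numpy as np

def encode_categories(cats, encode_text_fn, templates=None):
    """Build one embedding per category: if templates are given, average the
    normalized per-template embeddings; otherwise encode the raw name."""
    embs = []
    for name in cats:
        if templates:
            texts = [t.format(name) for t in templates]
        else:
            texts = [name]  # template-free: encode the category name directly
        e = encode_text_fn(texts)                       # (len(texts), D)
        e = e / np.linalg.norm(e, axis=1, keepdims=True)
        embs.append(e.mean(axis=0))
    return np.stack(embs)                               # (C, D)

# Dummy encoder for illustration: maps each string to a fixed 2-D vector.
dummy = lambda texts: np.array([[float(len(t)), 1.0] for t in texts])
out = encode_categories(["cat", "dog"], dummy)  # shape (2, 2)
```

Comparing the output of both paths against lvis_cat_name_pt_feat.npy should reveal which variant produced the shipped file.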

Colin97 avatar Mar 18 '25 07:03 Colin97