Dinosaurcubs comments


Results: 3 comments by Dinosaurcubs

How could I only use image encoder to do classification task after fine tuning the entire CLIP.

I ran into the same problem. Have you solved it?

Why does CLIP always need softmax and not simple Cosine Similarity

> You don't need to do
>
>     image_features /= image_features.norm(dim=-1, keepdim=True)
>     text_features /= text_features.norm(dim=-1, keepdim=True)
>
> if you're using cosine_similarity. torch.cosine_similarity(x, y) already normalizes the inputs, by nature...
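
The point in the quoted reply can be checked numerically. The snippet below is a minimal sketch, not code from the thread: the random tensors are stand-ins for real CLIP embeddings, and the shape (4, 512) and the names `image_unit`, `text_unit`, `sim_raw`, and `sim_manual` are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Stand-ins for CLIP embeddings (assumed shape: 4 vectors of width 512).
image_features = torch.randn(4, 512)
text_features = torch.randn(4, 512)

# Cosine similarity computed directly on the raw, unnormalized features.
sim_raw = torch.cosine_similarity(image_features, text_features, dim=-1)

# Manual route: L2-normalize each row first, then take the row-wise dot product.
image_unit = image_features / image_features.norm(dim=-1, keepdim=True)
text_unit = text_features / text_features.norm(dim=-1, keepdim=True)
sim_manual = (image_unit * text_unit).sum(dim=-1)

# Compare the two results up to floating-point tolerance.
same = torch.allclose(sim_raw, sim_manual, atol=1e-6)
print(same)
```

If the two results agree, the explicit normalization is redundant whenever `torch.cosine_similarity` is used. It still matters if the similarity is computed as a plain matrix product such as `image_features @ text_features.T`, which does not normalize anything by itself.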

RuntimeError: The size of tensor a (2) must match the size of tensor b (50) at non-singleton dimension 1

> Found the solution. My problem was the size of the images: I had batches of dimension (16, 3, 32, 32) (16 images per batch, 3 channels, 32 height/width). Got...
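
The numbers in the error message are consistent with a token-count mismatch in a Vision Transformer image encoder. The sketch below is an illustration under stated assumptions: the comment does not say which model was used, so the patch size of 32 and the native input size of 224 (as in a ViT-B/32-style encoder) are assumed, and `vit_token_count` is a hypothetical helper, not a CLIP function.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Tokens seen by a ViT: one per image patch plus one class token."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


# Assumed setup (not stated in the comment): patch size 32,
# positional embedding built for 224x224 inputs.
expected_tokens = vit_token_count(224, 32)  # length of the positional embedding
actual_tokens = vit_token_count(32, 32)     # tokens produced by a 32x32 image

print(expected_tokens, actual_tokens)
```

Under these assumptions a 32x32 image yields far fewer tokens than the positional embedding expects, so adding the two tensors fails at dimension 1. The rest of the quoted comment is cut off, so the exact fix the author applied is not shown here.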