CLIP
Do the text encoders vary between different CLIP models?
Suppose we load two CLIP models, e.g.

model_1, preprocess = clip.load("RN50", device=device, jit=False)
model_2, preprocess = clip.load("ViT-B/16", device=device, jit=False)

The image encoders in model_1 and model_2 are obviously different (ResNet vs. ViT). What about the text encoders in these two models: are they also different?
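One way to check this directly is to compare the text-side attributes of the two loaded models. The sketch below is a minimal, hedged example: it assumes the attribute names of the openai/CLIP reference implementation (`transformer.width`, `transformer.layers`, `token_embedding`, `positional_embedding`, `text_projection`), which are only reachable when the model is loaded with `jit=False`. The helpers `text_encoder_summary`, `diff_summaries`, and `make_stub` are hypothetical names introduced here, and the stub sizes are made up so the code runs without downloading any weights.

```python
from types import SimpleNamespace


def text_encoder_summary(model):
    """Collect text-tower hyperparameters from a loaded (non-JIT) CLIP model.

    Assumes the attribute layout of the openai/CLIP reference implementation.
    """
    return {
        "width": model.transformer.width,
        "layers": model.transformer.layers,
        "vocab_size": tuple(model.token_embedding.weight.shape)[0],
        "context_length": tuple(model.positional_embedding.shape)[0],
        "projection_shape": tuple(model.text_projection.shape),
    }


def diff_summaries(a, b):
    """Return only the fields whose values differ between two summaries."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


def make_stub(width, layers, vocab_size, context_length, embed_dim):
    """Hypothetical stand-in exposing the same attributes as a CLIP model."""
    return SimpleNamespace(
        transformer=SimpleNamespace(width=width, layers=layers),
        token_embedding=SimpleNamespace(
            weight=SimpleNamespace(shape=(vocab_size, width))
        ),
        positional_embedding=SimpleNamespace(shape=(context_length, width)),
        text_projection=SimpleNamespace(shape=(width, embed_dim)),
    )


# Made-up sizes for illustration only; these are NOT the real checkpoint values.
stub_a = make_stub(width=64, layers=2, vocab_size=100, context_length=16, embed_dim=32)
stub_b = make_stub(width=64, layers=2, vocab_size=100, context_length=16, embed_dim=48)
summary_a = text_encoder_summary(stub_a)
summary_b = text_encoder_summary(stub_b)
print(diff_summaries(summary_a, summary_b))

# With the real models (requires the clip package and downloaded weights):
#   import clip
#   model_1, _ = clip.load("RN50", device="cpu", jit=False)
#   model_2, _ = clip.load("ViT-B/16", device="cpu", jit=False)
#   print(diff_summaries(text_encoder_summary(model_1), text_encoder_summary(model_2)))
```

Note that this only compares architecture and shapes. Two text encoders with identical shapes can still hold different weights, so answering that part of the question would additionally require comparing the matching tensors in each model's `state_dict()`.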