Do the text encoders vary between different CLIP models?

Open xiaotingxuan opened this issue 3 years ago • 0 comments

When we load CLIP models, e.g.

model_1, preprocess = clip.load("RN50", device=device, jit=False)
model_2, preprocess = clip.load("ViT-B/16", device=device, jit=False)

Obviously, the image encoders in model_1 and model_2 are different (ResNet and ViT). How about the text encoders in these two models? Are they also different?
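One way to check this empirically is to compare the non-visual parameters of the two loaded models. The sketch below is only an illustration, not part of the CLIP library: the helper names `text_side_shapes` and `diff_text_encoders` are hypothetical, and it assumes that the image encoder's parameters are the ones whose names start with `visual.` in the model's `state_dict()`. It works on plain name-to-shape dictionaries, so it compares shapes only; parameters with matching shapes may still hold different trained values.

```python
def text_side_shapes(shapes, visual_prefix="visual."):
    """Keep only entries whose name does not start with the visual prefix.

    Assumption: image-encoder parameters are named "visual.*"; everything
    else (including shared scalars such as a logit scale) is kept.
    """
    return {name: shape for name, shape in shapes.items()
            if not name.startswith(visual_prefix)}


def diff_text_encoders(shapes_1, shapes_2, visual_prefix="visual."):
    """Return {name: (shape_1, shape_2)} for non-visual entries that differ.

    A shape of None means the parameter is missing from that model.
    """
    text_1 = text_side_shapes(shapes_1, visual_prefix)
    text_2 = text_side_shapes(shapes_2, visual_prefix)
    diff = {}
    for name in sorted(set(text_1) | set(text_2)):
        s1, s2 = text_1.get(name), text_2.get(name)
        if s1 != s2:
            diff[name] = (s1, s2)
    return diff


# Possible usage with the models from the question (not run here):
# shapes_1 = {k: tuple(v.shape) for k, v in model_1.state_dict().items()}
# shapes_2 = {k: tuple(v.shape) for k, v in model_2.state_dict().items()}
# for name, (s1, s2) in diff_text_encoders(shapes_1, shapes_2).items():
#     print(name, s1, s2)
```

An empty result would only mean the text-side architectures have the same shapes; to see whether the weights themselves match, the tensors would have to be compared directly.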

Oct 29 '22 08:10 xiaotingxuan