Sameer Khanna
The math derivations in the paper make sense to me, but the code does not seem to match unless I am mistaken. There should be a method of handling subsets...
print(f"Using CUDA: {is_available()}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class HuggingFaceImageEncoder(Module):
    """Wrapper for HuggingFace pretrained CLIP image encoder"""

    def __init__(self, projection_dim):
        super().__init__()
        # ViT_dim is 768 for 'openai/clip-vit-base-patch32'
        self.model...
Say we choose a batch size of 64 and a sub-batch size of 16. This means we split each input into 64/16 = 4 sub-batches. For contrastive learning, we...
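To make the arithmetic concrete, here is a minimal sketch of the split itself, assuming the batch size is evenly divisible by the sub-batch size. The helper name `split_into_sub_batches` is hypothetical and is not taken from the post or from any library. It only illustrates the 64/16 = 4 chunking and does not cover the contrastive-learning step.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sub_batches(batch: Sequence[T], sub_batch_size: int) -> List[Sequence[T]]:
    """Split a batch into consecutive, equally sized sub-batches.

    Hypothetical helper for illustration only. It assumes the batch
    size is an exact multiple of the sub-batch size.
    """
    if sub_batch_size <= 0:
        raise ValueError("sub_batch_size must be positive")
    if len(batch) % sub_batch_size != 0:
        raise ValueError("batch size must be divisible by sub_batch_size")
    # Take consecutive slices of length sub_batch_size.
    return [
        batch[start:start + sub_batch_size]
        for start in range(0, len(batch), sub_batch_size)
    ]


# Stand-in for a batch of 64 samples; integers play the role of inputs.
batch = list(range(64))
sub_batches = split_into_sub_batches(batch, 16)

print(len(sub_batches))               # number of sub-batches
print([len(s) for s in sub_batches])  # size of each sub-batch
```

With tensors, `torch.split(x, 16)` performs the same kind of split along dimension 0, so a plain-Python helper like this is only needed for illustration.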