Sameer Khanna comments





How does this provide the same gradient as a larger batch size?

The math derivations in the paper make sense to me but, unless I am mistaken, the code does not seem to match them. There should be a method of handling subsets...

How does this provide the same gradient as a larger batch size?

print(f"Using CUDA: {is_available()}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class HuggingFaceImageEncoder(Module):
    """Wrapper for HuggingFace pretrained CLIP image encoder"""
    def __init__(self, projection_dim):
        super().__init__()
        # ViT_dim is 768 for 'openai/clip-vit-base-patch32'
        self.model...

How does this provide the same gradient as a larger batch size?

Say we choose a batch size of 64 and a sub-batch size of 16. This means we split each input into 64/16 = 4 sub-batches. For contrastive learning, we...
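The comment is cut off here, so the following is only a minimal sketch of one way sub-batches can reproduce a full-batch contrastive gradient. It assumes a gradient-caching scheme: embed every sub-batch first, take the loss gradient with respect to all 64 embeddings at once, then push those cached gradients back through the encoder one sub-batch at a time. The scalar "encoders" `u * a[i]` and `v * b[i]`, the InfoNCE-style loss, and every function name are hypothetical stand-ins for illustration, not the code from the repository under discussion.

```python
import math
import random

def info_nce(q, k):
    """Mean InfoNCE loss: k[i] is the positive for q[i], every other k[j] a negative."""
    n = len(q)
    total = 0.0
    for i in range(n):
        logits = [q[i] * kj for kj in k]
        m = max(logits)
        total += m + math.log(sum(math.exp(s - m) for s in logits)) - logits[i]
    return total / n

def embedding_grads(q, k):
    """Gradient of info_nce with respect to each embedding in q and k."""
    n = len(q)
    dq, dk = [0.0] * n, [0.0] * n
    for i in range(n):
        logits = [q[i] * kj for kj in k]
        m = max(logits)
        exps = [math.exp(s - m) for s in logits]
        z = sum(exps)
        for j in range(n):
            p = exps[j] / z - (1.0 if i == j else 0.0)  # softmax minus one-hot
            dq[i] += p * k[j] / n
            dk[j] += p * q[i] / n
    return dq, dk

def chunks(n, size):
    """Index ranges that split n items into sub-batches of at most `size`."""
    return [range(s, min(s + size, n)) for s in range(0, n, size)]

def cached_gradient(u, v, a, b, sub_batch):
    """Assumed gradient-caching scheme: two passes over the sub-batches."""
    n = len(a)
    q, k = [0.0] * n, [0.0] * n
    for idx in chunks(n, sub_batch):       # pass 1: embeddings only
        for i in idx:
            q[i], k[i] = u * a[i], v * b[i]
    dq, dk = embedding_grads(q, k)         # loss sees the whole batch
    du = dv = 0.0
    for idx in chunks(n, sub_batch):       # pass 2: chain rule per sub-batch
        du += sum(dq[i] * a[i] for i in idx)
        dv += sum(dk[i] * b[i] for i in idx)
    return du, dv

def naive_gradient(u, v, a, b, sub_batch):
    """Plain accumulation: each sub-batch only sees its own negatives."""
    parts = chunks(len(a), sub_batch)
    du = dv = 0.0
    for idx in parts:
        dq, dk = embedding_grads([u * a[i] for i in idx], [v * b[i] for i in idx])
        du += sum(g * a[i] for g, i in zip(dq, idx)) / len(parts)
        dv += sum(g * b[i] for g, i in zip(dk, idx)) / len(parts)
    return du, dv

def numeric_gradient(u, v, a, b, eps=1e-6):
    """Central finite differences of the full-batch loss, used as a reference."""
    def loss(u_, v_):
        return info_nce([u_ * x for x in a], [v_ * y for y in b])
    return ((loss(u + eps, v) - loss(u - eps, v)) / (2 * eps),
            (loss(u, v + eps) - loss(u, v - eps)) / (2 * eps))

rng = random.Random(0)
BATCH, SUB = 64, 16                        # the sizes from the comment above
a = [rng.uniform(-1, 1) for _ in range(BATCH)]
b = [rng.uniform(-1, 1) for _ in range(BATCH)]
u, v = 0.7, -0.4                           # arbitrary illustrative parameters

cached = cached_gradient(u, v, a, b, SUB)
numeric = numeric_gradient(u, v, a, b)
naive = naive_gradient(u, v, a, b, SUB)
print("sub-batches:", len(chunks(BATCH, SUB)))
print("cached :", cached)
print("numeric:", numeric)
print("naive  :", naive)
```

In this toy setup the cached result should agree with the finite-difference gradient of the full 64-sample loss, while the naive per-sub-batch accumulation generally will not, because each of its losses only ever compares a sample against the 16 in its own sub-batch.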