CLIP_prefix_caption
Simple image captioning model
Hello. I've been trying to distinguish between `prefix_length` and `clip_length`. I roughly understand that `prefix_length` is the learnable part attached to the GPT-2 input, but not what...
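As far as I can tell from the transformer mapper, the two lengths play different roles: `clip_length` is how many pseudo-tokens the single CLIP embedding is expanded into, while `prefix_length` is how many learnable constant tokens there are, whose transformer outputs become the actual GPT-2 prefix. A simplified sketch of that reading (paraphrased, not the repo's exact code; dimensions are illustrative):

```python
import torch
import torch.nn as nn

class TransformerMapperSketch(nn.Module):
    """Sketch illustrating the two lengths, not the repo's exact module.

    clip_length:   how many pseudo-tokens the single CLIP vector becomes.
    prefix_length: how many learnable constant tokens exist; their
                   transformer outputs are the prefix fed to GPT-2.
    """
    def __init__(self, dim_clip=512, dim_gpt=768, prefix_length=10, clip_length=10):
        super().__init__()
        self.clip_length = clip_length
        # Expand the CLIP vector into clip_length pseudo-tokens.
        self.linear = nn.Linear(dim_clip, clip_length * dim_gpt)
        # Learnable constants, one per prefix slot seen by GPT-2.
        self.prefix_const = nn.Parameter(torch.randn(prefix_length, dim_gpt))
        layer = nn.TransformerEncoderLayer(d_model=dim_gpt, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=8)

    def forward(self, clip_embed):  # clip_embed: (batch, dim_clip)
        b = clip_embed.shape[0]
        x = self.linear(clip_embed).view(b, self.clip_length, -1)
        const = self.prefix_const.unsqueeze(0).expand(b, -1, -1)
        out = self.transformer(torch.cat((x, const), dim=1))
        # Only the outputs at the constant positions become the GPT-2 prefix.
        return out[:, self.clip_length:]
```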
https://github.com/rmokady/CLIP_prefix_caption/blob/1ad805a844a62ab2e5480479aa021bccf0d4d12a/train.py#L230 I think the labels should be created when they are None, so the condition should be `if labels is None`.
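A hedged sketch of the suggested fix in context (the surrounding `forward` is paraphrased from memory, not copied verbatim, and assumes `torch` plus the `ClipCaptionModel` attributes from `train.py`):

```python
# Paraphrase of ClipCaptionModel.forward with the condition flipped,
# so labels are built exactly when the caller did not supply them.
def forward(self, tokens, prefix, mask=None, labels=None):
    embedding_text = self.gpt.transformer.wte(tokens)
    prefix_projections = self.clip_project(prefix).view(
        -1, self.prefix_length, self.gpt_embedding_size)
    embedding_cat = torch.cat((prefix_projections, embedding_text), dim=1)
    if labels is None:  # was the inverse check, which overwrote supplied labels
        dummy_token = self.get_dummy_token(tokens.shape[0], tokens.device)
        labels = torch.cat((dummy_token, tokens), dim=1)
    return self.gpt(inputs_embeds=embedding_cat, labels=labels, attention_mask=mask)
```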
Hi @rmokady, Thank you for your nice work; I learned a lot from it. Since the default CLIP model you are using seems to be the ViT-B/32 version, I am...
Thank you for your work. Do you have any plans to add code that supports multiple GPUs?
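Not in the repo as far as I can see, but a minimal sketch of one common single-machine approach, PyTorch `DataParallel` (the model argument below is a placeholder; in `train.py` it would be the `ClipCaptionModel` instance):

```python
import torch
import torch.nn as nn

def wrap_for_multi_gpu(model: nn.Module) -> nn.Module:
    """Replicate the model across all visible GPUs with DataParallel.

    A simple single-machine alternative to DistributedDataParallel:
    each batch from the loader is split along dim 0 across the GPUs.
    """
    model = model.to('cuda')
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model

# Usage with a stand-in module (in train.py: the ClipCaptionModel):
model = wrap_for_multi_gpu(nn.Linear(512, 768))
```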
I would like to ask how Clip-Cap can generate multiple different sentences for one image. I've changed the `entry_count` value in the `generate2()` function, but the output sentence is the...
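If I read `generate2()` correctly, it filters the logits with top-p but then picks the next token greedily, so decoding is deterministic and every `entry_count` pass yields the same sentence. A sketch of a sampling step that could replace the greedy pick (an assumption about the cause, not a confirmed fix):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_p: float = 0.9) -> torch.Tensor:
    """Nucleus (top-p) sampling for one decoding step.

    Replaces a deterministic `logits.argmax(-1)` so repeated generations
    can produce different continuations for the same image prefix.
    """
    logits = logits / max(temperature, 1e-8)
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    sorted_probs = F.softmax(sorted_logits, dim=-1)
    cum_probs = torch.cumsum(sorted_probs, dim=-1)
    # Mask tokens outside the nucleus; the token crossing top_p is kept.
    remove = cum_probs - sorted_probs > top_p
    sorted_logits[remove] = float('-inf')
    probs = F.softmax(sorted_logits, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```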
Hi, here are my issues: a) I'm getting a very low variety of outputs with ([very] different) custom images, e.g. "...sitting on a cellphone", "...with a cellphone", "...a cellphone and a...
`prefix_dim = 640` in the pretrained model, but how do I translate CLIP's 512-dim embedding into 640 dims before forwarding it to the net?
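The 640 most likely comes from the CLIP backbone rather than from a learned translation: RN50x4 emits 640-dim image embeddings, while ViT-B/32 emits 512-dim ones. If the pretrained weights expect `prefix_dim = 640`, extracting features with the matching backbone avoids any conversion (sketch below; the image path is a placeholder):

```python
import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# RN50x4 produces 640-dim image embeddings, matching prefix_dim = 640;
# ViT-B/32 produces 512-dim ones, so the two backbones are not interchangeable.
model, preprocess = clip.load('RN50x4', device=device)
image = preprocess(Image.open('example.jpg')).unsqueeze(0).to(device)
with torch.no_grad():
    prefix = model.encode_image(image)  # shape: (1, 640)
```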
Hey, real quick question - I'm putting together a custom dataset of captions in the coco format, and it's pretty obvious how the "image_id" and "caption" values are put to...
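For what it's worth, a hypothetical minimal annotation file in the flat shape the parsing script appears to read: a list of records, each pairing an `image_id` (used to locate the image file on disk) with one `caption`. Exact field types and file naming may differ, so check `parse_coco.py`:

```python
import json

# Hypothetical minimal train_caption.json: one record per caption, with
# repeated image_id values for images that have multiple captions. In COCO,
# the id maps to a filename like COCO_train2014_%012d.jpg.
annotations = [
    {"image_id": "42", "caption": "A dog catching a frisbee in a park."},
    {"image_id": "42", "caption": "A brown dog leaps for a flying disc."},
    {"image_id": "43", "caption": "Two cyclists riding along a river path."},
]
with open("train_caption.json", "w") as f:
    json.dump(annotations, f)
```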
I have trained the model (both MLP and GPT-2) using the CC3M dataset, but the loss doesn't seem to decrease very much (it stays around 3.0). What loss can I expect...
I am trying to train on roughly 3,000 images on a Google Colab GPU and get the GPU error below. So I tried giving it 50 images to process, then it...