CLIP_prefix_caption
Simple image captioning model
Hello. I've been trying to distinguish between `prefix_length` and `clip_length`. I roughly understand that `prefix_length` is the learnable part attached to the GPT-2 input, but not what...
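As far as I can tell from the transformer mapper, the two lengths play different roles: `clip_length` is how many pseudo-tokens the single CLIP embedding is expanded into, while `prefix_length` is how many learnable constant tokens there are, whose transformer outputs become the actual GPT-2 prefix. A simplified sketch of that reading (paraphrased, not the repo's exact code; dimensions are illustrative):

```python
import torch
import torch.nn as nn

class TransformerMapperSketch(nn.Module):
    """Sketch illustrating the two lengths, not the repo's exact module.

    clip_length:   how many pseudo-tokens the single CLIP vector becomes.
    prefix_length: how many learnable constant tokens exist; their
                   transformer outputs are the prefix fed to GPT-2.
    """
    def __init__(self, dim_clip=512, dim_gpt=768, prefix_length=10, clip_length=10):
        super().__init__()
        self.clip_length = clip_length
        # Expand the CLIP vector into clip_length pseudo-tokens.
        self.linear = nn.Linear(dim_clip, clip_length * dim_gpt)
        # Learnable constants, one per prefix slot seen by GPT-2.
        self.prefix_const = nn.Parameter(torch.randn(prefix_length, dim_gpt))
        layer = nn.TransformerEncoderLayer(d_model=dim_gpt, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=8)

    def forward(self, clip_embed):  # clip_embed: (batch, dim_clip)
        b = clip_embed.shape[0]
        x = self.linear(clip_embed).view(b, self.clip_length, -1)
        const = self.prefix_const.unsqueeze(0).expand(b, -1, -1)
        out = self.transformer(torch.cat((x, const), dim=1))
        # Only the outputs at the constant positions become the GPT-2 prefix.
        return out[:, self.clip_length:]
```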
https://github.com/rmokady/CLIP_prefix_caption/blob/1ad805a844a62ab2e5480479aa021bccf0d4d12a/train.py#L230 I think the labels should be created when they are None, so the condition should be `if labels is None`.
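A hedged sketch of the suggested fix in context (the surrounding `forward` is paraphrased from memory, not copied verbatim, and assumes `torch` plus the `ClipCaptionModel` attributes from `train.py`):

```python
# Paraphrase of ClipCaptionModel.forward with the condition flipped,
# so labels are built exactly when the caller did not supply them.
def forward(self, tokens, prefix, mask=None, labels=None):
    embedding_text = self.gpt.transformer.wte(tokens)
    prefix_projections = self.clip_project(prefix).view(
        -1, self.prefix_length, self.gpt_embedding_size)
    embedding_cat = torch.cat((prefix_projections, embedding_text), dim=1)
    if labels is None:  # was the inverse check, which overwrote supplied labels
        dummy_token = self.get_dummy_token(tokens.shape[0], tokens.device)
        labels = torch.cat((dummy_token, tokens), dim=1)
    return self.gpt(inputs_embeds=embedding_cat, labels=labels, attention_mask=mask)
```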
Hi @rmokady, Thank you for your nice work; I learned a lot from it. Since the default CLIP model you are using seems to be the ViT-B/32 version, I am...
Thank you for your work. Do you have any plans to add code that supports multiple GPUs?
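Not in the repo as far as I can see, but a minimal sketch of one common single-machine approach, PyTorch `DataParallel` (the model argument below is a placeholder; in `train.py` it would be the `ClipCaptionModel` instance):

```python
import torch
import torch.nn as nn

def wrap_for_multi_gpu(model: nn.Module) -> nn.Module:
    """Replicate the model across all visible GPUs with DataParallel.

    A simple single-machine alternative to DistributedDataParallel:
    each batch from the loader is split along dim 0 across the GPUs.
    """
    model = model.to('cuda')
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model

# Usage with a stand-in module (in train.py: the ClipCaptionModel):
model = wrap_for_multi_gpu(nn.Linear(512, 768))
```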
I would like to ask how Clip-Cap can generate multiple different sentences for one image. I've changed the `entry_count` value in the `generate2()` function, but the output sentence is the...
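If I read `generate2()` correctly, it filters the logits with top-p but then picks the next token greedily, so decoding is deterministic and every `entry_count` pass yields the same sentence. A sketch of a sampling step that could replace the greedy pick (an assumption about the cause, not a confirmed fix):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_p: float = 0.9) -> torch.Tensor:
    """Nucleus (top-p) sampling for one decoding step.

    Replaces a deterministic `logits.argmax(-1)` so repeated generations
    can produce different continuations for the same image prefix.
    """
    logits = logits / max(temperature, 1e-8)
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    sorted_probs = F.softmax(sorted_logits, dim=-1)
    cum_probs = torch.cumsum(sorted_probs, dim=-1)
    # Mask tokens outside the nucleus; the token crossing top_p is kept.
    remove = cum_probs - sorted_probs > top_p
    sorted_logits[remove] = float('-inf')
    probs = F.softmax(sorted_logits, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```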
Hi, here are my issues: a) I'm getting a very low variety of outputs with ([very] different) custom images, e.g. "...sitting on a cellphone", "...with a cellphone", "...a cellphone and a...
`prefix_dim = 640` in the pretrained model, but how do I translate CLIP's 512-dim embedding into 640 dims before forwarding it to the net?
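The 640 most likely comes from the CLIP backbone rather than from a learned translation: RN50x4 emits 640-dim image embeddings, while ViT-B/32 emits 512-dim ones. If the pretrained weights expect `prefix_dim = 640`, extracting features with the matching backbone avoids any conversion (sketch below; the image path is a placeholder):

```python
import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# RN50x4 produces 640-dim image embeddings, matching prefix_dim = 640;
# ViT-B/32 produces 512-dim ones, so the two backbones are not interchangeable.
model, preprocess = clip.load('RN50x4', device=device)
image = preprocess(Image.open('example.jpg')).unsqueeze(0).to(device)
with torch.no_grad():
    prefix = model.encode_image(image)  # shape: (1, 640)
```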
Hey, real quick question - I'm putting together a custom dataset of captions in the coco format, and it's pretty obvious how the "image_id" and "caption" values are put to...
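For what it's worth, a hypothetical minimal annotation file in the flat shape the parsing script appears to read: a list of records, each pairing an `image_id` (used to locate the image file on disk) with one `caption`. Exact field types and file naming may differ, so check `parse_coco.py`:

```python
import json

# Hypothetical minimal train_caption.json: one record per caption, with
# repeated image_id values for images that have multiple captions. In COCO,
# the id maps to a filename like COCO_train2014_%012d.jpg.
annotations = [
    {"image_id": "42", "caption": "A dog catching a frisbee in a park."},
    {"image_id": "42", "caption": "A brown dog leaps for a flying disc."},
    {"image_id": "43", "caption": "Two cyclists riding along a river path."},
]
with open("train_caption.json", "w") as f:
    json.dump(annotations, f)
```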
I have trained the model (both MLP and GPT-2) using the CC3M dataset, but the loss doesn't seem to decrease very much (it stays around 3.0). What loss can I expect...
I am trying to train on roughly 3,000 images on a Google Colab GPU and get the GPU error below. So I tried giving it 50 images to process, then it...