shams2023 comments

Results: 48 comments of shams2023

Fine tune BLIP Image Captioning to custom dataset

> Hello, > > If you want to fine-tune the [Hugging Face version](https://huggingface.co/docs/transformers/model_doc/blip) of BLIP, we do have a notebook: https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_blip.ipynb. > > We also have a notebook that uses PEFT (LoRA): https://github.com/huggingface/notebooks/blob/main/peft/Fine_tune_BLIP2_on_an_image_captioning_dataset_PEFT.ipynb. This is more memory efficient, because you train only a few linear projection layers while keeping the model itself frozen. Hello! How can I use my own image text dataset to fine tune the BLIP2...

Training with one GPU

> Default process group has not been initialized, please make sure to call init_process_group". ![image](https://github.com/salesforce/BLIP/assets/141383792/f3146a2f-3a3c-42df-9134-7949e09f59b1) ![image](https://github.com/salesforce/BLIP/assets/141383792/c5f86996-f43d-411e-a7f2-e7c9ec59fb62) If you are not using distributed training, changing these two functions to this format...
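
The two functions referred to in the comment are only visible in the screenshots, so the following is a minimal sketch of the general idea rather than the exact change: guard every `torch.distributed` query so that it falls back to single-process values when no process group has been initialized. The helper names (`is_dist_avail_and_initialized`, `get_world_size`, `get_rank`) are assumptions chosen for illustration.

```python
# Sketch: single-process fallbacks for distributed helpers.
# Helper names are illustrative assumptions, not taken from the comment.
try:
    import torch.distributed as dist
except ImportError:  # torch is not installed: behave as a single process
    dist = None


def is_dist_avail_and_initialized() -> bool:
    """Return True only if torch.distributed exists and a process group is set up."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def get_world_size() -> int:
    """Number of processes; 1 when running without distributed training."""
    return dist.get_world_size() if is_dist_avail_and_initialized() else 1


def get_rank() -> int:
    """Rank of this process; 0 when running without distributed training."""
    return dist.get_rank() if is_dist_avail_and_initialized() else 0
```

With guards like these, code that calls `get_rank()` or `get_world_size()` on a single GPU no longer reaches the uninitialized default process group.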

BLIP-2 input image size setting (image captioning)

> in the BLIP-2 paper, "We propose Q-Former as the trainable module to bridge the gap between a frozen image encoder and a frozen LLM. It extracts a fixed number...

How to fine-tune BLIP-2 on a local Chinese dataset?

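> I want to give BLIP-2 images and have it generate Chinese descriptions in return. Can anyone guide me on how to do this? Did you succeed? I also want to do this (that is, give BLIP2 the images I collected myself and have it generate the corresponding text descriptions).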

BLIP2: Unable to connect to HuggingFace.co

> I have encountered the same problem. Have you resolved it? Manually download all the files from HF offline into one folder, then specify that folder's path, and it works.
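
The workaround above can be sketched as follows, assuming the files were downloaded for a Hugging Face Transformers tokenizer. The comment does not say which files or which loader were involved, so `load_local_tokenizer` and the folder path are hypothetical; `HF_HUB_OFFLINE` and `local_files_only=True` are the standard ways to keep the library from contacting huggingface.co.

```python
import os
from pathlib import Path

# Tell the Hugging Face libraries not to attempt any network access.
# This must be set before transformers / huggingface_hub are imported.
os.environ["HF_HUB_OFFLINE"] = "1"


def load_local_tokenizer(model_dir: str):
    """Load a tokenizer from a folder of manually downloaded files.

    Hypothetical helper: model_dir is whatever folder the files were saved to.
    """
    path = Path(model_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Local model folder not found: {path}")
    # Imported lazily so the path check works even without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(str(path), local_files_only=True)
```

The same pattern applies to model classes: pass the local folder instead of the hub name to `from_pretrained`.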

OPT2.7B underperforming & weird behavior compared to flant5xl on image captioning?

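> Hello! I am fine-tuning the pretrained_flant5xl and pretrained_opt2.7b models, and to my surprise the flant5xl model is good at creating the correct tags, since my captions are actually a string of tags. My goal is to determine the feasibility of training these models on complex interconnected tags, some of which have subcategories and some of which do not. Flan generates all the words correctly with very high accuracy, whereas opt starts randomly using characters like ~, which do not exist in my dataset, and replaces several groups of tags with "urchin". So wherever it might predict tags 1, 2, and 3, it would say, for example, ~ urchin ~, which in my dataset is actually a fantasy race. This clearly shows that the model knows a correct tag belongs at that position, because it follows some logic and replaces only certain tags. > > This is a custom dataset of image-txt pairs that I implemented. It is a bit small, about 2104 images, and it effectively represents tags that have sub-options, so it is like 20 choices (average of 4 subcategories), 3 choices, 13 choices (average of 7 subcategories), 8 choices, 8 choices, 4 choices. > > Note that I am editing the dataset loader to provide text_input for opt, like...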

OPT2.7B underperforming & weird behavior compared to flant5xl on image captioning?

> yes, my apologies I had seen your other comment and meant to respond to you. I will preface this all with I am no machine learning engineer I am...

OPT2.7B underperforming & weird behavior compared to flant5xl on image captioning?

> My apologies, I realized I gave you an incorrect caption dataset to use with flant5xl. This is what I needed, because flant5xl requires text_input and text_output. This is at the end of class CaptionDataset(BaseDataset, __DisplMixin): in the caption_datasets.py file: return { "image": self.vis_processor(image), "text_input": (prefix), "text_output": (caption), } You're right, bro! Thank you for your help!
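
To make the quoted return value concrete, here is a minimal, self-contained sketch of a dataset whose items carry the three keys mentioned above. `CaptionPairs` is a hypothetical stand-in written for illustration, not the actual CaptionDataset class, and `vis_processor` is any callable that turns a raw image into model input.

```python
from typing import Any, Callable, Dict, List, Tuple


class CaptionPairs:
    """Hypothetical stand-in for a caption dataset (illustration only)."""

    def __init__(
        self,
        samples: List[Tuple[Any, str]],
        vis_processor: Callable[[Any], Any],
        prefix: str = "",
    ) -> None:
        self.samples = samples              # (image, caption) pairs
        self.vis_processor = vis_processor  # image preprocessing callable
        self.prefix = prefix                # prompt text fed to the language model

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        image, caption = self.samples[index]
        return {
            "image": self.vis_processor(image),
            "text_input": self.prefix,   # what the model is prompted with
            "text_output": caption,      # what the model should generate
        }
```

For example, `CaptionPairs([("img0", "a cat")], vis_processor=str.upper, prefix="a photo of")[0]` yields a dict with the processed image, the prefix as `text_input`, and the caption as `text_output`.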

How can I train a BLIP model completely from scratch?

Did you succeed, brother?

how to run this code, please provide the step for text to image , please

> _No description provided._ Is there any good code for generating images from zero sample text nowadays (that is, using their pre training weights to directly generate corresponding images for one's own...