Peng Jin

26 comments by Peng Jin

Thank you for your attention to our work. The code and inference steps for MSVD can be found in our EMCL work (https://github.com/jpthu17/EMCL/tree/main/video_retrieval/EMCL-Net#train-on-msvd). I have been busy with other things recently,...

> Hi, I am facing the same issue when trying to train on the MSVD dataset. I got the same errors as the message above.

I'm sorry for not replying...

Sorry, I have not encountered this error; could you provide more information? Also, have you tried re-downloading the model and restarting the demo?

We use standard multi-head attention. Since LLaMA 3 uses grouped-query attention, our guess is that LLaVA made changes to follow LLaMA 3. (The main purpose of grouped-query attention is to reduce KV...
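For reference, a minimal PyTorch sketch of the grouped-query idea: several query heads share one K/V head, so the KV cache shrinks by the group factor. All dimensions below are illustrative, not the actual LLaMA 3 / LLaVA configuration.

```python
# Minimal sketch of grouped-query attention (GQA). Shapes are made up
# for illustration; the point is that 4 query heads share each KV head.
import torch

batch, seq_len, dim = 2, 16, 512
n_q_heads, n_kv_heads = 8, 2           # group factor = 4
head_dim = dim // n_q_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # KV cache is 4x smaller
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head so the shapes line up with the query heads.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = attn @ v                          # (batch, n_q_heads, seq_len, head_dim)
```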

16 GB of VRAM does not seem to be enough for training the model; you can try using LoRA and reducing the batch size. For model inference, 16 GB of VRAM is sufficient.
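As an illustration of the LoRA suggestion, here is a minimal sketch using the Hugging Face PEFT library; the model name and target modules are placeholders, not the repo's actual training configuration.

```python
# Minimal LoRA sketch with HF Transformers + PEFT. The base model and
# target_modules below are illustrative assumptions, not this repo's config.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

With only the adapter weights receiving gradients, optimizer state and gradient memory drop sharply, which (together with a smaller batch size) is what makes low-VRAM training feasible.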

Sorry for replying so late; I have been busy with other projects recently. The relevant code is below; we are currently trying to improve this...

Thank you for your interest in our work. I will fix this error as soon as possible.

If you intend to perform full-parameter fine-tuning, it should be carried out on ```8*A100 (80G)``` GPUs. If you opt for LoRA, tuning is feasible on ```4*V100 (32G)```. To...

> Hi, any plan to support some decent Chinese LLM models? There is now some high-quality Chinese video instruction data.

We will support fine-tuning of Qwen and mixture...

I added phi2 code; see https://github.com/PKU-YuanGroup/Chat-UniVi/tree/main/ChatUniVi/model/language_model. I hope this helps, but the code seems to have bugs: the phi2 model often hangs when...