Peng Jin
Thank you for your attention to our work. The code and inference steps for MSVD can be found in our EMCL repository (https://github.com/jpthu17/EMCL/tree/main/video_retrieval/EMCL-Net#train-on-msvd). I have been busy with other things recently,...
> Hi, I am facing the same issue when trying to train on the MSVD dataset. I got the same errors as the message above.

I'm sorry for not replying...
Sorry, I have not encountered this error before. Can you provide more information? Also, have you tried re-downloading the model and restarting the demo?
We use standard multi-head attention. Since LLaMA 3 uses grouped-query attention, we suspect that LLaVA made changes to follow LLaMA 3. (The main purpose of grouped-query attention is to reduce KV...
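Roughly, the difference is that grouped-query attention lets several query heads share one key/value head, so far fewer K/V heads need to be kept around. Below is a minimal sketch, not our implementation; the shapes and head counts are illustrative only:

```python
# Sketch contrasting standard multi-head attention with grouped-query attention.
# In GQA, `group` query heads share one key/value head, shrinking the KV state.
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v):
    """q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0
    group = n_q_heads // n_kv_heads
    # Repeat each K/V head so that `group` query heads attend to the same K/V.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v


# Standard multi-head attention is the special case n_kv_heads == n_q_heads.
# LLaMA-3-style GQA uses fewer KV heads (e.g. 32 query heads, 8 KV heads),
# so only the 8 KV heads contribute to the cached keys/values.
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
out = grouped_query_attention(q, k, v)  # (1, 32, 16, 128)
```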
16 GB of VRAM does not seem to be enough for training the model; you can try using LoRA and reducing the batch size. For inference, 16 GB of VRAM is sufficient.
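For reference, wrapping the language model with LoRA adapters might look something like the sketch below. This is illustrative only: the base model name, rank, and target modules are assumptions, not the exact values in our training script.

```python
# Minimal LoRA sketch using the `peft` library (illustrative settings).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Example backbone; substitute the checkpoint you are actually fine-tuning.
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

lora_config = LoraConfig(
    r=16,                      # low-rank dimension of the adapters
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Combining this with a smaller per-device batch size (and gradient accumulation if needed) is the usual way to fit training into limited VRAM.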
I am sorry for replying so late; I have been busy with other projects recently. The relevant code is below. We are currently trying to improve this...
Thank you for your interest in our work. I will fix this error as soon as possible.
If you intend to perform full-parameter fine-tuning, it should be carried out on ```8*A100 (80G)```. If you opt for LoRA, tuning is feasible on ```4*V100 (32G)```. To...
> Hi, any plan to support some decent Chinese LLM models? There are now some high-quality Chinese video instruction data.

We will support fine-tuning of the Qwen and mixture...
I added the Phi-2 code, but it seems to have bugs. See https://github.com/PKU-YuanGroup/Chat-UniVi/tree/main/ChatUniVi/model/language_model. I hope this helps. The Phi-2 model often hangs when...