Add LLaVA model
### Model description
LLaVA is a multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, "achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4".
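For context, the core architectural idea is simple. Below is a minimal sketch of the vision-language connection, assuming LLaVA v1's design of a single learned linear projection from a frozen CLIP vision encoder into the language model's embedding space (the class name and default sizes are illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Illustrative sketch: map vision features into the LLM embedding space."""

    def __init__(self, vision_hidden_size: int = 1024, text_hidden_size: int = 4096):
        super().__init__()
        # LLaVA v1 uses a single learned linear projection on top of a frozen
        # CLIP vision encoder (1024 = CLIP ViT-L/14 width, 4096 = Vicuna-7B width).
        self.projector = nn.Linear(vision_hidden_size, text_hidden_size)

    def forward(self, image_features: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
        # image_features:  (batch, num_patches, vision_hidden_size)
        # text_embeddings: (batch, seq_len, text_hidden_size)
        image_tokens = self.projector(image_features)
        # The projected image tokens are prepended to the text tokens, and the
        # combined sequence is fed to the language model (Vicuna).
        return torch.cat([image_tokens, text_embeddings], dim=1)
```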
### Open source status
- [X] The model implementation is available
- [X] The model weights are available
### Provide useful links for the implementation
https://github.com/haotian-liu/LLaVA
@sgugger and @youssefadr, I want to work on this issue. I am new to open source and Hugging Face, so could you please provide some guidance on how to get started? A reference issue that gives an idea of the process would also help.
@sushmanthreddy It's great you want to contribute a model!
There's a detailed guide in the docs outlining important information about the model class, how it fits into the library, and the steps to take to add a model. Let us know if there's anything which is unclear or you hit a blocker. Looking forward to seeing the PR! 🤗
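Very roughly, the end result of that guide is a new model folder with a config and model class following the library's conventions. A hypothetical skeleton, assuming the standard `PretrainedConfig`/`PreTrainedModel` pattern (all names and defaults below are placeholders; the actual API gets decided in the PR review):

```python
# Hypothetical skeleton for a new model addition (names are placeholders;
# the "How to add a model" guide covers the real requirements).
from transformers import PretrainedConfig, PreTrainedModel

class LlavaConfig(PretrainedConfig):
    model_type = "llava"

    def __init__(self, vision_hidden_size=1024, text_hidden_size=4096, **kwargs):
        # Model-specific hyperparameters go here, with sensible defaults.
        self.vision_hidden_size = vision_hidden_size
        self.text_hidden_size = text_hidden_size
        super().__init__(**kwargs)

class LlavaPreTrainedModel(PreTrainedModel):
    # Ties the model class to its config and inherits weight loading/saving.
    config_class = LlavaConfig
    base_model_prefix = "llava"
```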
@sushmanthreddy I don't know if you are still planning to work on this model, but if not, I would be glad to take it on myself 🤗!
Please let me know if you still plan to work on this issue 🙂!
@youssefadr Sorry, I am busy with my Google Summer of Code work and couldn't contribute much. You can go ahead and contribute to it.
Hello @youssefadr, are you planning to take on the work of adding this model? I'd be happy to collaborate or take this task on myself.
@sushmanthreddy Okay, thank you and good luck with your Google program!
@jprivera44 Hello! Yes, I am going to open a draft PR this week, do not hesitate to collaborate!
That's fantastic @youssefadr! Do you mind adding me as a collaborator on your branch so we can plan which sections of LLaVA each of us will tackle? I've got time today to create a branch and add you there if you prefer. Excited for this :)
@amyeroberts, any other suggestions on the best way to collaborate with peers on a new model such as this? I read through the guide and I appreciate the philosophy of transformers.
@jprivera44 @youssefadr - great to hear that you're both keen to work on this model!
The main piece of advice I have if you're both collaborating on a PR is to make sure that it's clear who is working on what and when - you don't want to find out that one piece has been implemented twice! If working on the same branch, make sure not to force push as well :)
Thanks @amyeroberts! I'm currently waiting on Meta's approval for the LLaMA weights. Do you know if there is any way to speed up that process?
@youssefadr Hey, nice job with the PR! I noticed you added a lot of changes. Are you working with the 7B, 13B, or 65B parameter checkpoint?
@jprivera44 I am planning to work with the 7B parameter checkpoint. I think it would be easier to collaborate on this model if we communicated directly. What do you think of discussing it on Discord? Here is my username: 'Youssef Adarrab#3595'.
Fantastic, I'll reach out to you on Discord.