
Add LLaVA model

youssefadr opened this issue 2 years ago

Model description

LLaVA is a multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, "achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4".

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

https://github.com/haotian-liu/LLaVA
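
As a rough illustration of the architecture described above (a vision encoder feeding a Vicuna-based language model), here is a minimal sketch of what an end-user API could look like once the model lands in transformers. The class name `LlavaForConditionalGeneration`, the checkpoint id `llava-hf/llava-1.5-7b-hf`, and the prompt format are assumptions for illustration, not something provided by the original LLaVA repository.

```python
# Hypothetical usage sketch; class and checkpoint names are assumptions
# about how a transformers integration could expose LLaVA.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed Hub checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# A sample image plus a chat-style prompt containing an <image> placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

# The processor handles both image preprocessing and text tokenization.
inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```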

youssefadr avatar Apr 19 '23 04:04 youssefadr

@sgugger and @youssefadr, I want to work on this issue. I am new to open source and Hugging Face, so could you please give me some guidance on working on it? Any reference issue that would help me get an idea of how to approach this would be appreciated.

sushmanthreddy avatar Apr 21 '23 20:04 sushmanthreddy

@sushmanthreddy It's great you want to contribute a model!

There's a detailed guide in the docs outlining important information about the model class, how it fits in the library and the steps to take to add a model. Let us know if there's anything which is unclear or you hit a blocker. Looking forward to seeing the PR! 🤗
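
In case it helps, the guide eventually walks you toward a skeleton roughly like the sketch below. The class names and configuration fields here are purely illustrative placeholders, not the actual LLaVA port.

```python
# Illustrative skeleton only; names and fields are placeholders, not the real LLaVA implementation.
from transformers import PretrainedConfig, PreTrainedModel


class LlavaConfig(PretrainedConfig):
    """Configuration holding the hyperparameters of the composed vision + language model."""

    model_type = "llava"

    def __init__(self, vision_hidden_size=1024, text_hidden_size=4096, projector_hidden_act="gelu", **kwargs):
        self.vision_hidden_size = vision_hidden_size
        self.text_hidden_size = text_hidden_size
        self.projector_hidden_act = projector_hidden_act
        super().__init__(**kwargs)


class LlavaPreTrainedModel(PreTrainedModel):
    """Base class wiring the config into the library's loading and saving machinery."""

    config_class = LlavaConfig
    base_model_prefix = "llava"
```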

amyeroberts avatar Apr 24 '23 12:04 amyeroberts

@sushmanthreddy I don't know if you are still planning to work on this model, but if not, I would be glad to contribute it myself 🤗!

Please let me know if working on this issue is still in your plans 🙂!

youssefadr avatar May 22 '23 18:05 youssefadr

@youssefadr Sorry, I am busy with my Google Summer of Code work and couldn't contribute much. You can go ahead and contribute to it.

sushmanthreddy avatar May 22 '23 18:05 sushmanthreddy

Hello @youssefadr, are you going to take on the work of adding this model? I'd be happy to collaborate or take this task on.

jprivera44 avatar May 22 '23 23:05 jprivera44

@sushmanthreddy Okay, thank you and good luck with your Google program!

@jprivera44 Hello! Yes, I am going to open a draft PR this week, do not hesitate to collaborate!

youssefadr avatar May 23 '23 06:05 youssefadr

That's fantastic @youssefadr, do you mind adding me as a collaborator on your branch so we can plan there which sections of LLaVA we are going to tackle? I've got time today to create a branch and add you there if you prefer. Excited for this :)

@amyeroberts any other suggestions on the best way to collaborate with peers on a new model such as this? I read through the suggestions and I appreciate the philosophy of transformers.

jprivera44 avatar May 23 '23 16:05 jprivera44

@jprivera44 @youssefadr - great to hear that you're both keen to work on this model!

The main piece of advice I have if you're both collaborating on a PR is to make sure that it's clear who is working on what and when - you don't want to find out that one piece has been implemented twice! If working on the same branch, make sure not to force push as well :)

amyeroberts avatar May 28 '23 16:05 amyeroberts

Thanks @amyeroberts. I'm waiting for approval of the LLaMA weights from Meta at the moment. Do you know if there is any way to speed up that process?

@youssefadr Hey, nice job with the PR! I noticed you added a lot of changes. Are you working with the 7B, 13B, or 65B parameter checkpoint?

jprivera44 avatar May 30 '23 18:05 jprivera44

@jprivera44 I am planning to work with the 7B parameter checkpoint. I think it would be better if we could communicate directly to collaborate on this model. What do you think of discussing it on Discord? Here is my username: 'Youssef Adarrab#3595'

youssefadr avatar May 30 '23 21:05 youssefadr

Fantastic, I'll reach out to you on Discord.

jprivera44 avatar May 30 '23 23:05 jprivera44