Youssef Adarrab
### Model description
[LLaVA](https://llava-vl.github.io/) is a multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, "achieving impressive chat capabilities mimicking spirits of the multimodal...
# What does this PR do?
This PR adds the LLaVA model ([https://arxiv.org/abs/2304.08485](https://arxiv.org/abs/2304.08485)), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and...
### Feature request
Is it possible to start supporting PyTorch and TensorRT inference optimizations? There are many use cases where this could be useful, and Optimum seems to...
Hello, congratulations on the great work! Do you think it is possible to add the model to Hugging Face Transformers? Are you planning on doing it? Thanks a lot and looking...