Support for Llama 3.2 vision
First of all thanks for the amazing work. It helps us build a very simple yet efficient router within our Java applications.
I was wondering whether there is any plan to support the Llama 3.2 vision models.
--Thanks and Regards Vaijanath
I looked into implementing the vision encoder component, specifically for the QwenVL models, which were merged into llama.cpp just a few days ago. I work on this in my spare time, which I haven't had much of lately. To make this easier in the future, I'm working on a simple tensor library for inference in Java. It's going slowly, but I'm on it; I really enjoy hacking on this.
If you can share the tensor library, I can take a stab at it. Right now, to make Llama 3.2 vision work with the current code, I need the weights to hold List<FloatTensor[]> and to add identity operations for the missing layers.
For example, attn_q.weight is not available for all the layers.
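To make the idea concrete, here is a minimal sketch of "identity for missing layers". It is only an illustration: it uses plain float[] arrays instead of the project's FloatTensor, and the names OptionalWeightsSketch, matmul and project are hypothetical, not part of the existing code. It also assumes square projections and that passing the input through unchanged is acceptable for a layer without the tensor, which would still need to be checked against the model.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from the project. */
final class OptionalWeightsSketch {

    /** Row-major matrix-vector product: out[r] = sum over c of w[r * cols + c] * x[c]. */
    static float[] matmul(float[] w, int rows, int cols, float[] x) {
        float[] out = new float[rows];
        for (int r = 0; r < rows; r++) {
            float sum = 0f;
            for (int c = 0; c < cols; c++) {
                sum += w[r * cols + c] * x[c];
            }
            out[r] = sum;
        }
        return out;
    }

    /**
     * Applies the projection for one layer. A null entry means the checkpoint
     * has no tensor for that layer, so a copy of the input is returned
     * (an identity operation) instead of a matrix product.
     */
    static float[] project(float[][] perLayerWeights, int layer, int dim, float[] x) {
        float[] w = perLayerWeights[layer];
        if (w == null) {
            return x.clone(); // identity: no weight stored for this layer
        }
        return matmul(w, dim, dim, x);
    }

    public static void main(String[] args) {
        int dim = 2;
        float[][] attnQ = new float[2][];          // one slot per layer
        attnQ[0] = new float[] {2f, 0f, 0f, 3f};   // layer 0 has a 2x2 weight
        // layer 1 is left null to stand for a missing attn_q.weight
        float[] x = {1f, 1f};
        System.out.println(Arrays.toString(project(attnQ, 0, dim, x)));
        System.out.println(Arrays.toString(project(attnQ, 1, dim, x)));
    }
}
```

With this shape the forward pass can loop over all layers uniformly, and the null check stays in one place rather than being repeated at every call site.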