llama3.java

support for llama3.2 vision

Open vaiju1981 opened this issue 1 year ago • 2 comments

First of all thanks for the amazing work. It helps us build a very simple yet efficient router within our Java applications.

I was wondering if there is any plan to support Llama 3.2 vision models.

--Thanks and Regards Vaijanath

vaiju1981 avatar Dec 17 '24 20:12 vaiju1981

I looked into implementing the vision encoder component, specifically for QwenVL models, support for which was merged into llama.cpp just a few days ago. I work on this in my spare time, which is not much lately. To make it easier in the future, I'm working on a simple tensor library for inference in Java. It's slow going, but I'm on it. I really enjoy hacking on this.

mukel avatar Dec 17 '24 20:12 mukel

If you can provide the tensor library, I can take a stab at it. Right now, to make Llama 3.2 vision work with the current code, I need to change the weights to List<FloatTensor[]> and add identity operations for the missing layers.

For example, attn_q.weight is not available for all the layers.
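
A minimal sketch of the identity fallback idea, not the actual llama3.java code. It uses plain float[] arrays instead of FloatTensor. The class OptionalLayerWeights, the Projection interface, and the projectionFor helper are hypothetical names introduced for illustration, and the blk.<layer>.attn_q.weight key layout is assumed. Whether identity is the right stand-in for a given missing tensor still has to be checked against the model architecture.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: per-layer weights where some tensors are absent.
public class OptionalLayerWeights {

    /** A projection applied to an activation vector. */
    interface Projection {
        float[] apply(float[] x);
    }

    /** Identity projection: stands in for a tensor the checkpoint does not provide. */
    static final Projection IDENTITY = x -> x.clone();

    /** Dense row-major matrix-vector product: out[r] = sum over c of w[r * cols + c] * x[c]. */
    static Projection dense(float[] w, int rows, int cols) {
        return x -> {
            float[] out = new float[rows];
            for (int r = 0; r < rows; r++) {
                float sum = 0f;
                for (int c = 0; c < cols; c++) {
                    sum += w[r * cols + c] * x[c];
                }
                out[r] = sum;
            }
            return out;
        };
    }

    /** Looks up "blk.<layer>.<name>" (assumed key layout); falls back to IDENTITY when absent. */
    static Projection projectionFor(Map<String, float[]> tensors, int layer, String name, int dim) {
        float[] w = tensors.get("blk." + layer + "." + name);
        return w == null ? IDENTITY : dense(w, dim, dim);
    }

    public static void main(String[] args) {
        int dim = 2;
        Map<String, float[]> tensors = new LinkedHashMap<>();
        // Layer 0 has a query projection; layer 1 deliberately has none.
        tensors.put("blk.0.attn_q.weight", new float[] {2f, 0f, 0f, 3f});

        float[] x = {1f, 1f};
        for (int layer = 0; layer < 2; layer++) {
            float[] q = projectionFor(tensors, layer, "attn_q.weight", dim).apply(x);
            System.out.println("layer " + layer + ": " + Arrays.toString(q));
        }
    }
}
```

With this shape, the forward loop can call the same code path for every layer, and the missing-tensor case is decided once at load time rather than with null checks inside the loop.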

vaiju1981 avatar Dec 18 '24 00:12 vaiju1981