
Support LLaVA-UHD

choyakawa opened this issue 1 year ago · 1 comment

https://github.com/thunlp/LLaVA-UHD

This method appears to be on par with or better than LLaVA-NeXT (1.6), and, unlike LLaVA-NeXT at the time, the authors have open-sourced the training code for reproduction.
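For anyone scoping what support would entail: per the linked paper, LLaVA-UHD's core idea is slicing a native-resolution image into a variable grid of ViT-sized tiles chosen to match the image's shape, then compressing each slice's visual tokens. Below is a minimal C++ sketch of such a grid-selection heuristic, assuming a 336×336 ViT input; `choose_slice_grid` and its scoring function are illustrative stand-ins, not llama.cpp API and not the authors' actual code.

```cpp
// Hypothetical sketch of LLaVA-UHD-style adaptive image slicing: pick a
// grid (cols x rows) whose slice count is close to the number of
// ViT-sized tiles the image "contains" and whose aspect ratio best
// matches the input image. The scoring heuristic is illustrative only.
#include <cmath>
#include <cstdio>
#include <utility>

std::pair<int, int> choose_slice_grid(int img_w, int img_h, int patch = 336) {
    // Ideal slice count: how many patch x patch tiles fit the image area.
    const double ideal     = std::ceil((double) img_w * img_h / (patch * patch));
    const double img_ratio = (double) img_w / img_h;

    std::pair<int, int> best = {1, 1};
    double best_score = 1e30;
    // Search small grids; penalize deviation from both the ideal slice
    // count and the image's native aspect ratio.
    for (int cols = 1; cols <= 6; ++cols) {
        for (int rows = 1; rows <= 6; ++rows) {
            const double count_err = std::fabs(cols * rows - ideal);
            const double ratio_err = std::fabs(std::log(((double) cols / rows) / img_ratio));
            const double score     = count_err + ratio_err;
            if (score < best_score) {
                best_score = score;
                best = {cols, rows};
            }
        }
    }
    return best;
}

int main() {
    auto [cols, rows] = choose_slice_grid(672, 1008);
    std::printf("slice grid: %d x %d\n", cols, rows); // prints "2 x 3" here
    return 0;
}
```

Scoring the aspect-ratio mismatch on a log scale makes a 2:1 and a 1:2 mismatch cost the same, which keeps slices close to the shape the ViT was trained on.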

LLM analysis from Gemini 1.5 Pro:

| Benchmark | LLaVA-UHD-13B | LLaVA-NeXT-7B | LLaVA-NeXT-13B | LLaVA-NeXT-34B | LLaVA-1.5-13B |
|---|---|---|---|---|---|
| VQAv2 | 81.7 | 81.8 (Vicuna) / 82.2 (Mistral) | 82.8 | 83.7 | 80 |
| GQA | 65.2 | 64.2 (Vicuna) / 64.8 (Mistral) | 65.4 | 67.1 | 63.3 |
| TextVQA | 67.7 | 64.9 (Vicuna) / 65.7 (Mistral) | 67.1 | 69.5 | 61.3 |
| ScienceQA | 72 | 70.1 (Vicuna) / 72.8 (Mistral) | 73.6 | 81.8 | 71.6 |
| VizWiz | 56.1 | 57.6 (Vicuna) / 60.0 (Mistral) | 60.5 | 63.8 | 53.6 |
| MMMU (val) | 36.4 | 35.8 (Vicuna) / 35.3 (Mistral) | 36.2 | 51.1 | 36.4 |
| MMMU (test) | 33.6 | - | - | 44.7 | 33.6 |
| MME | 1535 | 1519 (Vicuna) / 1498 (Mistral) | 1575 | 1631 | 1531 |
| POPE | 89.1 | 86.5 (Vicuna) / 86.7 (Mistral) | 86.2 | 87.7 | 85.9 |

Observations:

  • LLaVA-UHD matches or outperforms LLaVA-1.5 on every metric in the table.
  • LLaVA-NeXT series shows comparable performance to LLaVA-UHD on most tasks, with slight variations depending on the specific model (Vicuna or Mistral).
  • LLaVA-NeXT-34B stands out with significantly higher performance on ScienceQA and MMMU.

Originally posted by @choyakawa in https://github.com/thunlp/LLaVA-UHD/issues/1#issuecomment-2005996645

choyakawa · Mar 19 '24 07:03

Moreover, the model can be trained efficiently in an academic setting: within 23 hours on 8 A100 GPUs (vs. 26 hours for LLaVA-1.5).

choyakawa · Mar 19 '24 07:03

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · May 03 '24 01:05