
Local judge LLM?

Open lyzhongcrd opened this issue 1 year ago • 5 comments

Is it possible to use a locally deployed LLM like LLaVA-Critic as the judge LLM instead of calling the GPT-4 API?

lyzhongcrd avatar Dec 12 '24 17:12 lyzhongcrd

Hi @lyzhongcrd, yes. However, we recommend using the same LLM as the judge for all LMMs so that results are comparable. For MCQ or Y/N benchmarks, where the LLM is only used as a choice extractor for more accurate evaluation, using different LLMs will not lead to significantly different results.

kennymckormick avatar Dec 17 '24 09:12 kennymckormick

@kennymckormick Could you tell me how to use locally deployed LLMs as judge LLM in the VLM eval kit? Thanks.

lyzhongcrd avatar Dec 20 '24 08:12 lyzhongcrd
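Not an official answer from the maintainers, but one common setup is to serve the judge model behind an OpenAI-compatible API (e.g. with vLLM) and point VLMEvalKit at that endpoint via environment variables. The sketch below assumes VLMEvalKit honors `OPENAI_API_BASE` / `OPENAI_API_KEY` / `LOCAL_LLM` as described in its Quickstart docs; the model name, port, and benchmark are illustrative.

```shell
# 1. Serve a local judge model with an OpenAI-compatible API (vLLM here;
#    LMDeploy or similar servers work the same way).
vllm serve Qwen/Qwen2.5-7B-Instruct --port 23333 &

# 2. Point VLMEvalKit at the local endpoint instead of the OpenAI API.
export OPENAI_API_BASE=http://localhost:23333/v1/chat/completions
export OPENAI_API_KEY=sk-local-dummy   # any non-empty string for a local server
export LOCAL_LLM=Qwen/Qwen2.5-7B-Instruct

# 3. Run evaluation as usual; the judge calls now go to the local server.
python run.py --data MMBench_DEV_EN --model llava_v1.5_7b --verbose
```

If the endpoint or model name is wrong, judge calls fail and the evaluation falls back to exact matching for extraction, so check the logs to confirm the local judge is actually being hit.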

> @kennymckormick Could you tell me how to use a locally deployed LLM as the judge LLM in the VLM eval kit? Thanks.

A question for the OP: if the model being evaluated is a VLM, should the locally deployed judge model be an LLM or a VLM?

Leke-G avatar Dec 24 '24 02:12 Leke-G

@kennymckormick Same question +1, how to use locally deployed LLMs as judge LLM in the VLM eval kit? Thanks.

zl9501 avatar Jul 17 '25 11:07 zl9501

> Hi @lyzhongcrd, yes. However, we recommend using the same LLM as the judge for all LMMs so that results are comparable. For MCQ or Y/N benchmarks, where the LLM is only used as a choice extractor for more accurate evaluation, using different LLMs will not lead to significantly different results.

Hi, how much would it cost to evaluate on all the major benchmarks with the GPT-4 API? And how can I use local LLMs as the judge LLM?

Keyird avatar Sep 24 '25 17:09 Keyird