
Local judge LLM?

Open lyzhongcrd opened this issue 1 year ago • 5 comments

Is it possible to use a locally deployed LLM like LLaVA-Critic as the judge LLM instead of calling the GPT-4 API?

lyzhongcrd avatar Dec 12 '24 17:12 lyzhongcrd

Hi @lyzhongcrd, yes. However, we recommend using the same LLM as the judge for all LMMs so that results are comparable. For MCQ or Y/N benchmarks, where the LLM is only used as a choice extractor for more accurate evaluation, using different LLMs will not lead to significantly different results.

kennymckormick avatar Dec 17 '24 09:12 kennymckormick

@kennymckormick Could you tell me how to use locally deployed LLMs as judge LLM in the VLM eval kit? Thanks.

lyzhongcrd avatar Dec 20 '24 08:12 lyzhongcrd
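Not an official answer from the maintainers, but one common setup is to serve the judge model behind an OpenAI-compatible API (e.g. with vLLM) and point VLMEvalKit at that endpoint via environment variables. The sketch below assumes VLMEvalKit honors `OPENAI_API_BASE` / `OPENAI_API_KEY` / `LOCAL_LLM` as described in its Quickstart docs; the model name, port, and benchmark are illustrative.

```shell
# 1. Serve a local judge model with an OpenAI-compatible API (vLLM here;
#    LMDeploy or similar servers work the same way).
vllm serve Qwen/Qwen2.5-7B-Instruct --port 23333 &

# 2. Point VLMEvalKit at the local endpoint instead of the OpenAI API.
export OPENAI_API_BASE=http://localhost:23333/v1/chat/completions
export OPENAI_API_KEY=sk-local-dummy   # any non-empty string for a local server
export LOCAL_LLM=Qwen/Qwen2.5-7B-Instruct

# 3. Run evaluation as usual; the judge calls now go to the local server.
python run.py --data MMBench_DEV_EN --model llava_v1.5_7b --verbose
```

If the endpoint or model name is wrong, judge calls fail and the evaluation falls back to exact matching for extraction, so check the logs to confirm the local judge is actually being hit.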

> @kennymckormick Could you tell me how to use a locally deployed LLM as the judge LLM in the VLM eval kit? Thanks.

A question for the OP: if the model being evaluated is a VLM, should the locally deployed judge model be an LLM or a VLM?

Leke-G avatar Dec 24 '24 02:12 Leke-G

@kennymckormick Same question +1, how to use locally deployed LLMs as judge LLM in the VLM eval kit? Thanks.

zl9501 avatar Jul 17 '25 11:07 zl9501

> Hi @lyzhongcrd, yes. However, we recommend using the same LLM as the judge for all LMMs so that results are comparable. For MCQ or Y/N benchmarks, where the LLM is only used as a choice extractor for more accurate evaluation, using different LLMs will not lead to significantly different results.

Hi, how much would it cost to evaluate on all the major benchmarks with the GPT-4 API? And how can I use local LLMs as the judge LLM?

Keyird avatar Sep 24 '25 17:09 Keyird