Local judge LLM?
Is it possible to use a locally deployed LLM, such as LLaVA-Critic, as the judge LLM instead of calling the GPT-4 API?
Hi @lyzhongcrd, yes. However, we recommend using the same LLM as the judge for all LMMs so that results stay comparable. For MCQ or Y/N benchmarks, where the LLM is only used as an answer-choice extractor for more accurate evaluation, using different LLMs will not lead to significantly different results.
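For those asking how to wire this up: one common pattern is to serve the local model behind an OpenAI-compatible endpoint (e.g. with vLLM or LMDeploy) and point the judge's API settings at it. This is a sketch, not VLMEvalKit's documented procedure; the variable names follow the usual OpenAI-client convention, so confirm the exact names the kit reads in its `.env` documentation:

```shell
# 1) Serve the local judge model behind an OpenAI-compatible endpoint first, e.g.:
#      vllm serve <your-judge-model> --port 23333
#    (Both vLLM and LMDeploy expose a /v1/chat/completions route.)

# 2) Redirect the judge's API calls from api.openai.com to the local endpoint.
#    OPENAI_API_BASE / OPENAI_API_KEY are the conventional variables for
#    OpenAI-compatible clients; the port and URL below are placeholders.
export OPENAI_API_BASE=http://localhost:23333/v1
export OPENAI_API_KEY=EMPTY   # many local servers accept any placeholder key
```

With this in place, any evaluator that talks to the judge through an OpenAI-compatible client should transparently hit the local server instead of the GPT-4 API.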
@kennymckormick Could you tell me how to use a locally deployed LLM as the judge LLM in VLMEvalKit? Thanks.
A question for the OP: if the model I want to evaluate is a VLM, should the locally deployed judge model be an LLM or a VLM?
@kennymckormick Same question, +1: how do I use a locally deployed LLM as the judge LLM in VLMEvalKit? Thanks.
Hi, how much would it cost to evaluate on all the major benchmarks with the GPT-4 API? And how can I use locally deployed LLMs as the judge LLM?