L.ZHONG

Results 2 comments of L.ZHONG

@kennymckormick Could you tell me how to use a locally deployed LLM as the judge LLM in the VLM eval kit? Thanks.

I can confirm the issue on commit 9f21ee8.

| Category | Score |
| --- | --- |
| perception | 1342.2684073629453 |
| reasoning | 301.42857142857144 |
| OCR | 130.0 |
| artwork | 112.0 |
| celebrity | 125.88235294117646 |
| code_reasoning | 62.5 |
| color | 151.66666666666669 |
| commonsense_reasoning | 106.42857142857143 |
| count | 93.33333333333333 |
| existence | 185.0 |
| landmark | 138.5 |
| numerical_calculation | 40.0 |
| position | 115.0 |
| posters | 140.1360544217687 |
| scene | 150.75 |
| text_translation | 92.5 |

The counting score is 93.3. The counting score is 155 in a reference experiment (https://github.com/haotian-liu/LLaVA/issues/927). The score of counting...
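
For anyone who wants to double-check the numbers, here is a minimal Python sketch that parses the two CSV rows quoted in this comment, looks up the `count` score, and compares it with the reference value of 155. The grouping of subtasks into `PERCEPTION` and `REASONING` is my assumption (it is not stated in the output), and the variable names are purely illustrative. The sketch only checks that the reported totals are consistent with the subtask scores; it does not explain where the gap comes from.

```python
import csv
import io

# The two CSV rows reported in this comment (header row, then scores).
raw = (
    '"perception","reasoning","OCR","artwork","celebrity","code_reasoning",'
    '"color","commonsense_reasoning","count","existence","landmark",'
    '"numerical_calculation","position","posters","scene","text_translation"\n'
    '"1342.2684073629453","301.42857142857144","130.0","112.0",'
    '"125.88235294117646","62.5","151.66666666666669","106.42857142857143",'
    '"93.33333333333333","185.0","138.5","40.0","115.0","140.1360544217687",'
    '"150.75","92.5"\n'
)

header, values = list(csv.reader(io.StringIO(raw)))
scores = dict(zip(header, (float(v) for v in values)))

# Assumed grouping of subtasks (not stated in the reported output).
PERCEPTION = ["existence", "count", "position", "color", "posters",
              "celebrity", "scene", "landmark", "artwork", "OCR"]
REASONING = ["commonsense_reasoning", "numerical_calculation",
             "text_translation", "code_reasoning"]

perception_sum = sum(scores[name] for name in PERCEPTION)
reasoning_sum = sum(scores[name] for name in REASONING)

# Reference counting score taken from the linked LLaVA issue.
REFERENCE_COUNT = 155.0
gap = REFERENCE_COUNT - scores["count"]

print(f"count score: {scores['count']:.2f}")
print(f"gap to reference: {gap:.2f}")
print(f"perception total matches subtasks: "
      f"{abs(perception_sum - scores['perception']) < 1e-6}")
print(f"reasoning total matches subtasks: "
      f"{abs(reasoning_sum - scores['reasoning']) < 1e-6}")
```

If the two totals match, the aggregation itself is consistent and the discrepancy is specific to the `count` subtask.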