Results on the TextVQA benchmark

Open Gavin001201 opened this issue 1 year ago • 1 comment

The LLaVA-v1.5 results on the TextVQA benchmark reported in your paper are much lower than those reported in the LLaVA-v1.5 paper.

Aug 26 '24 13:08 Gavin001201

LLaVA's TextVQA evaluation incorporates OCR data into the text of each question. We ran our evaluation following MultimodalOCR (paper: https://arxiv.org/abs/2305.07895, code: https://github.com/Yuliang-Liu/MultimodalOCR), whose TextVQA evaluation dataset does not use OCR data. Our baseline results are similar to those reported in their paper.
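To make the difference between the two setups concrete, here is a minimal sketch of how the question text could be built with and without OCR tokens. The function name `build_textvqa_prompt` and the exact prompt wording are assumptions for illustration only; they are not taken from either codebase, so check the actual evaluation files before relying on them.

```python
from typing import Optional, Sequence

# Hypothetical instruction suffix; the real evaluation scripts may word it differently.
ANSWER_SUFFIX = "Answer the question using a single word or phrase."


def build_textvqa_prompt(question: str,
                         ocr_tokens: Optional[Sequence[str]] = None) -> str:
    """Build the text prompt for one TextVQA question.

    If ocr_tokens is given and non-empty, the tokens are appended to the
    question (OCR-augmented setting). Otherwise only the question and the
    answer instruction are used (no-OCR setting).
    """
    if ocr_tokens:
        # Assumed layout: question, then a line listing the OCR tokens.
        ocr_line = "Reference OCR token: " + ", ".join(ocr_tokens)
        return f"{question}\n{ocr_line}\n{ANSWER_SUFFIX}"
    return f"{question}\n{ANSWER_SUFFIX}"


if __name__ == "__main__":
    q = "What brand is the camera?"
    print(build_textvqa_prompt(q, ["canon", "eos"]))  # OCR-augmented prompt
    print(build_textvqa_prompt(q))                    # prompt without OCR data
```

In the first setting the model can read the answer from the supplied tokens, while in the second it has to recognize the text from the image alone, which is consistent with the lower scores in the no-OCR setting.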

Aug 30 '24 02:08 bzluan