TextCoT
Results on the TextVQA benchmark
The LLaVA-v1.5 results on the TextVQA benchmark reported in our paper are much lower than those reported in the LLaVA-v1.5 paper.
LLaVA's TextVQA evaluation incorporates OCR data into the text of each question. We instead ran our evaluation following MultimodalOCR (paper: https://arxiv.org/abs/2305.07895, code: https://github.com/Yuliang-Liu/MultimodalOCR), whose TextVQA evaluation dataset does not use OCR data. Our baseline results are similar to those reported in their paper.
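To make the difference between the two protocols concrete, here is a minimal sketch of how the question text might be built in each setting. The function name `build_question`, the "Reference OCR token" wording, and the sample question and tokens are illustrative assumptions, not code taken from either repository.

```python
from typing import Optional, Sequence


def build_question(question: str, ocr_tokens: Optional[Sequence[str]] = None) -> str:
    """Return the text prompt for one TextVQA sample.

    Hypothetical helper: when OCR tokens are supplied they are appended to
    the question (the OCR-assisted setting); otherwise the question is used
    unchanged (the OCR-free setting).
    """
    if ocr_tokens:
        # Assumed template; the exact wording used by a given codebase may differ.
        return f"{question}\nReference OCR token: {', '.join(ocr_tokens)}"
    return question


# Made-up sample, for illustration only.
with_ocr = build_question("What brand is the laptop?", ["dell", "inspiron"])
without_ocr = build_question("What brand is the laptop?")

print(with_ocr)
print(without_ocr)
```

In the OCR-assisted setting the model can read the answer from the prompt text, whereas in the OCR-free setting it has to read the text from the image, so scores from the two settings are not directly comparable.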