TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Is there a demo?
There is no answer_format.json file!
The results of LLaVA-v1.5 on the TextVQA benchmark reported in the paper are much lower than those in the LLaVA-v1.5 paper.
How do I use this method with multiple input images?
Hello, did you customize the llava package and add this additional script? Could you share it, please?