TextCoT

bzluan


Metadata

The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.

Issues

6 TextCoT issues


installation and demo

Is there a demo?

GallonDeng

No answer_format.json file!!!

3 comments

There is no answer_format.json file!!!

orormaybe

prepare_stage2_question.py cannot be found.

1 comment

hanzefang

Results on the TextVQA benchmark

1 comment

The results of LLaVA-v1.5 on the TextVQA benchmark reported in the paper are much lower than those in the LLaVA-v1.5 paper.

Gavin001201

Can this method be used when inputting multiple images?

How can I use this method when I input multiple images?

thunderbolt-fire

llava.eval.eval_stvqa cannot be found

1 comment

Hello, did you customize the llava package to add this additional script? Could you please share it?

SkyFishMoon

About

The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.

Topics: chain-of-thought, large-multimodal-models

30 Stars

3 Forks

Watchers

Owner

bzluan
