Donut DocVQA single inference

I already use the train.py and test.py scripts, but I'm wondering whether it is possible to run inference with a question supplied at execution time. Something like:

python test.py --task_name docvqa --dataset_name_or_path "img.jpg" --pretrained_model_name_or_path model --question "what time is it?"

Or is there any way to run inference with just the image and the question, without providing a ground_truth entry in the JSON file that contains the expected answer?

Feb 16 '23 15:02 emigomez

I am doing single inference on VQA like this:

import torch
from PIL import Image
from donut import DonutModel

# Load the fine-tuned DocVQA checkpoint written by train.py
model = DonutModel.from_pretrained("./result/train_docvqa/20230216_120910")
if torch.cuda.is_available():
    model.half()
    device = torch.device("cuda")
    model.to(device)
else:
    model.encoder.to(torch.bfloat16)
model.eval()

image = Image.open("docvqa-donut/test/xhcc0228_3.png").convert("RGB")

# The leading task token (<s_docvqa-donut>) should match the one used during fine-tuning
question = "What is the title provided in the left of the document"
output = model.inference(image=image, prompt=f"<s_docvqa-donut><s_question>{question}</s_question><s_answer>")
print(output["predictions"][0])
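Building on the snippet above, the command-line form from the original question could be sketched as a small standalone script. This is a hypothetical wrapper, not part of the Donut repo: `--image`, `--question` and `--task_token` are made-up flags (only `--pretrained_model_name_or_path` mirrors test.py), and no ground_truth JSON is needed because the question is inserted directly into the prompt.

```python
import argparse


def build_docvqa_prompt(question: str, task_token: str = "<s_docvqa-donut>") -> str:
    """Wrap a free-form question in the DocVQA prompt format used in the snippet above."""
    return f"{task_token}<s_question>{question}</s_question><s_answer>"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Single-image Donut DocVQA inference (hypothetical helper)")
    parser.add_argument("--pretrained_model_name_or_path", required=True)
    parser.add_argument("--image", required=True)  # hypothetical flag
    parser.add_argument("--question", required=True)  # hypothetical flag
    # Hypothetical flag: should match the task start token used during fine-tuning
    parser.add_argument("--task_token", default="<s_docvqa-donut>")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    args = parse_args(argv)
    # Heavy imports are deferred so the helpers above work without Donut installed
    import torch
    from PIL import Image
    from donut import DonutModel

    model = DonutModel.from_pretrained(args.pretrained_model_name_or_path)
    if torch.cuda.is_available():
        model.half()
        model.to(torch.device("cuda"))
    else:
        model.encoder.to(torch.bfloat16)
    model.eval()

    image = Image.open(args.image).convert("RGB")
    prompt = build_docvqa_prompt(args.question, args.task_token)
    output = model.inference(image=image, prompt=prompt)
    print(output["predictions"][0])
```

To run it as `python infer_docvqa.py --pretrained_model_name_or_path model --image img.jpg --question "what time is it?"`, add the usual `if __name__ == "__main__": main()` guard at the bottom of the file (the file name is just an example).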

Feb 17 '23 01:02 wdprsto

@wdprsto Thanks, I did something similar!

Do you know how to obtain a confidence score in the response, and not just the predictions?

Feb 17 '23 11:02 emigomez

Scores are covered here: https://github.com/clovaai/donut/issues/37

If you always ask the same specific questions, I would personally avoid softmax altogether and just work out a "safe" score per question from the raw values. Otherwise, softmax is probably what you want.
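A minimal sketch of the two options, assuming you already have the per-step logits and the generated token ids for the answer (for example from the `scores` and `sequences` fields that Hugging Face `generate` returns when called with `output_scores=True, return_dict_in_generate=True`; Donut's `inference` does not return these by default, so that needs a small modification). `softmax_confidence` multiplies the softmax probability of each chosen token, giving a value in [0, 1]. `raw_score` skips softmax and averages the unnormalized logits, and `passes_threshold` compares that against a hand-tuned per-question cutoff. The function names, the mean-logit choice, and the threshold dictionary are illustrative assumptions, not taken from issue #37.

```python
import math
from typing import Dict, Sequence


def softmax_confidence(step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Product of the softmax probabilities of the chosen token at each decoding step."""
    log_prob_sum = 0.0
    for logits, token_id in zip(step_logits, token_ids):
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        log_prob_sum += logits[token_id] - log_norm  # log-softmax of the chosen token
    return math.exp(log_prob_sum)


def raw_score(step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean unnormalized logit of the chosen tokens (no softmax applied)."""
    chosen = [logits[t] for logits, t in zip(step_logits, token_ids)]
    return sum(chosen) / len(chosen)


def passes_threshold(question: str, score: float, thresholds: Dict[str, float], default: float) -> bool:
    """Accept the answer if its raw score reaches the (hypothetical) per-question cutoff."""
    return score >= thresholds.get(question, default)
```

With tensors you would typically use `torch.log_softmax` over the vocabulary dimension and gather the chosen ids instead of looping in Python; the per-question thresholds would have to be tuned on your own validation documents.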

Feb 18 '23 16:02 logan-markewich