DOCVQA Single Inference
I already work with the train.py and test.py scripts, but I'm wondering if it is possible to run inference with a question supplied at execution time. Something like:

```shell
python test.py --task_name docvqa --dataset_name_or_path "img.jpg" --pretrained_model_name_or_path model --question "what time is it?"
```

Or is there any way to do inference with just the image and a question, without providing the ground_truth field (the desired answer) in the JSON file?
I am doing single inference on VQA like this:
```python
import torch
from PIL import Image
from donut import DonutModel

model = DonutModel.from_pretrained("./result/train_docvqa/20230216_120910")
if torch.cuda.is_available():
    model.half()
    device = torch.device("cuda")
    model.to(device)
else:
    model.encoder.to(torch.bfloat16)
model.eval()

image = Image.open("docvqa-donut/test/xhcc0228_3.png").convert("RGB")
question = "What is the title provided in the left of the document"
output = model.inference(image=image, prompt=f"<s_docvqa-donut><s_question>{question}</s_question><s_answer>")
print(output["predictions"][0])
```
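To get the exact CLI from the original question, the snippet above can be wrapped in a thin argparse entry point. This is a sketch, not a script from the Donut repo; the `build_parser` and `main` names are mine, and the `DonutModel` calls mirror the snippet above:

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(description="Single-image DocVQA inference (sketch)")
    p.add_argument("--pretrained_model_name_or_path", required=True)
    p.add_argument("--image", required=True)
    p.add_argument("--question", required=True)
    p.add_argument("--task_prompt", default="<s_docvqa-donut>")
    return p


def main(argv=None):
    args = build_parser().parse_args(argv)
    # heavy imports stay inside main() so the parser is importable on its own
    import torch
    from PIL import Image
    from donut import DonutModel

    model = DonutModel.from_pretrained(args.pretrained_model_name_or_path)
    if torch.cuda.is_available():
        model.half()
        model.to(torch.device("cuda"))
    else:
        model.encoder.to(torch.bfloat16)
    model.eval()

    image = Image.open(args.image).convert("RGB")
    prompt = f"{args.task_prompt}<s_question>{args.question}</s_question><s_answer>"
    print(model.inference(image=image, prompt=prompt)["predictions"][0])


# call main() when running as a script, e.g.:
#   python infer_docvqa.py --pretrained_model_name_or_path model \
#       --image img.jpg --question "what time is it?"
```

No ground_truth JSON is involved; only the image path and question are needed.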
@wdprsto thanks, I did something similar!
Do you know how to obtain the score in the response, and not just the predictions?
Scores are covered here: https://github.com/clovaai/donut/issues/37
If there are specific questions you always ask, I would personally avoid the softmax and instead work out a "safe" score threshold per question from the raw scores. Otherwise, softmax is probably what you want.