parseq icon indicating copy to clipboard operation
parseq copied to clipboard

The model hallucinates blank images and creates output out of thin air

Open SleepEarlyLiveLong opened this issue 1 year ago • 0 comments

When using the ParseQ model for inference, if a blank image is input, the output content is out of thin air. image

To solve this problem, I generated 10,000 blank images of varying lengths and set their labels to a single space ‘ ’, but these data were ignored when finetuning the model: https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L115 image

How can I solve this problem? Should I simply set remove_whitespace to false? For blank images of varying lengths, should I set their labels to a single space or to different numbers of spaces depending on the length of the image?

SleepEarlyLiveLong avatar Oct 14 '24 09:10 SleepEarlyLiveLong