The model hallucinates blank images and creates output out of thin air

Open SleepEarlyLiveLong opened this issue 1 year ago • 0 comments

When using the ParseQ model for inference, if a blank image is input, the output content is out of thin air.

To solve this problem, I generated 10,000 blank images of varying lengths and set their labels to a single space ‘ ’, but these data were ignored when finetuning the model： https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L115

How can I solve this problem? Should I simply set remove_whitespace to false? For blank images of varying lengths, should I set their labels to a single space or to different numbers of spaces depending on the length of the image?

Oct 14 '24 09:10 SleepEarlyLiveLong