[Help] How can one modify the config and retrain the model to use it for words instead of text lines?

Open KD1994 opened this issue 3 years ago • 0 comments

Hi @NielsRogge,

I'm aware that TrOCR is meant specifically for text lines, but would it be possible to modify it to work on single words? If so, could you please provide some info on how to achieve this?

Model: microsoft/trocr-base-handwritten
Dataset (private): combinations of words and text lines

Also, just curious: which is better for this dataset, character-level or word-level encoding? And how would you suggest modifying the tokenizer?

Thanks, KD

KD1994, Mar 03 '22 06:03