Transformers-Tutorials
[Help] How can one modify the config and retrain the model to use it for words instead of text lines?
Hi @NielsRogge,
I'm aware that TrOCR is meant specifically for text lines, but would it be possible to adapt it to single words? If so, could you please provide some info on how to achieve this?
Model: microsoft/trocr-base-handwritten
Dataset (private): combinations of words and text lines
Also, just curious: which is better for this dataset, character-level or word-level encoding? How would you suggest modifying the tokenizer?
Thanks, KD