Transformers-Tutorials
Issue while training Donut model for parsing with custom decoder and tokenizer
Hey all, I was trying to train a Donut model for document parsing on Arabic-only data. To achieve this, I collected an Arabic corpus from various sources and then trained:
- an MBart tokenizer on the Arabic corpus,
- an MBart decoder on the same dataset.
Initially the model was training well, meaning the loss was decreasing gradually, but during validation every token in my dataset is predicted as `<UNK>`. Because of this the Normed ED value is above 0.9, even though the loss is still decreasing.
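One common cause of this symptom (not confirmed here, just a guess) is a mismatch between the new tokenizer's vocabulary and the decoder's embedding table: if the decoder's embeddings are not resized to the new tokenizer's vocab size (e.g. via `model.decoder.resize_token_embeddings(len(tokenizer))` in `transformers`), ids outside the known range effectively decode to `<unk>`. The toy sketch below, with a made-up five-entry vocabulary, illustrates the mechanism:

```python
UNK_ID = 3

def ids_to_tokens(ids, vocab):
    """Map ids back to tokens; any id outside the vocab falls back to <unk>."""
    inv = {i: t for t, i in vocab.items()}
    return [inv.get(i, "<unk>") for i in ids]

# Hypothetical vocab: the decoder only knows these 5 entries, but the
# newly trained tokenizer emits ids up to 9, so id 7 cannot be resolved.
old_vocab = {"<s>": 0, "</s>": 1, "<pad>": 2, "<unk>": 3, "مرحبا": 4}
print(ids_to_tokens([0, 4, 7, 1], old_vocab))
# → ['<s>', 'مرحبا', '<unk>', '</s>']
```

It may be worth checking that `len(tokenizer)` matches the decoder's embedding size, and that the processor used at validation time wraps the same custom tokenizer used for training.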
Is there anything I am missing? Any input would help a lot. @gwkrsrch, @Vadkoz, @NielsRogge, thanks and regards.