mustaszewski
Dear Mikel, first of all, congratulations on this great piece of work, and thank you for sharing it with the community. I experienced out-of-memory errors when mapping pre-trained fastText embeddings...
Thank you for developing this very useful package. However, I have a problem with the `crawlUrlfilter` argument. From a large website, I would like to crawl and scrape only those...
Does the pre-training of Donut require bounding boxes of individual words? In the synthetically generated SynthDoG dataset (https://huggingface.co/datasets/naver-clova-ix/synthdog-en), which was also used for Donut pretraining, there are no bounding boxes,...
I would like to train the [Donut base model](https://huggingface.co/naver-clova-ix/donut-base) for a few more epochs on the pre-training pseudo-OCR task using a custom dataset. In what reading order should the individual...
I would like to train the base model for a few more epochs on the pre-training pseudo-OCR task using a custom dataset. In what reading order should the individual words...