mustaszewski issues

Results: 5 issues of mustaszewski

Out-of-memory error

Dear Mikel, first of all congratulations on this great piece of work and thank you for sharing it with the community. I experienced out-of-memory errors when mapping pre-trained fastText embeddings...

crawlUrlfilter

Thank you for developing this very useful package. However, I have a problem with the `crawlUrlfilter` argument. From a large website, I would like to crawl and scrape only those...

Bounding boxes required for pretraining?

Does the pre-training of Donut require bounding boxes of individual words? In the synthetically generated SynthDoG dataset (https://huggingface.co/datasets/naver-clova-ix/synthdog-en), which was also used for Donut pretraining, there are no bounding boxes,...

DONUT: Reading order for pseudo-OCR pre-training task

I would like to train the [Donut base model](https://huggingface.co/naver-clova-ix/donut-base) for a few more epochs on the pre-training pseudo-OCR task using a custom dataset. In what reading order should the individual...

Reading order for pseudo-OCR pre-training task

I would like to train the base model for a few more epochs on the pre-training pseudo-OCR task using a custom dataset. In what reading order should the individual words...