Emmett McFarlane
It looks like the LLM hit the 16K token generation limit. This is a limitation of the language model, so trying other models with larger token limits can help. Using...
For those still looking for page-wise markdown extraction, [the library markitdown is based on](https://github.com/emcf/thepipe) has this feature.
Hi @Fuckingnameless, it looks like this is a downstream failure resulting from #34. I replied there. P.S. The output folder is created in the directory the command...
Hi @camrail, I've introduced some additional options, `rescale: float`, `input_images: bool`, and `output_images: bool`, into the `scraper.scrape_pdf` function to ease memory usage (this creates a tradeoff that may...
Marker is great, but unfortunately, the idea of a heuristic pipeline with multiple fine-tuned specialized models ignores [the bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html). I only see a future for PDF extraction using general...
If you're still looking to accurately extract tables from PDFs, check out this [library](https://github.com/emcf/thepipe).