Emmett McFarlane

16 comments by Emmett McFarlane

It looks like the LLM hit the 16K token generation limit. This is a limitation of the language model, so trying other models with larger token limits can help. Using...

For those still looking for page-wise markdown extraction, [the library markitdown is based on](https://github.com/emcf/thepipe) has this feature.

Hi @Fuckingnameless, it looks like this is a downstream failure resulting from #34. Replied there. P.S. The output folder is created in the directory the command...

Hi @camrail, I've introduced some additional options, `rescale: float`, `input_images: bool`, and `output_images: bool`, into the `scraper.scrape_pdf` function to ease memory usage (this creates a tradeoff that may...

Marker is great, but unfortunately, the idea of a heuristic pipeline with multiple fine-tuned specialized models ignores [the bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html). I only see a future for PDF extraction using general...

If you're still looking to accurately extract tables from PDFs, check out this [library](https://github.com/emcf/thepipe).