Results: 2 issues by Julian West

I'm trying to *extract text* from PDF documents, to isolate individual words and create an indexing system. Some PDF files are parsed fine, but others (such as the attached "Ocean...

workflow-text-extraction
whitespace

## Describe the bug

I'm trying to *extract text from PDF documents*, to isolate individual words and create an indexing system. For most PDF files, pymupdf (version 1.23.5) does a...

enhancement-upstream