Julian West issues

Results: 2 issues by Julian West

Inconsistent hyphenation (and lost blanks)

I'm trying to *extract text* from PDF documents, to isolate individual words and create an indexing system. Some PDF files are parsed fine, but others (such as the attached "Ocean...

workflow-text-extraction

whitespace

For some documents, many words get lumped together by get_text()

Describe the bug: I'm trying to *extract text from PDF documents*, to isolate individual words and create an indexing system. For most PDF files, pymupdf (version 1.23.5) does a...

enhancement-upstream