textract
extract text from any document. no muss. no fuss.
### Update [argcomplete](https://pypi.org/project/argcomplete) from **1.10.3** to **2.0.0**. Changelog ### 2.0.0 ``` =============================== - Truncate input after cursor. Fixes 351 (352) - Support of path completion in fish 327 (359) -...
There are small typos in: - docs/installation.rst - textract/exceptions.py Fixes: - Should read `suppressed` rather than `supressed`. - Should read `documentation` rather than `documenation`. - Should read `accommodated` rather than...
When extracting a PDF using the pdfminer method, textract looks for an application called `pdf2text.py`, but the spawn package automatically appends `.exe` to it. Obviously this file doesn't exist, so...
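One possible way to avoid guessing the extension is to ask the system which file actually exists before spawning it. The sketch below is not textract's code; `resolve_script` and its parameters are hypothetical names, and it relies only on the standard library's `shutil.which`.

```python
import shutil


def resolve_script(name, extensions=("", ".exe", ".cmd", ".bat"), path=None):
    """Return the full path of the first existing executable that matches
    `name` plus one of `extensions`, or None if nothing is found.

    `path` is an optional search path string; None means the PATH
    environment variable is used. (Hypothetical helper, not part of textract.)
    """
    for ext in extensions:
        found = shutil.which(name + ext, path=path)
        if found:
            return found
    return None
```

Trying the bare name first means a script such as `pdf2text.py` is found as-is when it exists, and the `.exe` variant is only a fallback.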
I am trying to extract text from hundreds of thousands of PDFs using a computer cluster. I want to run commands like `textract cl-exec-201666USCOC.pdf -o test1.txt -m tesseract` where the...
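The question is cut off here, so the following is only a sketch of the visible part: generating one such command per PDF. It uses only the `-o` and `-m` flags shown above; `build_commands` and the output naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def build_commands(pdf_paths, out_dir, method="tesseract"):
    """Build one `textract <pdf> -o <txt> -m <method>` argument list per PDF.

    The output file is named after the PDF's stem and placed in `out_dir`
    (an assumed convention, not something the issue specifies).
    """
    commands = []
    for pdf in pdf_paths:
        pdf = Path(pdf)
        out_file = Path(out_dir) / (pdf.stem + ".txt")
        commands.append(["textract", str(pdf), "-o", str(out_file), "-m", method])
    return commands
```

Each argument list can then be passed to `subprocess.run` or written out as a line in a job file, depending on how the cluster schedules work.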
**Describe the bug** ``` ERROR: Cannot install beautifulsoup4==4.11.1 and textract==1.6.3 because these package versions have conflicting dependencies. The conflict is caused by: The user requested beautifulsoup4==4.11.1 textract 1.6.3 depends on...
**Describe the bug** When parsing files with textract, specifically `.txt` files, the `input_encoding`/`output_encoding` arguments simply don't work. **To Reproduce** Steps to reproduce the behavior: 1. Create...
**Describe the bug** I work locally on a Mac, and a simple test with a sample PDF passes there but fails in a Docker container. **To Reproduce** Steps to reproduce...
Added a fix for issue #342 caused by `extract_msg.Message._getStringStream` returning `None` for streams that are not found in the MSG file (this is intentional and should be handled accordingly). `ensure_bytes`...
For the moment, pip complains about these dependency conflicts: ``` textract 1.6.5 requires argcomplete~=1.10.0, but you have argcomplete 2.0.0 which is incompatible. textract 1.6.5 requires beautifulsoup4~=4.8.0, but you have beautifulsoup4...
**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Which filetype should textract...