Fakabbir Amin comments

Results: 13 comments of Fakabbir Amin

Need to read the tables of pdf through camelot

Can you share the steps you followed or the error you encountered during the process?

Add hOCR output type for pdf2txt

@pietermarsman At present, the HTML output is the best representation of the PDF. I think what @hason mentioned is to extract only the text from the PDF. In that case,...

Strange Code Comments

Sure, let me grab some patience and time.

Strange Code Comments

@jstockwin Yes, I will probably devote some time to this and to some other issues too.

update python-pdfbox to support PDFBox 3.*

Currently, a fork of python-pdfbox is available that works smoothly: pip install python-pdfbox-v2

update python-pdfbox to support PDFBox 3.*

@mara004 As far as I remember, #29 was not merged or working when I discovered the breaking changes due to PDFBox v3. If #29 is working now, it's great and...

cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_'

The issue is mainly due to a conflicting dependency under Python 3.9. Try running in a fresh Python 3.10 environment; things should work.
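
A quick way to check whether a given environment is affected, as a minimal sketch: `DEFAULT_CIPHERS` was removed from `urllib3.util.ssl_` in urllib3 2.0, so a package that still imports it fails with this error. The helper name `has_default_ciphers` is hypothetical and not part of any library; it only reports what the current interpreter can import.

```python
import importlib
import sys


def has_default_ciphers(module_name="urllib3.util.ssl_"):
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or its parent package) is missing from this environment.
        return None
    # True on older urllib3 releases, False once the attribute has been removed.
    return hasattr(module, "DEFAULT_CIPHERS")


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("DEFAULT_CIPHERS present:", has_default_ciphers())
```

If this prints `False`, a dependency that still imports `DEFAULT_CIPHERS` will likely fail until it is upgraded or the environment is rebuilt.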

Cannot find text pdf files

Hi, the pdf.xml files have to be generated via Tesseract. https://github.com/fakabbir/OCR/blob/master/src/OCRScript.py#L23

CAN YOU HELP

Hi, OCR refers to the extraction of text. In order to convert the text to key-value pairs, you would require rules, which may not be exactly the way it's in this...
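
To illustrate what such a rule might look like, here is a minimal sketch that assumes the OCR output contains lines of the form "Key: Value". The rule, the pattern `KEY_VALUE_RULE`, and the function `extract_pairs` are hypothetical and are not taken from the repository; real documents will usually need rules tailored to their layout.

```python
import re

# Hypothetical rule: a line shaped like "Key: Value" becomes one pair.
KEY_VALUE_RULE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_pairs(ocr_text):
    """Collect key-value pairs from plain OCR text, one candidate per line."""
    pairs = {}
    for line in ocr_text.splitlines():
        match = KEY_VALUE_RULE.match(line)
        if match:
            # Lines without a colon-separated key and value are skipped.
            pairs[match.group(1)] = match.group(2)
    return pairs
```

Calling `extract_pairs` on text that has already been extracted by the OCR step returns a plain dictionary; lines that do not fit the rule are ignored rather than guessed at.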

Command - pip reset

It's more of a way to revert the base environment, which is generally used by default.