AttributeError: module 'pdfminer.high_level' has no attribute 'extract_text'

Open subhrajit-mohanty opened this issue 1 year ago • 2 comments

I am getting the following error when trying to extract text from the attached PDF, Interstallar.pdf:

FileConversionException: Could not convert 'Interstallar.pdf' to Markdown. File type was recognized as ['.pdf', '.pdf', '.fdf']. While converting the file, the following error was encountered:

Traceback (most recent call last):
  File "/opt/miniconda/envs/py311/lib/python3.11/site-packages/markitdown/_markitdown.py", line 1239, in _convert
    res = converter.convert(local_path, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda/envs/py311/lib/python3.11/site-packages/markitdown/_markitdown.py", line 490, in convert
    text_content=pdfminer.high_level.extract_text(local_path),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pdfminer.high_level' has no attribute 'extract_text'

subhrajit-mohanty, Jan 20 '25 08:01
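The traceback shows markitdown calling `pdfminer.high_level.extract_text(local_path)`. The error therefore means that the `pdfminer.high_level` module Python actually imported does not define that function. One plausible cause is a second pdfminer-named package installed alongside `pdfminer.six` that occupies the same `pdfminer` import path, but that is an assumption, not something confirmed in this thread.

The sketch below uses only the standard library. It lists every installed distribution whose name contains "pdfminer" and reports where `pdfminer.high_level` was loaded from and whether it exposes `extract_text`. The function names `pdfminer_distributions` and `check_extract_text` are hypothetical helpers, not part of markitdown or pdfminer.

```python
import importlib
from importlib import metadata


def pdfminer_distributions() -> dict:
    """Map each installed distribution whose name mentions 'pdfminer' to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if "pdfminer" in name.lower():
            found[name] = dist.version
    return found


def check_extract_text() -> dict:
    """Report where pdfminer.high_level was imported from and whether it has extract_text."""
    report = {"importable": False, "module_file": None, "has_extract_text": False}
    try:
        high_level = importlib.import_module("pdfminer.high_level")
    except ImportError:
        # No pdfminer package, or one that has no high_level module at all.
        return report
    report["importable"] = True
    report["module_file"] = getattr(high_level, "__file__", None)
    report["has_extract_text"] = hasattr(high_level, "extract_text")
    return report


if __name__ == "__main__":
    print("pdfminer distributions:", pdfminer_distributions())
    print("pdfminer.high_level:", check_extract_text())
```

Check the output in either of these cases:

* More than one distribution is listed.
* `has_extract_text` is `False`.

In both cases, removing the extra package and reinstalling `pdfminer.six` is a reasonable first step to try.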

+1, have you found a solution to this problem?

LongQIByte, Feb 10 '25 11:02

I just tried reinstalling pdfminer and it works. The installed pdfminer version is:

➜  pip list | grep pdf                 
pdfminer.six             20240706

LongQIByte, Feb 10 '25 11:02
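The working setup reported above uses `pdfminer.six` 20240706. Below is a minimal sketch for checking that the installed `pdfminer.six` is at least that version. It makes two assumptions:

* The package keeps its date-style (YYYYMMDD) version numbers.
* The threshold is simply the version reported as working in this thread, not an official minimum.

The names `KNOWN_GOOD`, `date_version` and `pdfminer_six_is_recent` are hypothetical.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional

# Version reported as working in this thread; an assumption, not an official minimum.
KNOWN_GOOD = "20240706"


def date_version(version: str) -> int:
    """Turn a date-style version such as '20240706' into a comparable integer."""
    digits = "".join(takewhile(str.isdigit, version))
    if not digits:
        raise ValueError(f"not a date-style version: {version!r}")
    return int(digits)


def pdfminer_six_is_recent(minimum: str = KNOWN_GOOD) -> Optional[bool]:
    """Return True/False if pdfminer.six is installed, or None if it is not found."""
    try:
        installed = metadata.version("pdfminer.six")
    except metadata.PackageNotFoundError:
        return None
    return date_version(installed) >= date_version(minimum)


if __name__ == "__main__":
    print("pdfminer.six at least", KNOWN_GOOD, "->", pdfminer_six_is_recent())
```

Interpreting the result:

* `False`: `pip install --upgrade pdfminer.six` is one thing to try.
* `None`: no `pdfminer.six` distribution was found, so the `pdfminer` being imported may come from a different package.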