markitdown [bug] Markitdown failed to convert a PDF that contains an image

cn_dissertation_1st_page.pdf

When trying to analyze the attached file with

    from markitdown import MarkItDown

    md = MarkItDown()
    result = md.convert(file_path)
    return result.text_content

I got the following error:

    Traceback (most recent call last):
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/markitdown/_markitdown.py", line 1239, in _convert
        res = converter.convert(local_path, **_kwargs)
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/markitdown/_markitdown.py", line 490, in convert
        text_content=pdfminer.high_level.extract_text(local_path),
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/pdfminer/high_level.py", line 169, in extract_text
        for page in PDFPage.get_pages(
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/pdfminer/pdfpage.py", line 171, in get_pages
        for (pageno, page) in enumerate(cls.create_pages(doc)):
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/pdfminer/pdfpage.py", line 127, in create_pages
        yield cls(document, objid, tree, next(page_labels))
      File "/Users/fengjunchen/Library/Caches/pypoetry/virtualenvs/ai-advisor-IuYXIAy1-py3.10/lib/python3.10/site-packages/pdfminer/pdfpage.py", line 64, in __init__
        resolve1(mediabox_param) for mediabox_param in self.attrs["MediaBox"]
    KeyError: 'MediaBox'
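
The last frame shows pdfminer indexing `self.attrs["MediaBox"]` directly. In the PDF specification, MediaBox is required for every page but is inheritable, so it may come from the page dictionary itself or from any ancestor Pages node in the page tree. A minimal sketch of that lookup, modelling the page tree as plain Python dicts (an assumption for illustration; this is not pdfminer's code, and the Letter-size fallback and helper names are hypothetical), shows why a page with no MediaBox anywhere on its parent chain leaves a strict lookup with nothing to return:

```python
from typing import Any, Dict, List, Optional

# US Letter in PDF points; a hypothetical fallback chosen for illustration only.
DEFAULT_MEDIABOX: List[float] = [0, 0, 612, 792]


def inherited_mediabox(page: Dict[str, Any]) -> Optional[List[float]]:
    """Return the MediaBox from the page or its nearest ancestor, else None."""
    node: Optional[Dict[str, Any]] = page
    while node is not None:
        if "MediaBox" in node:
            return node["MediaBox"]
        # Walk up the page tree through the /Parent reference.
        node = node.get("Parent")
    return None


def mediabox_or_default(page: Dict[str, Any]) -> List[float]:
    """Fall back to DEFAULT_MEDIABOX instead of raising KeyError."""
    box = inherited_mediabox(page)
    return box if box is not None else DEFAULT_MEDIABOX


if __name__ == "__main__":
    root = {"Type": "Pages", "MediaBox": [0, 0, 595, 842]}
    print(inherited_mediabox({"Type": "Page", "Parent": root}))  # inherited from root
    print(inherited_mediabox({"Type": "Page"}))  # no MediaBox anywhere: None
```

A page built without the entry and without a parent that supplies it is exactly the case that ends in `KeyError: 'MediaBox'` when the key is read directly.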

I am running it on Python 3.10 on macOS 15.2 (24C101).

Dec 26 '24 07:12 Drjunchenfeng

After consulting with o1 and tinkering with it, I realized the cause: I am using pymupdf to reconstruct the PDF page, and the reconstructed page is missing this metadata (the MediaBox entry).
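
To check whether a reconstructed file has lost this entry before handing it to markitdown, a rough stdlib-only heuristic is to search the raw bytes for the `/MediaBox` key. This is a sketch under stated assumptions, not a PDF parser: the helper names are hypothetical, a hit only means at least one MediaBox appears somewhere, and page dictionaries stored in compressed object streams (`/ObjStm`) are invisible to a byte search, so that case is reported as inconclusive.

```python
from pathlib import Path
from typing import Optional


def mediabox_present(pdf_bytes: bytes) -> Optional[bool]:
    """Heuristic: True if a /MediaBox key appears uncompressed,
    False if it is absent and there are no object streams,
    None if the answer cannot be determined by a byte search."""
    if b"/MediaBox" in pdf_bytes:
        return True
    if b"/ObjStm" in pdf_bytes:
        # Page dictionaries may live inside compressed object streams.
        return None
    return False


def check_file(path: str) -> Optional[bool]:
    """Read a PDF from disk and apply the byte-level heuristic."""
    return mediabox_present(Path(path).read_bytes())
```

For example, `check_file("cn_dissertation_1st_page.pdf")` returning `False` would point to the missing entry; `None` means the file needs to be inspected with a real PDF library.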

Dec 26 '24 08:12 Drjunchenfeng

Check out #139.

Dec 26 '24 08:12 l-lumin

So, for this to work with PDFs that contain images, is the expectation that you write your own plugin? This limitation should be made clear.

May 02 '25 20:05 supernitin