
I'm unable to catch the error. The error breaks the code despite being in a try block.

Open aditya005 opened this issue 1 year ago • 3 comments

Code:

            try:
                result = md.convert(str(pdf_file))
            except Exception as e:
                log.error(f"MarkItDown conversion failed for {pdf_file.name}: {e}")
                print(f"DEBUG: Exception caught in conversion - {e}")

Error:

Traceback (most recent call last):
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 1239, in _convert
    res = converter.convert(local_path, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 490, in convert
    text_content = pdfminer.high_level.extract_text(local_path),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/high_level.py", line 176, in extract_text
    interpreter.process_page(page)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
    self.init_resources(resources)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 387, in init_resources
    colorspace = get_colorspace(resolve1(spec))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 370, in get_colorspace
    return PDFColorSpace(name, stream_value(spec[1])["N"])
                               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdftypes.py", line 263, in __getitem__
    return self.attrs[name]
           ~~~~~~~~~~^^^^^^
KeyError: 'N'

The PDF is corrupted, so it's fine that conversion throws an exception. But the exception is not being caught, so I can't handle it. I'm using markitdown = "^0.0.1a3" on python = "^3.11".

aditya005 avatar Feb 13 '25 07:02 aditya005

Same issue.

2niuhe avatar Feb 14 '25 06:02 2niuhe

@aditya005

It works when you use a bare except, like this.

            try:
                result = md.convert(str(pdf_file))
            except:
                # A bare except binds no exception object, so `e` cannot be referenced here.
                log.error(f"MarkItDown conversion failed for {pdf_file.name}")
                print("DEBUG: Exception caught in conversion")

williambrach avatar Feb 22 '25 07:02 williambrach

Thanks, this worked. I'm unable to get the exception object e this way, but that works for now.

aditya005 avatar Feb 22 '25 19:02 aditya005
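
One possible way to still get the exception object inside a bare except is the standard library's sys.exc_info(). The sketch below is not MarkItDown code: HypotheticalConversionError, failing_convert, and convert_or_none are made-up names that only stand in for a converter whose error derives from BaseException.

```python
import sys


class HypotheticalConversionError(BaseException):
    """Stand-in (hypothetical) for an error that derives from BaseException."""


def failing_convert(path):
    """Hypothetical converter that always fails, to simulate a corrupted file."""
    raise HypotheticalConversionError(f"could not convert {path}")


def convert_or_none(convert, path):
    """Call convert(path) and return (result, None) or (None, exception object)."""
    try:
        return convert(path), None
    except:  # bare except, as in the workaround above
        # sys.exc_info() returns (type, value, traceback) for the exception
        # currently being handled; the value is the exception object itself.
        _, exc, _ = sys.exc_info()
        # Let interpreter-level signals propagate instead of swallowing them.
        if isinstance(exc, (KeyboardInterrupt, SystemExit)):
            raise
        return None, exc


result, error = convert_or_none(failing_convert, "broken.pdf")
print(f"Conversion failed: {error!r}")
```

Writing `except BaseException as e:` binds the same object more directly. Either form also intercepts KeyboardInterrupt and SystemExit, which is why the sketch re-raises those.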

This was because the exception was subclassing BaseException rather than Exception. This is now fixed. Sorry for the inconvenience.

afourney avatar Mar 01 '25 07:03 afourney
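
A minimal sketch of why the original `except Exception` handler never ran, assuming an error class that derives directly from BaseException. HypotheticalLibraryError, risky, and caught_by are illustrative names, not the library's actual classes or functions.

```python
class HypotheticalLibraryError(BaseException):
    """Illustrative error that skips Exception in its inheritance chain."""


def risky():
    """Simulate a library call that fails on corrupted input."""
    raise HypotheticalLibraryError("corrupted input")


def caught_by(handler_type):
    """Return True if an `except handler_type` clause catches the error from risky()."""
    try:
        try:
            risky()
        except handler_type:
            return True
    except BaseException:
        # The inner handler did not match, so the error propagated out to here.
        return False
    return False


# Exception is a subclass of BaseException, not the other way around.
print(issubclass(HypotheticalLibraryError, Exception))      # False
print(issubclass(HypotheticalLibraryError, BaseException))  # True
print(caught_by(Exception))      # False: the handler is skipped
print(caught_by(BaseException))  # True: the handler runs
```

An ordinary KeyError is a subclass of Exception, so `except Exception` catches it when it is raised directly.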