markitdown
I'm unable to catch this error. It crashes the program even though the call is inside a try block.
Code:
try:
    result = md.convert(str(pdf_file))
except Exception as e:
    log.error(f"MarkItDown conversion failed for {pdf_file.name}: {e}")
    print(f"DEBUG: Exception caught in conversion - {e}")
Error:
Traceback (most recent call last):
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 1239, in _convert
    res = converter.convert(local_path, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 490, in convert
    text_content = pdfminer.high_level.extract_text(local_path),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/high_level.py", line 176, in extract_text
    interpreter.process_page(page)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
    self.init_resources(resources)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 387, in init_resources
    colorspace = get_colorspace(resolve1(spec))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 370, in get_colorspace
    return PDFColorSpace(name, stream_value(spec[1])["N"])
                               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdftypes.py", line 263, in __getitem__
    return self.attrs[name]
           ~~~~~~~~~~^^^^^^
KeyError: 'N'
The PDF is corrupted, so it's fine that it throws an exception. The problem is that the except clause never catches it, so I can't handle it.
I'm using markitdown = "^0.0.1a3" on python = "^3.11"
same issue.
@aditya005 it works when you use it like this:
try:
    result = md.convert(str(pdf_file))
except:  # a bare except also catches BaseException subclasses, but binds no "e"
    log.error(f"MarkItDown conversion failed for {pdf_file.name}")
    print("DEBUG: Exception caught in conversion")
Thanks, this worked. I can't get the exception object e this way, but that works for now.
This was because the exception was subclassing BaseException rather than Exception. This is now fixed. Sorry for the inconvenience.
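For anyone still on an affected version, here is a minimal sketch of why `except Exception` missed the error and how to catch it while still getting the exception object. `FakeConversionError`, `fake_convert` and `safe_convert` are hypothetical stand-ins made up for illustration; they are not markitdown names, and the real exception class in the affected release may look different.

```python
class FakeConversionError(BaseException):
    """Hypothetical stand-in: derives from BaseException, not Exception."""


def fake_convert(path: str) -> str:
    # Stand-in for md.convert() failing on a corrupted PDF.
    raise FakeConversionError(f"could not convert {path}")


def safe_convert(path: str):
    """Return (result, error); error is the caught exception object or None."""
    try:
        return fake_convert(path), None
    except Exception as e:
        # Not reached for FakeConversionError: it is not an Exception subclass.
        return None, e
    except (KeyboardInterrupt, SystemExit):
        # Let Ctrl+C and sys.exit() propagate as usual.
        raise
    except BaseException as e:
        # Catches what a bare "except:" would, but keeps the exception object.
        return None, e


result, error = safe_convert("broken.pdf")
print(issubclass(FakeConversionError, Exception))  # False
print(type(error).__name__, "-", error)
```

Re-raising `KeyboardInterrupt` and `SystemExit` before the broad handler is a design choice so that the workaround does not swallow Ctrl+C or a normal interpreter exit. Once you are on a release with the fix, plain `except Exception as e` should be enough.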