markitdown
Python tool for converting files and office documents to Markdown.
Addresses https://github.com/microsoft/markitdown/issues/88. Adds new converter + new test.
Unsure about this - perhaps should be an optional dep? _Originally posted by @casperdcl in https://github.com/microsoft/markitdown/pull/100#discussion_r1889308792_
**Converted a .docx file to .md, then added that file to the mkdocs docs folder.** After running `mkdocs serve`, **I got the following errors:** ``` ERROR - Encoding error reading file:...
It would be incredibly helpful if MarkItDown could support the conversion of MHTML files to Markdown. MHTML (MIME HTML) files are a common format for saving web pages and preserving...
I installed markitdown in Anaconda using `pip install markitdown`, but I can't import markitdown in Python. **The code is:** from markitdown import MarkItDown **The error is:** cannot import name 'MarkItDown' from 'markitdown'...
UnicodeEncodeError: 'cp950' codec can't encode character '\uf09f' in position 139457: illegal multibyte sequence
I'm using the Python API to convert some German documents, and I narrowed down the problem in my conversion to MarkItDown. ``` md = MarkItDown() result = md.convert(file_name) markdown_text =...
I'd like to reopen an issue that I commented on (https://github.com/microsoft/markitdown/issues/281) that was subsequently closed. Perhaps I'm missing something basic, but XML files do not seem to work with this...
(Microsoft-MarkItDown) C:\Users\zaish01317\MarkItDown>markitdown "C:\Users\zaish01317\translation-agent\examples\sample-texts\Modular RAG Transforming RAG Systems into LEGO-like Reconfigurable Frameworks.pdf" > "C:\Users\zaish01317\translation-agent\examples\sample-texts\Modular RAG Transforming RAG Systems into LEGO-like Reconfigurable Frameworks.pdf.md" Traceback (most recent call last): File "C:\Users\zaish01317\.conda\envs\Microsoft-MarkItDown\lib\runpy.py", line 196,...
For every PDF file I tested, the tool crashes with a UnicodeEncodeError. In each file it finds a different character to crash on. The problem is that it didn't even...
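Several of the reports above involve a `UnicodeEncodeError` raised while converted text is written out through a legacy Windows code page such as cp950. The sketch below is one possible workaround, not the project's own fix: it assumes the Markdown text is already available as a Python string (for example from the `md.convert(...)` call shown in one of the reports) and writes it to disk as UTF-8 explicitly, so the console or locale encoding is never involved. The helper names `can_encode` and `save_markdown` are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in `text` is representable in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def save_markdown(text: str, path: str) -> Path:
    """Write `text` as UTF-8, independent of the console or locale encoding."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


if __name__ == "__main__":
    # Stand-in for converter output; '\uf09f' is the character named in the
    # cp950 traceback quoted above.
    sample = "bullet \uf09f item"
    # Check whether a target codec could represent the text before writing.
    print("utf-8 ok:", can_encode(sample, "utf-8"))
    print("ascii ok:", can_encode(sample, "ascii"))
    # Assumption: the output file name is arbitrary.
    save_markdown(sample, "output.md")
```

Writing the file from Python with an explicit encoding sidesteps the shell redirection (`>`) used in the command-line reports, where the terminal's code page decides how the output is encoded.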