
Fix docx parse error (docx test case: \n in alt text)

Open BetterAndBetterII opened this issue 10 months ago • 0 comments

The test.docx test case exposes a problem with how images are parsed.

The problem is not with the image URI but with how the alt text is parsed.

A .docx document allows an image's alt text to span multiple lines, but Markdown does not actually allow the alt text to wrap.

My change is to replace the line breaks in multi-line alt text.
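To illustrate the idea, here is a minimal sketch of that replacement. The helper names `sanitize_alt` and `image_markdown` are hypothetical and are not taken from the markitdown code base, and replacing each line break with a single space is my assumption about a reasonable substitute.

```python
import re


def sanitize_alt(alt: str) -> str:
    """Collapse each run of line breaks (plus surrounding whitespace) into one space.

    Hypothetical helper for illustration; not part of markitdown's API.
    """
    return re.sub(r"\s*[\r\n]+\s*", " ", alt).strip()


def image_markdown(alt: str, src: str) -> str:
    """Build a single-line Markdown image tag from possibly multi-line alt text."""
    return f"![{sanitize_alt(alt)}]({src})"


if __name__ == "__main__":
    # Multi-line alt text, as a .docx image description might contain
    print(image_markdown("A chart\nshowing results", "image.png"))
```

With this in place the alt text always stays on one line, so the image tag is emitted as a single Markdown element.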

To reproduce: run the test case with test.docx in the testfile folder, with keep_data_uri=True.

BetterAndBetterII, Mar 31 '25 17:03