Table extraction from PDF is advertised but completely absent

Open · riccardomalavolti opened this issue · 3 comments

Version 0.1.3

docker run --rm -i markdown:latest < ~/example.pdf > output.md

where example.pdf is a native PDF (not a scanned document).

markitdown extracts the text, but there's no sign of tables: the output is just the text interleaved with newlines.
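To make "no sign of tables" concrete: a Markdown (GFM) table always has a delimiter row such as `| --- | --- |` under its header. Below is a minimal sketch of a check for that row in the converted output. `has_markdown_table` is a hypothetical helper written for this comment, not part of markitdown, and it only recognizes tables with at least two columns.

```python
import re

# Matches a GFM delimiter row with two or more columns, e.g. "| --- | :---: |".
# A lone "---" (a horizontal rule) does not match because at least one "|" is required.
DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$")


def has_markdown_table(markdown: str) -> bool:
    """Return True if any line of the text looks like a GFM table delimiter row."""
    return any(DELIMITER_ROW.match(line) for line in markdown.splitlines())


print(has_markdown_table("Name\nQty\napple\n3"))                  # newline-separated cells
print(has_markdown_table("| Name | Qty |\n| --- | --- |\n| apple | 3 |"))  # a real pipe table
```

Run against the file from the command above, e.g. `has_markdown_table(open("output.md", encoding="utf-8").read())`, it should return False for output like the one described here.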

Sep 17 '25 06:09 riccardomalavolti

Looking at the source code, you can see that the PDF converter still uses pdfminer, and the results of its PDF-to-Markdown conversion speak for themselves, so don't set your expectations too high. Nowadays, OCR models are what get used to convert text, tables, and formulas.
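To illustrate why a text-only dump loses tables: the words survive but their positions are thrown away, so each cell ends up on its own line. The sketch below is a hypothetical illustration, not markitdown's or pdfminer's code. It assumes you already have text fragments with x/y coordinates (pdfminer.six layout objects do carry bounding boxes, but the `Fragment` type and both functions here are made up for this example) and shows how a naive row/column heuristic could rebuild a simple table.

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical text fragment with the PDF coordinates of its left edge (x) and baseline (y)."""
    x: float
    y: float
    text: str


def flatten(fragments: list[Fragment]) -> str:
    """Text-only dump: positions are discarded, one fragment per line."""
    return "\n".join(f.text for f in fragments)


def fragments_to_markdown_table(fragments: list[Fragment], y_tolerance: float = 2.0) -> str:
    """Rebuild a pipe table by grouping fragments into rows (by y) and cells (by x).

    Naive heuristic: assumes the first row is the header, every row has the same
    number of cells, and there are no merged or multi-line cells.
    """
    rows: dict[float, list[Fragment]] = defaultdict(list)
    for frag in fragments:
        # Reuse an existing row whose baseline is within y_tolerance, else start a new row.
        key = frag.y
        for existing in rows:
            if abs(existing - frag.y) <= y_tolerance:
                key = existing
                break
        rows[key].append(frag)

    # PDF y coordinates grow upwards, so the top row has the largest y.
    ordered_rows = [rows[y] for y in sorted(rows, reverse=True)]
    # Within a row, order cells left to right.
    cells = [[f.text.strip() for f in sorted(row, key=lambda f: f.x)] for row in ordered_rows]

    header, *body = cells
    lines = ["| " + " | ".join(header) + " |", "|" + " --- |" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


frags = [
    Fragment(72, 700, "Name"), Fragment(200, 700, "Qty"),
    Fragment(72, 685, "apple"), Fragment(200, 685.5, "3"),
    Fragment(72, 670, "pear"), Fragment(200, 670, "5"),
]
print(flatten(frags))                      # cells one per line, like the reported output
print(fragments_to_markdown_table(frags))  # the same fragments as a pipe table
```

Real documents need much more than this (merged cells, wrapped text, ruling lines, multi-column layouts), which is why dedicated table extractors and OCR/layout models exist.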

Sep 18 '25 11:09 bjfk2006

If you're still looking to accurately extract tables from PDFs, check out this library

Oct 01 '25 14:10 emcf

Had to install and try this to find out the truth. :(

Nov 20 '25 15:11 boldandbusted