tungsten106

Results: 3 issues by tungsten106

- Using the PyMuPDF package to extract image bounding boxes (bbox), sorting them by y-position, and adding a Markdown-formatted image label as text to the output Markdown file;
- Image data saved in...
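The approach in the first bullet boils down to interleaving text blocks and image labels in reading order. Below is a minimal sketch of just the ordering step, assuming text blocks shaped like the first five fields of PyMuPDF's `page.get_text("blocks")` tuples and image boxes taken from the `"bbox"` entry of `page.get_image_info()`. The function name `merge_by_y` and the image file-name pattern are hypothetical, not taken from the PR.

```python
from typing import List, Tuple

# (x0, y0, x1, y1, text): assumed to match the first five fields of
# PyMuPDF's page.get_text("blocks") output (text blocks only).
TextBlock = Tuple[float, float, float, float, str]
# (x0, y0, x1, y1): assumed to match the "bbox" entry of page.get_image_info().
BBox = Tuple[float, float, float, float]


def merge_by_y(
    text_blocks: List[TextBlock],
    image_bboxes: List[BBox],
    page_no: int,
    # Hypothetical naming scheme for the saved image files.
    label_fmt: str = "![image](page{page}_img{idx}.png)",
) -> str:
    """Interleave text and Markdown image labels, top to bottom."""
    items = []
    for x0, y0, _x1, _y1, text in text_blocks:
        items.append((y0, x0, text.strip()))
    for idx, (x0, y0, _x1, _y1) in enumerate(image_bboxes):
        items.append((y0, x0, label_fmt.format(page=page_no, idx=idx)))
    # Sort by top edge first, then by left edge to break ties.
    items.sort(key=lambda item: (item[0], item[1]))
    return "\n\n".join(text for _, _, text in items)


if __name__ == "__main__":
    blocks = [(0, 100, 200, 120, "Second paragraph\n"), (0, 10, 200, 30, "Title\n")]
    images = [(0, 50, 200, 90)]
    print(merge_by_y(blocks, images, page_no=1))
```

With PyMuPDF, `get_text("blocks")` also returns image blocks (block type `1`), so filtering to block type `0` before calling the function would avoid emitting an image twice. Sorting on the top edge alone assumes a single-column layout; multi-column pages would need a column-aware ordering.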

Using `pymupdf4llm` instead of `pdfminer` to parse PDF contents into Markdown format, as suggested in #131. Pros and cons: - `pdfminer` extracts text only; the generated files have no headings, titles,...

Added a `requirements.txt` file and updated `README.md`. (`flash-attn==2.6.3` is commented out because my CUDA version is too old; you can keep it if it's helpful.) The other changes...