tungsten106 issues


Results: 3 issues of tungsten106

Add image label to output MD file.

- Use the PyMuPDF package to extract each image's bounding box, sort the images by y-position, and add a Markdown-formatted image label as text to the output Markdown file;
- Image data saved in...
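
The approach in the first bullet could look roughly like the sketch below. It is not the code from the pull request: the function names `page_images` and `image_labels` and the `page{n}_img{i}.png` file-name pattern are hypothetical, chosen only for illustration. It assumes PyMuPDF's `Page.get_image_info()`, which reports a `bbox` for each image on a page, and relies on PyMuPDF's top-left page origin, so a smaller `y0` means higher on the page.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def image_labels(images: List[Tuple[str, BBox]]) -> List[str]:
    """Return Markdown image labels ordered top-to-bottom by bbox y0."""
    ordered = sorted(images, key=lambda item: item[1][1])
    return [f"![{name}]({name})" for name, _ in ordered]


def page_images(pdf_path: str, page_number: int = 0) -> List[Tuple[str, BBox]]:
    """Collect (file name, bbox) pairs for the images on one page."""
    import fitz  # PyMuPDF; imported lazily so image_labels works without it

    found = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for i, info in enumerate(page.get_image_info()):
            # Hypothetical naming scheme; the PR may name files differently.
            found.append((f"page{page_number}_img{i}.png", tuple(info["bbox"])))
    return found
```

With a PDF at hand, `image_labels(page_images("paper.pdf"))` would give the label lines in reading order, ready to be merged into the Markdown output.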

update: change pdf text parser to pymupdf4llm

Use `pymupdf4llm` instead of `pdfminer` to parse PDF contents into Markdown, as suggested by #131. Pros and cons:
- `pdfminer` extracts text only; the generated files have no headings, titles,...
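
For orientation, a minimal sketch of the `pymupdf4llm` conversion step follows. It assumes the library's `to_markdown()` function, which accepts a PDF path and returns a Markdown string; the helper names `pdf_to_markdown` and `write_markdown` are hypothetical and do not come from the pull request.

```python
from pathlib import Path


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to a Markdown string with pymupdf4llm."""
    import pymupdf4llm  # imported lazily; requires `pip install pymupdf4llm`

    return pymupdf4llm.to_markdown(pdf_path)


def write_markdown(md_text: str, out_path: str) -> Path:
    """Write Markdown text to out_path as UTF-8 and return the path."""
    path = Path(out_path)
    path.write_text(md_text, encoding="utf-8")
    return path
```

A call such as `write_markdown(pdf_to_markdown("paper.pdf"), "paper.md")` would then produce the Markdown file in one step.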

Added a pip install option

Added the requirements.txt file and updated README.md. (`flash-attn==2.6.3` is commented out because of my low CUDA version; you could still keep it if it's helpful.) The other changes...