Pieter Marsman
The `--all-text` option is about the object tree, not where objects are visually displayed. So if an object is not nested inside an `LTFigure`, it should be extracted when `--all-text` is...
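To illustrate the tree-versus-position distinction, here is a minimal toy sketch. It is not pdfminer.six code: `Text`, `Container`, and `selected_text` are hypothetical names, and the selection rule is a simplified assumption about the behaviour described above. The point is only that the decision follows nesting, never the bounding boxes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]


@dataclass
class Text:
    # Hypothetical stand-in for a text object in the layout tree.
    content: str
    bbox: BBox


@dataclass
class Container:
    # Hypothetical stand-in for a page or figure node.
    kind: str  # "page" or "figure"
    bbox: BBox
    children: List[object] = field(default_factory=list)


def selected_text(node, all_texts: bool, in_figure: bool = False) -> List[str]:
    """Collect text by tree position; bounding boxes are never consulted."""
    if isinstance(node, Text):
        # Assumed rule: text nested in a figure needs the flag,
        # text outside any figure is always selected.
        return [node.content] if (all_texts or not in_figure) else []
    in_figure = in_figure or node.kind == "figure"
    found: List[str] = []
    for child in node.children:
        found.extend(selected_text(child, all_texts, in_figure))
    return found


figure = Container("figure", (0, 0, 100, 100), [Text("nested", (10, 10, 50, 20))])
# "overlapping" is drawn on top of the figure but is a direct child of the page.
page = Container("page", (0, 0, 200, 200), [figure, Text("overlapping", (10, 30, 50, 40))])

without_flag = selected_text(page, all_texts=False)
with_flag = selected_text(page, all_texts=True)
```

In this sketch, `"overlapping"` is selected in both cases even though its box lies inside the figure's box, because it is not a child of the figure.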
Yes, I agree that this is desirable. It should be mentioned in *CONTRIBUTING.md*.
I cannot reproduce this bug. ``` $ python tools/pdf2txt.py ~/Downloads/pdfminer_bytes.pdf 1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 0,99365 = 00,03 Καλύτεροι κατά 1% καθημερινά επί ένα...
@joaquimcampos Thanks for pointing that out :+1:
This was introduced by: 43c8fc8557528463c99598049b7005ae96ab8084
This happens because these text lines only contain white space. Previously, all text lines with a zero width or height were added directly under the page object. After the change...
@KunalGehlot Can you create a PR with that specific commit so that I can review and merge it?
I can replicate this issue with the newest version of pdfminer.six. Tried cleaning the pdf with mutools and running the code again, but no difference.
Hi @kelvin0, are you experiencing problems due to this issue? I assume that the clipping operator is more often used to exclude parts of a drawing, than it being used...
Feel free to create a PR. I can do reviews and merge it when ready. I don't mind if the first implementation only focusses on adding clipping-path behaviour and ignoring...