Pieter Marsman

Results: 221 comments by Pieter Marsman

The `--all-text` option is about the object tree, not where objects are visually displayed. So if an object is not nested inside an `LTFigure`, it should be extracted when `--all-text` is...
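A toy sketch of the distinction between tree nesting and visual position, not pdfminer.six's actual implementation. `Text`, `Figure`, and `collect_text` are hypothetical names introduced here for illustration. The only behaviour assumed from the library is that text nested inside a figure is analysed only when `all_texts` is enabled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    """Hypothetical stand-in for a text object in the layout tree."""
    content: str


@dataclass
class Figure:
    """Hypothetical stand-in for LTFigure: a container nested in the tree."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Figure]


def collect_text(nodes: List[Node], all_texts: bool = False) -> List[str]:
    """Walk the tree in order; descend into figures only when all_texts is set."""
    found: List[str] = []
    for node in nodes:
        if isinstance(node, Text):
            # Text outside any figure is always collected.
            found.append(node.content)
        elif all_texts:
            # Nested figures are handled recursively.
            found.extend(collect_text(node.children, all_texts))
    return found


# "caption" might be drawn right next to "heading" on the page,
# but what matters here is that it is nested inside a Figure.
page = [Text("heading"), Figure([Text("caption"), Figure([Text("label")])])]

print(collect_text(page))
print(collect_text(page, all_texts=True))
```

The decision depends only on where an object sits in the tree, so an object that merely overlaps a figure on the page is unaffected by the flag.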

Yes, I agree that this is desirable. It should be mentioned in *CONTRIBUTING.md*.

I cannot reproduce this bug.

```
$ python tools/pdf2txt.py ~/Downloads/pdfminer_bytes.pdf
1% ΚΑΛΥΤΕΡΟΙ ΚΑΘΕ ΜΕΡΑ Χειρότεροι κατά 1% καθημερινά επί ένα χρόνο. 0,99365 = 00,03 Καλύτεροι κατά 1% καθημερινά επί ένα...
```

@joaquimcampos Thanks for pointing that out :+1:

This was introduced by: 43c8fc8557528463c99598049b7005ae96ab8084

This happens because these text lines only contain white space. Previously, all text lines with a zero width or height were added directly under the page object. After the change...

@KunalGehlot Can you create a PR with that specific commit so that I can review and merge it?

I can replicate this issue with the newest version of pdfminer.six. I tried cleaning the PDF with `mutool` and running the code again, but it made no difference.

Hi @kelvin0, are you experiencing problems due to this issue? I assume that the clipping operator is more often used to exclude parts of a drawing, than it being used...

Feel free to create a PR. I can do reviews and merge it when ready. I don't mind if the first implementation only focusses on adding clipping-path behaviour and ignoring...