slbayer
Random stranger here: the current version of `camelot` uses ghostscript, and the table detection script in `master` still uses `camelot`.
The problem is that the model was built with an old version of `sklearn` that had this module. According to the warning I get after doing several horrid things with...
Here's another use case: I have a package right now which contains a `resources` subdirectory that I want to have distributed with the package, and that subdirectory contains a Python...
Here's a modification of the fix proposed in #836 which addresses this issue:
```
def write_word(self) -> None:
    if len(self.working_text) > 0:
        txt = self._clean_text(self.working_text.strip())
        if len(txt) > 0:
            bold_and_italic_styles...
```
Further testing reveals that if the string in the document had been ``, the angle brackets would not have been escaped properly either.
This needs to be fixed in two places. In release 20221105, in `converter.py`, line 934 should be `enc(self.working_text.strip()),` instead of `self.working_text.strip(),`, and line 913 should be `self.write(enc(text))` instead of `self.write(text)`.
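To illustrate why the missing escaping matters, here is a minimal sketch that does not use pdfminer.six itself. It assumes the standard library's `html.escape` as a stand-in for the `enc` helper mentioned above, and the `wrap_word` and `parses` functions are hypothetical names made up for this example. It shows that extracted text containing `<`, `>` or `&` produces output an XML parser rejects unless the text is escaped first.

```python
import html
import xml.etree.ElementTree as ET


def wrap_word(text: str, escape: bool) -> str:
    """Wrap extracted text in a span, optionally escaping markup characters.

    Hypothetical helper for illustration only; html.escape stands in for
    whatever escaping function the converter actually uses.
    """
    body = html.escape(text) if escape else text
    return f"<span class='ocrx_word'>{body}</span>"


def parses(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


sample = "if a < b & c > d"

# Unescaped text leaves raw '<' and '&' in the output, so parsing fails.
print(parses(wrap_word(sample, escape=False)))

# Escaped text parses, and the original string is recovered on the way back.
print(parses(wrap_word(sample, escape=True)))
print(ET.fromstring(wrap_word(sample, escape=True)).text == sample)
```

The same check can be run against real converter output to confirm that both write paths escape their text.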
Actually, I've now discovered something very closely related: if the `stripcontrol` attribute of the `HOCRConverter` is `False`, at least the `lxml` XML parser will fail on zero bytes (`\x00`). And...
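A rough sketch of the control-character problem, again outside pdfminer.six: it uses the standard library's `xml.etree.ElementTree` rather than `lxml`, and the `strip_illegal` helper and its regular expression are assumptions written for this example, not the library's own code. The pattern removes the C0 control characters that XML 1.0 does not allow (everything below `\x20` except tab, line feed and carriage return).

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids; tab (\x09), LF (\x0a) and
# CR (\x0d) are legal and therefore left alone.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal(text: str) -> str:
    """Remove characters that cannot appear in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


def parses(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except (ET.ParseError, ValueError):
        # ValueError is caught too, in case a parser rejects the string
        # before it even starts parsing.
        return False


dirty = "page\x00 text\x0c"

print(parses(f"<p>{dirty}</p>"))                 # control characters left in
print(parses(f"<p>{strip_illegal(dirty)}</p>"))  # control characters removed
```

Stripping is lossy by design: the zero bytes carry no text, so dropping them is usually preferable to emitting a document no parser will accept.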
E.g., in release 20221105, in `converter.py`, line 947, change `"\n"` to `"\n"` and at line 962 - 3, change ``` "\n" % (item.index, self.bbox_repr(item.bbox)) ``` to ``` "\n" % (ltpage.pageid,...
In commit [5114acd](https://github.com/pdfminer/pdfminer.six/commit/5114acdda61205009221ce4ebf2c68c144fc4ee5), the bug is at line 1005 in `converter.py`:
```
if (
    self.working_bbox[1] != item.bbox[1]
    or self.working_font != item.fontname
    or self.working_size != item.size
):
    self.write_word()
    self.working_bbox = item.bbox...
```
One of the problems with pattern-matching approaches, as opposed to something more statistical, is that it's pretty much all or nothing when you find an edge case. I don't know...