ocrd_segment

Split regions if they are "split" by a separator

Open wrznr opened this issue 6 years ago • 6 comments

image

wrznr avatar Jan 16 '20 09:01 wrznr

Yes, I wonder what Tesseract is thinking when it does this. Such bad manners!

bertsky avatar Jan 16 '20 10:01 bertsky

Technically, what exactly would you propose to do? Calculate the point-set difference of the polygons, and then look at the resulting ~~interior sets~~ multipolygon as a sequence?
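For illustration, a minimal sketch of that point-set difference idea, assuming Shapely is available. The rectangles and the helper name `split_by_difference` are made up for the example and are not part of ocrd_segment:

```python
from shapely.geometry import box


def split_by_difference(region, separator):
    """Subtract the separator from the region and return the remaining parts.

    Hypothetical helper: a separator that cuts all the way through the
    region leaves a MultiPolygon, otherwise a single Polygon remains.
    """
    remainder = region.difference(separator)
    if remainder.is_empty:
        return []
    if remainder.geom_type == "MultiPolygon":
        parts = list(remainder.geoms)
    else:
        parts = [remainder]
    # Order the parts left to right by their minimum x coordinate
    # (an assumption; a real reading order may need something smarter).
    return sorted(parts, key=lambda p: p.bounds[0])


# Made-up geometry: a wide text region crossed by a vertical separator.
region = box(0, 0, 100, 40)
separator = box(45, -5, 55, 45)
parts = split_by_difference(region, separator)
```

With these made-up coordinates the separator crosses the region completely, so two parts remain, one on each side.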

bertsky avatar Jan 16 '20 10:01 bertsky

Do you have an example where clipping is not enough to handle this?

bertsky avatar Jan 16 '20 10:01 bertsky

No. It's just cosmetic.

wrznr avatar Jan 16 '20 10:01 wrznr

I thought about using https://shapely.readthedocs.io/en/stable/manual.html#splitting

wrznr avatar Jan 16 '20 10:01 wrznr

> I thought about using https://shapely.readthedocs.io/en/stable/manual.html#splitting

Yes, that would work.
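A rough sketch of what that could look like with `shapely.ops.split`, assuming the separator is reduced to a center line (a `LineString`), since splitting a polygon expects a line as the splitter. The coordinates are invented for the example:

```python
from shapely.geometry import LineString, box
from shapely.ops import split

# Made-up geometry: a text region and the center line of a vertical separator.
region = box(0, 0, 100, 40)
cut = LineString([(50, -5), (50, 45)])  # extends past the region on both ends

# split() returns a GeometryCollection; its order is not something to rely on,
# so the pieces are sorted left to right here.
pieces = sorted(split(region, cut).geoms, key=lambda p: p.bounds[0])
```

Note that a line has no width, so unlike a polygon difference the pieces still touch along the cut and together cover the whole original region.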

But there are other issues besides coordinates:

  • new IDs (append suffix to both, or just one)
  • fix/update reading order (insert at same position, or leave out)
  • splitting up existing text content or lines/words
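To make the first two points concrete, here is a small sketch using plain Python lists and strings rather than the actual PAGE-XML objects. The function name and the `_part1`, `_part2` suffix scheme are assumptions for illustration only, and it picks the "suffix both, insert at same position" option; splitting text content is not covered:

```python
def split_region_entry(reading_order, region_id, n_parts):
    """Derive IDs for the split parts and patch a flat reading order.

    Hypothetical helper: reading_order is a plain list of region IDs.
    Returns (new_ids, new_reading_order) without modifying the input.
    """
    new_ids = [f"{region_id}_part{i + 1}" for i in range(n_parts)]
    if region_id not in reading_order:
        # Region was not in the reading order, so the parts are left out too.
        return new_ids, list(reading_order)
    pos = reading_order.index(region_id)
    # Replace the original entry by its parts at the same position.
    return new_ids, reading_order[:pos] + new_ids + reading_order[pos + 1:]


ids, order = split_region_entry(["r1", "r2", "r3"], "r2", 2)
```

Here `ids` holds the two new IDs and `order` has them in place of `r2`, between `r1` and `r3`.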

bertsky avatar Jan 16 '20 10:01 bertsky