
Error in shapely/ocrd_segment

Open MehmedGIT opened this issue 3 years ago • 0 comments

The second segment-repair step (the one run with -P sanitize true on OCR-D-SEG-LINE) in the following workflow:

ocrd process \
"olena-binarize -I OCR-D-IMG -O OCR-D-BIN -P impl sauvola" \
"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
"olena-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P impl kim" \
"cis-ocropy-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P level-of-operation page" \
"tesserocr-segment-region -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG" \
"segment-repair -I OCR-D-SEG-REG -O OCR-D-SEG-REPAIR -P plausibilize true" \
"cis-ocropy-deskew -I OCR-D-SEG-REPAIR -O OCR-D-SEG-REG-DESKEW -P level-of-operation region" \
"cis-ocropy-clip -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-REG-DESKEW-CLIP -P level-of-operation region" \
"tesserocr-segment-line -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE" \
"segment-repair -I OCR-D-SEG-LINE -O OCR-D-SEG-REPAIR-LINE -P sanitize true" \
"cis-ocropy-dewarp -I OCR-D-SEG-REPAIR-LINE -O OCR-D-SEG-LINE-RESEG-DEWARP" \
"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0"

executed on the DEFAULT file group inside this workspace: https://content.staatsbibliothek-berlin.de/dc/PPN631277528.mets.xml

produces the following error:

  12:45:52.522 INFO processor.RepairSegmentation - INPUT FILE 0 / PHYS_0001
  12:45:52.524 INFO ocrd.page_validator.validate - Validating input file 'FILE_0001_OCR-D-SEG-LINE'
  12:45:52.652 INFO processor.RepairSegmentation - INPUT FILE 1 / PHYS_0002
  12:45:52.654 INFO ocrd.page_validator.validate - Validating input file 'FILE_0002_OCR-D-SEG-LINE'
  12:45:52.776 INFO processor.RepairSegmentation - INPUT FILE 2 / PHYS_0003
  12:45:52.777 INFO ocrd.page_validator.validate - Validating input file 'FILE_0003_OCR-D-SEG-LINE'
  12:45:52.912 INFO processor.RepairSegmentation - INPUT FILE 3 / PHYS_0004
  12:45:52.914 INFO ocrd.page_validator.validate - Validating input file 'FILE_0004_OCR-D-SEG-LINE'
  12:45:53.017 INFO processor.RepairSegmentation - INPUT FILE 4 / PHYS_0005
  12:45:53.019 INFO ocrd.page_validator.validate - Validating input file 'FILE_0005_OCR-D-SEG-LINE'
  12:45:53.026 WARNING processor.RepairSegmentation - Fixed CoordinateValidityError for SeparatorRegion 'region0011'
  12:45:53.027 WARNING processor.RepairSegmentation - Fixed CoordinateValidityError for SeparatorRegion 'region0012'
  12:45:53.119 WARNING processor.RepairSegmentation - Zero contour area in region "region0000"
  12:45:53.730 WARNING processor.RepairSegmentation - Zero contour area in region "region0011"
  12:45:53.734 WARNING processor.RepairSegmentation - Zero contour area in region "region0012"
  12:45:54.609 INFO processor.RepairSegmentation - INPUT FILE 5 / PHYS_0006
  12:45:54.610 INFO ocrd.page_validator.validate - Validating input file 'FILE_0006_OCR-D-SEG-LINE'
  12:45:54.708 INFO processor.RepairSegmentation - INPUT FILE 6 / PHYS_0007
  12:45:54.710 INFO ocrd.page_validator.validate - Validating input file 'FILE_0007_OCR-D-SEG-LINE'
  12:45:54.812 WARNING processor.RepairSegmentation - Zero contour area in region "region0003"
  12:45:55.186 ERROR shapely.geos - TopologyException: side location conflict at 262 1071. This can occur if the input geometry is invalid.
  Traceback (most recent call last):
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/bin/ocrd-segment-repair", line 8, in <module>
      sys.exit(ocrd_segment_repair())
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
      return self.main(*args, **kwargs)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1055, in main
      rv = self.invoke(ctx)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 760, in invoke
      return __callback(*args, **kwargs)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/cli.py", line 21, in ocrd_segment_repair
      return ocrd_cli_wrap_processor(RepairSegmentation, *args, **kwargs)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 108, in ocrd_cli_wrap_processor
      run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 88, in run_processor
      processor.process()
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/repair.py", line 188, in process
      padding=self.parameter['sanitize_padding'])
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/repair.py", line 559, in shrink_regions
      if len(contour) >= 3], scale=scale)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/project.py", line 179, in join_polygons
      jointp = unary_union(polygons)
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/shapely/ops.py", line 161, in unary_union
      return geom_factory(lgeos.methods['unary_union'](collection))
    File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/shapely/geometry/base.py", line 73, in geom_factory
      raise ValueError("No Shapely geometry can be created from null value")
  ValueError: No Shapely geometry can be created from null value
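
The TopologyException above says the input geometry may be invalid, so one way to narrow this down is to check each polygon before it reaches unary_union. The following is a minimal sketch, not the actual ocrd_segment code: `repair_polygon` and `safe_union` are hypothetical helper names, the bowtie polygon is a made-up stand-in for a bad contour, and `buffer(0)` is only a common workaround that can alter the shape.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union
from shapely.validation import explain_validity


def repair_polygon(polygon):
    """Return a valid geometry for `polygon` (hypothetical helper, not part of ocrd_segment)."""
    if polygon.is_valid:
        return polygon
    # buffer(0) rebuilds the geometry and yields a valid result, but it may
    # change the shape, e.g. drop part of a self-intersecting ring.
    return polygon.buffer(0)


def safe_union(polygons):
    """Union polygons after repairing invalid ones and dropping empty results."""
    repaired = [repair_polygon(p) for p in polygons]
    return unary_union([p for p in repaired if not p.is_empty])


# Made-up example data: a self-intersecting "bowtie" ring and a plain rectangle.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
square = Polygon([(1, 0), (3, 0), (3, 1), (1, 1)])

print(bowtie.is_valid)           # validity flag of the raw input
print(explain_validity(bowtie))  # human-readable reason for the invalidity
joined = safe_union([bowtie, square])
print(joined.is_valid, joined.geom_type)
```

Logging `explain_validity` for every contour of the failing page would show whether a specific region (for example, one of those reported with zero contour area) is the invalid input.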

This is the input image: FILE_0007_DEFAULT

MehmedGIT, Oct 26 '22 11:10