ocrd_segment
ocrd_segment copied to clipboard
Error in shapely/ocrd_segment
The segment-repair processor in the following workflow:
ocrd process \
"olena-binarize -I OCR-D-IMG -O OCR-D-BIN -P impl sauvola" \
"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
"olena-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P impl kim" \
"cis-ocropy-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P level-of-operation page" \
"tesserocr-segment-region -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG" \
"segment-repair -I OCR-D-SEG-REG -O OCR-D-SEG-REPAIR -P plausibilize true" \
"cis-ocropy-deskew -I OCR-D-SEG-REPAIR -O OCR-D-SEG-REG-DESKEW -P level-of-operation region" \
"cis-ocropy-clip -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-REG-DESKEW-CLIP -P level-of-operation region" \
"tesserocr-segment-line -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE" \
"segment-repair -I OCR-D-SEG-LINE -O OCR-D-SEG-REPAIR-LINE -P sanitize true" \
"cis-ocropy-dewarp -I OCR-D-SEG-REPAIR-LINE -O OCR-D-SEG-LINE-RESEG-DEWARP" \
"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0"
executed on the DEFAULT file group inside this workspace:
https://content.staatsbibliothek-berlin.de/dc/PPN631277528.mets.xml
produces the following error:
12:45:52.522 INFO processor.RepairSegmentation - INPUT FILE 0 / PHYS_0001
12:45:52.524 INFO ocrd.page_validator.validate - Validating input file 'FILE_0001_OCR-D-SEG-LINE'
12:45:52.652 INFO processor.RepairSegmentation - INPUT FILE 1 / PHYS_0002
12:45:52.654 INFO ocrd.page_validator.validate - Validating input file 'FILE_0002_OCR-D-SEG-LINE'
12:45:52.776 INFO processor.RepairSegmentation - INPUT FILE 2 / PHYS_0003
12:45:52.777 INFO ocrd.page_validator.validate - Validating input file 'FILE_0003_OCR-D-SEG-LINE'
12:45:52.912 INFO processor.RepairSegmentation - INPUT FILE 3 / PHYS_0004
12:45:52.914 INFO ocrd.page_validator.validate - Validating input file 'FILE_0004_OCR-D-SEG-LINE'
12:45:53.017 INFO processor.RepairSegmentation - INPUT FILE 4 / PHYS_0005
12:45:53.019 INFO ocrd.page_validator.validate - Validating input file 'FILE_0005_OCR-D-SEG-LINE'
12:45:53.026 WARNING processor.RepairSegmentation - Fixed CoordinateValidityError for SeparatorRegion 'region0011'
12:45:53.027 WARNING processor.RepairSegmentation - Fixed CoordinateValidityError for SeparatorRegion 'region0012'
12:45:53.119 WARNING processor.RepairSegmentation - Zero contour area in region "region0000"
12:45:53.730 WARNING processor.RepairSegmentation - Zero contour area in region "region0011"
12:45:53.734 WARNING processor.RepairSegmentation - Zero contour area in region "region0012"
12:45:54.609 INFO processor.RepairSegmentation - INPUT FILE 5 / PHYS_0006
12:45:54.610 INFO ocrd.page_validator.validate - Validating input file 'FILE_0006_OCR-D-SEG-LINE'
12:45:54.708 INFO processor.RepairSegmentation - INPUT FILE 6 / PHYS_0007
12:45:54.710 INFO ocrd.page_validator.validate - Validating input file 'FILE_0007_OCR-D-SEG-LINE'
12:45:54.812 WARNING processor.RepairSegmentation - Zero contour area in region "region0003"
12:45:55.186 ERROR shapely.geos - TopologyException: side location conflict at 262 1071. This can occur if the input geometry is invalid.
Traceback (most recent call last):
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/bin/ocrd-segment-repair", line 8, in <module>
sys.exit(ocrd_segment_repair())
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/cli.py", line 21, in ocrd_segment_repair
return ocrd_cli_wrap_processor(RepairSegmentation, *args, **kwargs)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 108, in ocrd_cli_wrap_processor
run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 88, in run_processor
processor.process()
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/repair.py", line 188, in process
padding=self.parameter['sanitize_padding'])
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/repair.py", line 559, in shrink_regions
if len(contour) >= 3], scale=scale)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_segment/project.py", line 179, in join_polygons
jointp = unary_union(polygons)
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/shapely/ops.py", line 161, in unary_union
return geom_factory(lgeos.methods['unary_union'](collection))
File "/home/mm/venv37-ocrd/sub-venv/headless-tf1/lib/python3.7/site-packages/shapely/geometry/base.py", line 73, in geom_factory
raise ValueError("No Shapely geometry can be created from null value")
ValueError: No Shapely geometry can be created from null value
This is the input image:
