ocrd_all icon indicating copy to clipboard operation
ocrd_all copied to clipboard

Makefile-test for ocrd-network

Open joschrew opened this issue 2 years ago • 7 comments

This adds a test for the ocrd-network with ocrd_all. Inspired by this merged pr. The tests are run with a makefile-task. To run this tests submodules are needed. Currently it is also based on multiple PR's because the Dockerfiles for the processors have to be modified.

Run the tests:

  • I used a vm with ubuntu 22.04, 8 CPU and 16 GB Mem for making sure it works. Nearly fresh vm, no preinstalled images etc.

clone this PR and fetch necessary submodules - git clone -b docker-tests https://github.com/OCR-D/ocrd_all.git - cd ocrd_all - git submodule update --recursive --init core/ ocrd_olena/ ocrd_anybaseocr/ ocrd_cis/

build the used images the first time: - docker-compose --file core/tests/network/docker-compose.yml --file tests/network/docker-compose.yml build ocrd-olena-binarize ocrd-anybaseocr-crop ocrd-cis-ocropy-denoise ocrd-cis-ocropy-clip ocrd-cis-ocropy-segment ocrd-cis-ocropy-dewarp - this should only be necessary once to build the used images - this takes on my machine about 40 Minutes

run the tests: make integration-test DOCKER_COMPOSE=docker-compose (or else: make integration-test)

Open questions/tasks:

  • Submodule-PR's should probably be merged before this PR
  • docker-compose could be changed to download processor-images (if made available with publishing images for the processors) instead of building with path to Dockerfiles
  • improve test invocation
  • ~~docker-compose currently uses for the core-test-container, which is used to run the tests, a self created Image because the test are not yet available in the published image (just a PR atm)~~

joschrew avatar Feb 08 '24 13:02 joschrew

  • this has to be done in an external ocrd-core copy (of the PR), it does not work as a submodule IIUC
  • Reason: when building, make assets tries to sync the submodule which does not work when core itself is a submodule
  • this can be skipped when the core-PR is merged and included / available in core image (see open tasks below)

you can set or pass NO_UPDATE=1, as documented in the readme and setup guide, to prevent any git submodule actions during make recipes.

bertsky avatar Feb 09 '24 19:02 bertsky

  • docker-compose could be changed to download processor-images (if made available with publishing images for the processors) instead of building with path to Dockerfiles

I agree, but ideally the compose file offers both ways, so the user can decide whether to re-build (docker compose build or docker compose up --build) or to download (docker compose pull or docker compose up --pull always). We should finally make all modules publish under Dockerhub or GHCR, as originally proposed here ...

bertsky avatar Feb 09 '24 19:02 bertsky

I believe we should try to generate the docker-compose files per module from their respective ocrd-tool.json (or the ocrd-all-tool.json) and then include all sub-compose files into a single top-level one, with some shared definitions, networks and volumes.

Generating could be done via another makefile rule. (But for the registry image names, we would need some additional field in the tool jsons, since it's unlikely these will become fully homogeneous and predictable.)

bertsky avatar Feb 09 '24 19:02 bertsky

  • docker-compose currently uses for the core-test-container, which is used to run the tests, a self created Image because the test are not yet available in the published image (just a PR atm)

you mean the ocrd_core_test stage of the core/Dockerfile? I don't think we should publish this as a prebuilt image – we can just clone the repo and build here. Same could be done for test stages of other modules.

bertsky avatar Feb 09 '24 19:02 bertsky

Thanks for your help/input.

May last commit updated the submodules (-PR's) and now the core image used is build with core/Dockerfile instead of building separately

joschrew avatar Feb 13 '24 10:02 joschrew

I believe we should try to generate the docker-compose files per module from their respective ocrd-tool.json (or the ocrd-all-tool.json) and then include all sub-compose files into a single top-level one, with some shared definitions, networks and volumes.

Generating could be done via another makefile rule. (But for the registry image names, we would need some additional field in the tool jsons, since it's unlikely these will become fully homogeneous and predictable.)

Another problem to address is persistent model storage. In #378 we already established that we need named volumes. These could also be shared via compose file, so all module containers contribute to a single ocrd-resources/ on the host. But (apart from the respective definitions in the ocrd_all/docker-compose.yml) we need to have the respective volume definitions (and XDG_DATA_HOME set) in the Dockerfiles of the modules. So far, there is no convention for that. I suggest we use /models everywhere (by symlinking $XDG_DATA_HOME/ocrd-resources to /models as was recently done in ocrd_all).

bertsky avatar Feb 14 '24 12:02 bertsky

Another problem to address is persistent model storage. In #378 we already established that we need named volumes. These could also be shared via compose file, so all module containers contribute to a single ocrd-resources/ on the host. But (apart from the respective definitions in the ocrd_all/docker-compose.yml) we need to have the respective volume definitions (and XDG_DATA_HOME set) in the Dockerfiles of the modules. So far, there is no convention for that. I suggest we use /models everywhere (by symlinking $XDG_DATA_HOME/ocrd-resources to /models as was recently done in ocrd_all).

I also find this important since core.ocrd_network and Operandi project heavily depend on a general way to reference ocrd processor models via volume mapping.

MehmedGIT avatar Feb 14 '24 13:02 MehmedGIT