Processing queue problems using S3 MinIO Backend

Open acaciochinato opened this issue 2 years ago • 10 comments

Hello!

I'm testing a new setup with docker and MinIO to serve as object storage but all the files I upload stay in the process queue waiting to be processed for hours. Maybe I'm missing some settings in the docker-compose envs?

Using the default docker-compose provided, I removed all the dbs envs and inserted this:

      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3

Here's my compose:

version: '3.8'
services:
  restserver:
    image: docspell/restserver:latest
    container_name: docspell-restserver
    restart: unless-stopped
    ports:
      - "7880:7880"
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_SERVER_INTERNAL__URL=http://docspell-restserver:7880
      - DOCSPELL_SERVER_ADMIN__ENDPOINT_SECRET=admin123
      - DOCSPELL_SERVER_AUTH_SERVER__SECRET=
      - DOCSPELL_SERVER_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_HEADER__VALUE=integration-password123
      - DOCSPELL_SERVER_BACKEND_SIGNUP_MODE=open
      - DOCSPELL_SERVER_BACKEND_SIGNUP_NEW__INVITE__PASSWORD=
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
    depends_on:
      - solr
  joex:
    image: docspell/joex:latest
    container_name: docspell-joex
    restart: unless-stopped
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_JOEX_APP__ID=joex1
      - DOCSPELL_JOEX_PERIODIC__SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_BASE__URL=http://localhost:7878
      - DOCSPELL_JOEX_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_JOEX_ADDONS_EXECUTOR__CONFIG_RUNNER=docker,trivial
      - DOCSPELL_JOEX_CONVERT_HTML__CONVERTER=weasyprint
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3   
    ports:
      - "7878:7878"
    depends_on:
      - solr
  consumedir:
    image: docspell/dsc:latest
    container_name: docspell-consumedir
    command:
      - dsc
      - "-d"
      - "http://docspell-restserver:7880"
      - "watch"
      - "--delete"
      - "-ir"
      - "--not-matches"
      - "**/.*"
      - "--header"
      - "Docspell-Integration:integration-password123"
      - "/opt/docs"
    restart: unless-stopped
    volumes:
      - ./docs:/opt/docs
    depends_on:
      - restserver
  solr:
    image: solr:9
    container_name: docspell-solr
    restart: unless-stopped
    volumes:
      - docspell-solr_data:/var/solr
    command:
      - bash
      - -c
      - 'precreate-core docspell; exec solr -f -Dsolr.modules=analysis-extras'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8983/solr/docspell/admin/ping"]
      interval: 1m
      timeout: 10s
      retries: 2
      start_period: 30s

volumes:
  docspell-solr_data:

acaciochinato avatar Apr 12 '23 03:04 acaciochinato

Thank you for the detailed report! There must be something going wrong in the file store library. I'll take a look as soon as I can.

eikek avatar Apr 12 '23 05:04 eikek

@eikek @acaciochinato it looks like you have a bug in your docker-compose: Joex environment variables should start with: DOCSPELL_JOEX_ instead of DOCSPELL_SERVER_

This is my docker-compose and it works fine (the only difference is the usage of environment variables):

  restserver:
    image: docspell/restserver:latest
    container_name: docspell-restserver
    restart: unless-stopped
    ports:
      - "7880:7880"
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_SERVER_INTERNAL__URL=http://docspell-restserver:7880
      - DOCSPELL_SERVER_ADMIN__ENDPOINT_SECRET=<redacted>
      - DOCSPELL_SERVER_AUTH_SERVER__SECRET=
      - DOCSPELL_SERVER_BACKEND_JDBC_PASSWORD=<redacted>
      - DOCSPELL_SERVER_BACKEND_JDBC_URL=jdbc:mariadb://services1:3306/docspell
      - DOCSPELL_SERVER_BACKEND_JDBC_USER=docspell
      - DOCSPELL_SERVER_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_HEADER__VALUE=<redacted>
      - DOCSPELL_SERVER_BACKEND_SIGNUP_MODE=open
      - DOCSPELL_SERVER_BACKEND_SIGNUP_NEW__INVITE__PASSWORD=
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=<redacted>
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=docspell
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://services1:9001
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=<redacted>
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
    depends_on:
      - solr

  joex:
    image: docspell/joex:latest
    container_name: docspell-joex
    ## For more memory add corresponding arguments, like below. Also see
    ## https://docspell.org/docs/configure/#jvm-options
    # command:
    #   - -J-Xmx3G
    restart: unless-stopped
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_JOEX_APP__ID=joex1
      - DOCSPELL_JOEX_PERIODIC__SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_BASE__URL=http://docspell-joex:7878
      - DOCSPELL_JOEX_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_JOEX_JDBC_PASSWORD=<redacted>
      - DOCSPELL_JOEX_JDBC_URL=jdbc:mariadb://services1:3306/docspell
      - DOCSPELL_JOEX_JDBC_USER=docspell
      - DOCSPELL_JOEX_ADDONS_EXECUTOR__CONFIG_RUNNER=docker,trivial
      - DOCSPELL_JOEX_CONVERT_HTML__CONVERTER=weasyprint
      - DOCSPELL_JOEX_FILES_DEFAULT__STORE=minio
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ACCESS__KEY=<redacted>
      - DOCSPELL_JOEX_FILES_STORES_MINIO_BUCKET=docspell
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ENDPOINT=http://services1:9001
      - DOCSPELL_JOEX_FILES_STORES_MINIO_SECRET__KEY=<redacted>
      - DOCSPELL_JOEX_FILES_STORES_MINIO_TYPE=s3
    ports:
      - "7878:7878"
    depends_on:
      - solr

I honestly think that the way variables have to be named is confusing, but can't do anything about that.

jan-oratowski avatar Apr 14 '23 09:04 jan-oratowski

@jan-oratowski I tried changing the variable names, but the problem persists. I think that if the variables were the cause, the joex container wouldn't even show as connected in the logs, right?

acaciochinato avatar Apr 14 '23 13:04 acaciochinato

Note that it should be DOCSPELL_JOEX_FILES, not DOCSPELL_SERVER* nor DOCSPELL_JOEX_BACKEND_FILES
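The naming rule above can be checked mechanically. The following is a minimal sketch, not part of Docspell: the helper `misnamed_vars` and the `EXPECTED_PREFIX` table are hypothetical names, and the rule they encode is only what this thread states (joex reads `DOCSPELL_JOEX_*`, the restserver reads `DOCSPELL_SERVER_*`, and the joex file store keys sit under `DOCSPELL_JOEX_FILES_`, not `DOCSPELL_JOEX_BACKEND_FILES_`).

```python
# Expected env-var prefix per compose service, as described in this thread.
EXPECTED_PREFIX = {"restserver": "DOCSPELL_SERVER_", "joex": "DOCSPELL_JOEX_"}


def misnamed_vars(service, env_entries):
    """Return DOCSPELL_* names that the given service would likely not pick up."""
    prefix = EXPECTED_PREFIX[service]
    bad = []
    for entry in env_entries:
        name = entry.split("=", 1)[0].strip()
        if not name.startswith("DOCSPELL_"):
            continue  # unrelated variables such as TZ
        if not name.startswith(prefix):
            bad.append(name)  # belongs to the other service's namespace
        elif service == "joex" and name.startswith("DOCSPELL_JOEX_BACKEND_FILES_"):
            bad.append(name)  # joex file store keys have no BACKEND segment
    return bad


# Entries in the same "NAME=value" form used in the compose files above.
joex_env = [
    "TZ=Europe/Berlin",
    "DOCSPELL_JOEX_APP__ID=joex1",
    "DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio",
    "DOCSPELL_JOEX_BACKEND_FILES_STORES_MINIO_TYPE=s3",
    "DOCSPELL_JOEX_FILES_STORES_MINIO_BUCKET=test",
]
print(misnamed_vars("joex", joex_env))
```

Run against the joex `environment` list of a compose file, an empty result means every Docspell variable at least uses the prefix this thread says joex expects; it does not verify that the values themselves are correct.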

Maybe you have another issue, but in my case joex container was not throwing errors but the processing queue was indeed failing. This issue made me realize the mistake in my configuration.

Also: remember that you need to remove and recreate the container in order for environment variable changes in the docker-compose to apply, IIRC.

alexbarcelo avatar Apr 20 '23 11:04 alexbarcelo

So I could finally take a look. I tried reproducing it with my dev setup, using both v0.40.0 and the latest master. I couldn't reproduce it there. 🤷🏼

The finding about the config values is correct: the joex service needs env variables prefixed with DOCSPELL_JOEX and the restserver service needs DOCSPELL_SERVER.

> I honestly think that the way variables have to be named is confusing, but can't do anything about that.

What would be your suggestion for naming these variables? They have separate namespaces because they are intended for very different services. Their structure currently mirrors the one in the config file. But I'm always open to other thoughts!
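The relationship between variable names and the config file structure can be illustrated with a short sketch. It assumes the convention that the names in this thread suggest (a single `_` separates config levels, a double `__` stands for a `-` inside a key, and everything is upper-cased); the function `env_to_config_key` is a hypothetical helper for illustration, not something Docspell provides.

```python
def env_to_config_key(name):
    """Translate an env-var name into the config path it appears to address.

    Assumed convention: "__" encodes "-", "_" encodes ".", case is lowered.
    """
    # Replace double underscores first so they are not read as two separators.
    return name.replace("__", "-").replace("_", ".").lower()


for var in (
    "DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE",
    "DOCSPELL_JOEX_FILES_DEFAULT__STORE",
    "DOCSPELL_JOEX_FULL__TEXT__SEARCH_ENABLED",
):
    print(var, "->", env_to_config_key(var))
```

Under that assumption the two file store variables resolve to different paths (`docspell.server.backend.files...` versus `docspell.joex.files...`), which is why copying the restserver variables into the joex service and only swapping the prefix is not enough.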

eikek avatar Apr 22 '23 09:04 eikek

Hi! Sorry for the delay. @eikek can you please post here the compose that you used to get it working with the MINIO storage?

I tried copying the compose from @jan-oratowski but the error persists, and nothing comes up in the logs.

acaciochinato avatar Jun 08 '23 00:06 acaciochinato

@acaciochinato you did issue a docker-compose down followed by a docker-compose up, right? A restart does not update the environment variables. I am assuming that you changed an already existing docker-compose file and the error persists, rather than starting from a fresh docker-compose file in a differently named folder and getting the same error.

I tried following @jan-oratowski's example (updated with my values, ofc) and it worked for me, which is why I suspect a dirty environment on your side. But that's a wild guess and I may be completely wrong.

alexbarcelo avatar Jun 08 '23 07:06 alexbarcelo

Hi @alexbarcelo! Yes, for every test I removed the containers and created them again. I forgot to mention that I'm using podman in rootless mode instead of docker. Maybe that has something to do with it?

It's strange though: the containers are running just fine and nothing shows up in the logs. The restserver can successfully upload files to the bucket, but the jobs stay stuck in the processing queue forever.

acaciochinato avatar Jun 08 '23 13:06 acaciochinato

Hi @acaciochinato I simply used the default docker compose and also my dev setup which doesn't involve docker. Not sure how to dig further. Is docspell and/or minio giving any logs that could be interesting? Can you see where it hangs during processing?

eikek avatar Jun 13 '23 06:06 eikek

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!

github-actions[bot] avatar Jun 26 '24 02:06 github-actions[bot]