Processing queue problems using S3 MinIO Backend
Hello!
I'm testing a new setup with Docker and MinIO as object storage, but every file I upload stays in the processing queue for hours without being processed. Maybe I'm missing some settings in the docker-compose environment variables?
Starting from the default docker-compose file provided, I removed all the database environment variables and added these:
- DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
- DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
Here's my compose:
version: '3.8'
services:
  restserver:
    image: docspell/restserver:latest
    container_name: docspell-restserver
    restart: unless-stopped
    ports:
      - "7880:7880"
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_SERVER_INTERNAL__URL=http://docspell-restserver:7880
      - DOCSPELL_SERVER_ADMIN__ENDPOINT_SECRET=admin123
      - DOCSPELL_SERVER_AUTH_SERVER__SECRET=
      - DOCSPELL_SERVER_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_HEADER__VALUE=integration-password123
      - DOCSPELL_SERVER_BACKEND_SIGNUP_MODE=open
      - DOCSPELL_SERVER_BACKEND_SIGNUP_NEW__INVITE__PASSWORD=
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
    depends_on:
      - solr
  joex:
    image: docspell/joex:latest
    container_name: docspell-joex
    restart: unless-stopped
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_JOEX_APP__ID=joex1
      - DOCSPELL_JOEX_PERIODIC__SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_BASE__URL=http://localhost:7878
      - DOCSPELL_JOEX_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_JOEX_ADDONS_EXECUTOR__CONFIG_RUNNER=docker,trivial
      - DOCSPELL_JOEX_CONVERT_HTML__CONVERTER=weasyprint
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=test
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://minio:9000
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
    ports:
      - "7878:7878"
    depends_on:
      - solr
  consumedir:
    image: docspell/dsc:latest
    container_name: docspell-consumedir
    command:
      - dsc
      - "-d"
      - "http://docspell-restserver:7880"
      - "watch"
      - "--delete"
      - "-ir"
      - "--not-matches"
      - "**/.*"
      - "--header"
      - "Docspell-Integration:integration-password123"
      - "/opt/docs"
    restart: unless-stopped
    volumes:
      - ./docs:/opt/docs
    depends_on:
      - restserver
  solr:
    image: solr:9
    container_name: docspell-solr
    restart: unless-stopped
    volumes:
      - docspell-solr_data:/var/solr
    command:
      - bash
      - -c
      - 'precreate-core docspell; exec solr -f -Dsolr.modules=analysis-extras'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8983/solr/docspell/admin/ping"]
      interval: 1m
      timeout: 10s
      retries: 2
      start_period: 30s
volumes:
  docspell-solr_data:
Thank you for the detailed report! There must be something going wrong in the file store library. I'll take a look as soon as I can.
@eikek @acaciochinato it looks like there is a bug in your docker-compose file: the Joex environment variables should start with DOCSPELL_JOEX_ instead of DOCSPELL_SERVER_.
This is my docker-compose and it works fine (the only difference is the usage of environment variables):
  restserver:
    image: docspell/restserver:latest
    container_name: docspell-restserver
    restart: unless-stopped
    ports:
      - "7880:7880"
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_SERVER_INTERNAL__URL=http://docspell-restserver:7880
      - DOCSPELL_SERVER_ADMIN__ENDPOINT_SECRET=<redacted>
      - DOCSPELL_SERVER_AUTH_SERVER__SECRET=
      - DOCSPELL_SERVER_BACKEND_JDBC_PASSWORD=<redacted>
      - DOCSPELL_SERVER_BACKEND_JDBC_URL=jdbc:mariadb://services1:3306/docspell
      - DOCSPELL_SERVER_BACKEND_JDBC_USER=docspell
      - DOCSPELL_SERVER_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_SERVER_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_ENABLED=true
      - DOCSPELL_SERVER_INTEGRATION__ENDPOINT_HTTP__HEADER_HEADER__VALUE=<redacted>
      - DOCSPELL_SERVER_BACKEND_SIGNUP_MODE=open
      - DOCSPELL_SERVER_BACKEND_SIGNUP_NEW__INVITE__PASSWORD=
      - DOCSPELL_SERVER_BACKEND_ADDONS_ENABLED=false
      - DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ACCESS__KEY=<redacted>
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_BUCKET=docspell
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_ENDPOINT=http://services1:9001
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_SECRET__KEY=<redacted>
      - DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3
    depends_on:
      - solr
  joex:
    image: docspell/joex:latest
    container_name: docspell-joex
    ## For more memory add corresponding arguments, like below. Also see
    ## https://docspell.org/docs/configure/#jvm-options
    # command:
    #   - -J-Xmx3G
    restart: unless-stopped
    environment:
      - TZ=Europe/Berlin
      - DOCSPELL_JOEX_APP__ID=joex1
      - DOCSPELL_JOEX_PERIODIC__SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_SCHEDULER_NAME=joex1
      - DOCSPELL_JOEX_BASE__URL=http://docspell-joex:7878
      - DOCSPELL_JOEX_BIND_ADDRESS=0.0.0.0
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_ENABLED=true
      - DOCSPELL_JOEX_FULL__TEXT__SEARCH_SOLR_URL=http://docspell-solr:8983/solr/docspell
      - DOCSPELL_JOEX_JDBC_PASSWORD=<redacted>
      - DOCSPELL_JOEX_JDBC_URL=jdbc:mariadb://services1:3306/docspell
      - DOCSPELL_JOEX_JDBC_USER=docspell
      - DOCSPELL_JOEX_ADDONS_EXECUTOR__CONFIG_RUNNER=docker,trivial
      - DOCSPELL_JOEX_CONVERT_HTML__CONVERTER=weasyprint
      - DOCSPELL_JOEX_FILES_DEFAULT__STORE=minio
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ACCESS__KEY=<redacted>
      - DOCSPELL_JOEX_FILES_STORES_MINIO_BUCKET=docspell
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ENABLED=true
      - DOCSPELL_JOEX_FILES_STORES_MINIO_ENDPOINT=http://services1:9001
      - DOCSPELL_JOEX_FILES_STORES_MINIO_SECRET__KEY=<redacted>
      - DOCSPELL_JOEX_FILES_STORES_MINIO_TYPE=s3
    ports:
      - "7878:7878"
    depends_on:
      - solr
I honestly think that the way variables have to be named is confusing, but can't do anything about that.
@jan-oratowski I tried changing the variable names, but the problem persists. I think that if the variables were the cause, the joex container wouldn't even show as connected in the logs, right?
Note that it should be DOCSPELL_JOEX_FILES, not DOCSPELL_SERVER* nor DOCSPELL_JOEX_BACKEND_FILES.
Maybe you have another issue, but in my case joex container was not throwing errors but the processing queue was indeed failing. This issue made me realize the mistake in my configuration.
Also: remember that you need to remove and recreate the container in order for environment variable changes in the docker-compose to apply, IIRC.
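To make the prefix rule from this thread easy to check, here is a minimal Python sketch that flags Docspell variables placed under the wrong service. The mapping of service names to prefixes comes from the comments above; the function name `misprefixed` and the sample list are only illustrative, and the script does not parse a compose file itself.

```python
from typing import Dict, List

# Prefix each service expects, as stated in this thread.
EXPECTED_PREFIX: Dict[str, str] = {
    "restserver": "DOCSPELL_SERVER_",
    "joex": "DOCSPELL_JOEX_",
}


def misprefixed(service: str, env: List[str]) -> List[str]:
    """Return the DOCSPELL_* variable names that do not match the service's prefix."""
    expected = EXPECTED_PREFIX[service]
    bad = []
    for entry in env:
        name = entry.split("=", 1)[0].strip()
        # Non-Docspell variables such as TZ are ignored.
        if name.startswith("DOCSPELL_") and not name.startswith(expected):
            bad.append(name)
    return bad


# A few lines copied from the joex service of the original compose file.
joex_env = [
    "TZ=Europe/Berlin",
    "DOCSPELL_JOEX_APP__ID=joex1",
    "DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE=minio",
    "DOCSPELL_SERVER_BACKEND_FILES_STORES_MINIO_TYPE=s3",
]

for name in misprefixed("joex", joex_env):
    print("wrong prefix for joex:", name)
```

Note that this only catches the DOCSPELL_SERVER_ versus DOCSPELL_JOEX_ mix-up; it would not notice DOCSPELL_JOEX_BACKEND_FILES, which has the right prefix but the wrong path.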
So I could finally take a look. I tried reproducing it with my dev setup, using v0.40.0 and the latest master, but I couldn't reproduce it there. 🤷🏼
The finding about the config values is correct: the joex service needs env variables prefixed with DOCSPELL_JOEX and the restserver service needs DOCSPELL_SERVER.
"I honestly think that the way variables have to be named is confusing, but can't do anything about that."
What would be your suggestion for naming these variables? They have separate namespaces because they are intended for very different services. Their structure currently mirrors the one in the config file. But I'm always open to other thoughts!
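As an illustration of how the variable names follow the config file structure, here is a small Python sketch. The rule it applies (a dot becomes `_`, a dash becomes `__`, everything uppercased) is inferred from the variable names in this thread, and the config paths passed in are assumptions reconstructed from those names, so please check them against your own config file.

```python
def env_name(config_path: str) -> str:
    """Translate a config path into the matching environment variable name.

    Assumed rule: '-' becomes '__', '.' becomes '_', then uppercase.
    """
    return config_path.replace("-", "__").replace(".", "_").upper()


# The file store setting lives at a different path in each service,
# which is why the two variable names differ by more than the prefix.
print(env_name("docspell.server.backend.files.default-store"))
print(env_name("docspell.joex.files.default-store"))
```

Under that assumed rule, the restserver path yields DOCSPELL_SERVER_BACKEND_FILES_DEFAULT__STORE and the joex path yields DOCSPELL_JOEX_FILES_DEFAULT__STORE, matching the names used in the working compose file above.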
Hi! Sorry for the delay. @eikek can you please post the compose file that you used to get it working with the MinIO storage?
I tried copying the compose file from @jan-oratowski but the error persists, and nothing comes up in the logs.
@acaciochinato you did run docker-compose down followed by docker-compose up, right? A restart does not update the environment variables. I am assuming that you changed an already existing docker-compose file and the error persists, rather than starting from a fresh docker-compose file in a differently named folder and getting the same error.
I tried following @jan-oratowski's example (updated with my values, of course) and it worked for me, which is why I suspect a dirty environment on your side. But that's a wild guess and I may be completely wrong.
Hi @alexbarcelo! Yes, for every test I removed the containers and created them again. I forgot to mention that I'm using Podman in rootless mode instead of Docker. Could that have something to do with it?
It's strange, though: the containers are running just fine and nothing shows up in the logs. The restserver can successfully upload files to the bucket, but the jobs stay stuck in the processing queue forever.
Hi @acaciochinato, I simply used the default docker-compose file and also my dev setup, which doesn't involve Docker. I'm not sure how to dig further. Are Docspell and/or MinIO producing any logs that could be interesting? Can you see where it hangs during processing?
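One thing that might help narrow this down is checking whether the MinIO endpoint is reachable at all from the network the joex container runs in. The following Python sketch only tests a plain TCP connection to the host and port of the configured endpoint; it says nothing about credentials or bucket access. It assumes Python is available wherever you run it (for example in a throwaway container attached to the same network), and the function names are made up for this example.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def endpoint_host_port(endpoint: str) -> Tuple[str, int]:
    """Extract host and port from an endpoint URL, defaulting to 80 or 443."""
    parts = urlsplit(endpoint)
    if parts.hostname is None:
        raise ValueError(f"no host in endpoint: {endpoint!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `can_connect("http://minio:9000")` with the endpoint value from the compose file returns False when the name does not resolve or the port is closed, which would point to a networking problem rather than a Docspell one.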
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!