[BUG] Worker process exits without doing anything
Describe the bug
The Celery worker process exits a few seconds after starting, without doing anything. It seems to run the same startup steps as the app, up to the point where it shows IRIS IS READY on port .... It then displays an additional warning from Celery, which does not like running as root, before exiting. Afterwards, the container simply runs the while true; do sleep 2; done loop.
To Reproduce
Steps to reproduce the behavior:
- Create a .env file (checked out on v2.4.20), update the passwords as described in the guide (e.g., POSTGRES_PASSWORD; this seemed unlikely to make any difference), and remove the image tag variables so that the correct image versions are pulled (v2.4.20 instead of latest);
- Start the containers in Docker;
- Wait a minute for the containers to start;
- Inspect the worker container's processes using docker compose exec worker ps -ef;
- Observe that the only running processes are bash (running the entrypoint script) and sleep 2, but celery is missing.
Expected behavior
Presumably, the worker process should keep running.
Server:
- OS: AlmaLinux 9.5 with the latest package updates installed
I had the same symptoms and, after countless hours of troubleshooting, finally figured out what the issue was. I'm not sure it is the same root cause for you, but on my instance it was caused by the Postgres password set in my .env file, which contained special characters that the worker did not handle correctly.
I solved it on my side by changing the password both directly in Postgres and in the .env file:
docker exec -it iriswebapp_db bash
psql -U postgres
ALTER USER postgres WITH PASSWORD 'new_secure_password';
\q
exit
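To illustrate why special characters in the password can break the worker, here is a minimal Python sketch using only the standard library. It assumes the password ends up inside a connection URL (Celery broker and result backend settings are usually given as URLs). The password p@ss/w#rd, the host db and the database name iris_db are hypothetical, and the way IRIS actually builds its URLs may differ. Because @, / and # are URL delimiters, an unescaped password moves the point where the parser thinks the host begins.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical values for illustration only; not IRIS's real settings.
password = "p@ss/w#rd"
host, port, db = "db", 5432, "iris_db"

# Naive interpolation: '/' ends the netloc early and '@' is taken as the
# user/host separator, so the parsed hostname and password are wrong.
naive_url = f"postgresql://postgres:{password}@{host}:{port}/{db}"
naive = urlsplit(naive_url)
print(naive.hostname, naive.password)  # ss p

# Percent-encoding the password (safe='' also escapes '/') keeps the URL
# structure intact; the original password is recovered with unquote().
encoded_url = f"postgresql://postgres:{quote(password, safe='')}@{host}:{port}/{db}"
encoded = urlsplit(encoded_url)
print(encoded.hostname, encoded.port, unquote(encoded.password))  # db 5432 p@ss/w#rd
```

An alphanumeric password sidesteps the problem entirely, because it needs no escaping at all.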
Thank you very much for the reply. Sorry for not posting an update earlier, which meant you also had to spend time debugging this; I thought it only affected me, since no one else had reported it before.
I had also figured it out and planned to open a PR with the corrections and some checks for it (verifying that the password is alphanumeric and longer than 30 characters). I only got to work on it yesterday, though, and still had to do some rebasing, minor bug fixes, and further testing before opening the PRs.
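The proposed check might look roughly like the following Python sketch. The function name is_safe_db_password is hypothetical, and "longer than 30 characters" is read here as "at least 30 characters", so that passwords generated with head -c 30 would pass. Adjust min_length if a strict bound is intended. The sketch uses an explicit ASCII allow-list instead of str.isalnum(), because isalnum() also accepts non-ASCII letters and digits such as é.

```python
import string

# ASCII letters and digits only; str.isalnum() would also accept e.g. 'é'.
ALLOWED = frozenset(string.ascii_letters + string.digits)


def is_safe_db_password(password: str, min_length: int = 30) -> bool:
    """Hypothetical check: long enough and made only of ASCII letters/digits."""
    return len(password) >= min_length and all(ch in ALLOWED for ch in password)


print(is_safe_db_password("a" * 29))          # False: too short
print(is_safe_db_password("Ab3" * 10))        # True: 30 alphanumeric characters
print(is_safe_db_password("Ab3" * 10 + "@"))  # False: contains '@'
```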
The check also seems consistent with the iris_helper.sh script that was added in the meantime, which hex-encodes 16 bytes of random data (producing 32 characters). I will also open a docs PR proposing one of cat /dev/urandom | tr -dc '[:alnum:]' | head -c 30, openssl rand -base64 300 | tr -dc '[:alnum:]' | head -c 30, or the command the helper script itself uses, openssl rand -hex 16.
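For anyone generating the password from Python instead of the shell, the standard secrets module gives roughly equivalent output. This is a sketch: the function names are hypothetical, and it assumes [:alnum:] means ASCII letters and digits, as in the C locale.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_alnum_password(length: int = 30) -> str:
    # Roughly what tr -dc '[:alnum:]' < /dev/urandom | head -c 30 produces.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def random_hex_password(nbytes: int = 16) -> str:
    # Like openssl rand -hex 16: 16 random bytes -> 32 lowercase hex characters.
    return secrets.token_hex(nbytes)


print(len(random_alnum_password()))  # 30
print(len(random_hex_password()))    # 32
```

Both outputs contain only URL-safe characters, so they can be placed in a connection URL without escaping.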
The root cause seems to be in Celery (https://github.com/celery/celery/issues/1553), not in IRIS itself. I also tried URL-encoding the password in the IRIS config script, but somewhere further down the line the %XX escapes trigger a Python string interpolation error, and an exception is thrown. Converting a base64 password to URL-safe base64 (which replaces the + and / characters with - and _) could be another option, but it does not help if other special characters are present.
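It is not clear which code path raised the interpolation error. One plausible culprit, offered here only as an assumption, is Python's configparser. Its default BasicInterpolation treats % as the start of a %(name)s reference, so a value containing percent-encoded characters such as %40 is rejected. The sketch below uses a hypothetical section, option and URL to show the failure and two standard workarounds: escaping each % as %%, or disabling interpolation.

```python
import configparser

# Hypothetical percent-encoded URL (password "p@ss/w#rd" after quoting).
url = "postgresql://postgres:p%40ss%2Fw%23rd@db:5432/iris_db"


def store_url(parser: configparser.ConfigParser, value: str) -> str:
    """Store value under [database] url and read it back (hypothetical keys)."""
    if not parser.has_section("database"):
        parser.add_section("database")
    parser.set("database", "url", value)
    return parser.get("database", "url")


# 1. Default ConfigParser: BasicInterpolation rejects a bare '%' in set().
try:
    store_url(configparser.ConfigParser(), url)
    failed = False
except ValueError:  # "invalid interpolation syntax in ..."
    failed = True

# 2. Escaping each '%' as '%%' survives interpolation and reads back unchanged.
escaped_roundtrip = store_url(configparser.ConfigParser(), url.replace("%", "%%"))

# 3. Disabling interpolation stores and returns the value verbatim.
raw_roundtrip = store_url(configparser.ConfigParser(interpolation=None), url)

print(failed)                    # True
print(escaped_roundtrip == url)  # True
print(raw_roundtrip == url)      # True
```

If the exception comes from somewhere else, such as %-style string formatting, the same idea applies: literal % signs have to be escaped for whichever layer interprets them.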