Timeout in case of one-shot container
I have 7 services in my docker-compose.swarm.yml. Everything runs as a daemon except the storage-minio-client service.
The storage-minio-client acts like a one-shot container: it executes the command defined by its entrypoint and exits.
storage-minio-client:
  image: ${CI_REGISTRY}/service/storage/minio:client
  networks:
    - local
  deploy:
    restart_policy:
      condition: on-failure
    placement:
      constraints:
        - node.labels.worker-stateless == 1
The image was built from this Dockerfile:
FROM bitnami/minio-client
COPY setup-buckets.sh /tmp/setup-buckets.sh
ENTRYPOINT ["/tmp/setup-buckets.sh"]
The setup-buckets.sh creates buckets and exits with code 0.
This kind of container causes docker-stack-wait to time out, even though the services were deployed successfully.
Also, Portainer shows complete status for the storage-minio-client service.
Logs from the script:
13:19:16 Status: Downloaded newer image for sudobmitch/docker-stack-wait:latest
13:19:17 Service storage-minio3 state: deployed
13:19:17 Service storage-minio4 state: deployed
13:19:17 Service storage-minio1 state: deployed
13:19:17 Service storage-minio-client state: replicating 0/1
13:19:17 Service storage-minio2 state: deployed
13:19:17 Service storage-nginx state: deployed
13:19:17 Service storage-php-web state: deployed
14:19:24 Error: Timeout exceeded
It looks like the script expects all services to be in the running state.
P.S. I found a workaround: I added sleep 300 to my bash script, which keeps the storage-minio-client container running long enough for docker-stack-wait to finish successfully.
P.P.S. A similar issue was reported earlier.
I don't know a good way to differentiate between an expected exit, an unexpected exit, and a container that hasn't started yet. PRs are welcome, but they should gracefully handle all three scenarios.
Ideally docker stack deploy would support mode: replicated-job, but that hasn't happened yet. Until then, filtering out the services you aren't interested in tracking may be a better option. For example, if you set the label wait-service: "true" on your services, you can then run:
docker-stack-wait.sh -f label=wait-service=true $stack_name
I know about the filter option, but it's not suitable in my case. docker-stack-wait is used in a CI/CD process that handles universal preparation for dozens of services (and the number is growing), and 99% of them would need a label to make the script wait. Every developer (especially a newcomer) would have to remember to add it.
I think the better option is to introduce a new argument, ignore, so the script can skip services that carry a particular label. Then all I need to do is set a label on a few specific containers.
Any updates on this?
None here. I haven't seen any PRs that implement this while also differentiating between an expected exit, an unexpected exit, and a container that hasn't started yet.
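One possible starting point for telling these three cases apart is the task state that Swarm reports. The following is only a minimal sketch, assuming the first word of the CurrentState column printed by docker service ps (for example "Complete 5 minutes ago") is a usable signal. classify_task_state is a hypothetical helper and is not part of docker-stack-wait:

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-stack-wait): map a Swarm task
# state string to one of four buckets.
# Input is the CurrentState text, e.g. "Complete 5 minutes ago".
classify_task_state() {
  # Keep only the first word and lowercase it.
  state=$(printf '%s\n' "$1" | awk '{ print tolower($1) }')
  case "$state" in
    complete) echo "expected-exit" ;;
    failed|rejected|orphaned) echo "unexpected-exit" ;;
    new|pending|assigned|accepted|preparing|ready|starting) echo "not-started" ;;
    running) echo "running" ;;
    # "shutdown" and anything unrecognised is deliberately left ambiguous.
    *) echo "unknown" ;;
  esac
}

# Possible usage (assumes the first row printed is the service's latest task):
#   current=$(docker service ps --format '{{.CurrentState}}' "$service" | head -n 1)
#   classify_task_state "$current"
```

A one-shot service such as storage-minio-client would then land in the expected-exit bucket once its script finishes, instead of being counted as "replicating 0/1" until the timeout.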
Well, it's hard to reproduce the exact case.
As I mentioned before, a new argument (e.g. -ignore or -skip) that is the opposite of -f would help. It would tell the script not to wait for services that are marked with a label.
Consider this situation: I have 7 services and I need to wait for 6 of them. Compare two configs:
- docker-stack-wait.sh -f label=wait=true
services:
  srv-1:
    deploy:
      labels:
        wait: "true"
  srv-2:
    deploy:
      labels:
        wait: "true"
  srv-3:
    # don't need to wait for
  srv-4:
    deploy:
      labels:
        wait: "true"
  srv-5:
    deploy:
      labels:
        wait: "true"
  srv-6:
    deploy:
      labels:
        wait: "true"
  srv-7:
    deploy:
      labels:
        wait: "true"
Requires 6 labels.
- docker-stack-wait.sh -skip label=wait=false
services:
  srv-1:
    ...
  srv-2:
    ...
  srv-3:
    deploy:
      labels:
        wait: "false" # mark only this service
  srv-4:
    ...
  srv-5:
    ...
  srv-6:
    ...
  srv-7:
    ...
Requires only 1 label.
Obviously, the second one is much easier to read and maintain.
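To make the proposal concrete, here is a rough sketch of how a skip filter could work: list all services in the stack, list the ones matching the skip label, and wait only for the difference. This assumes the label key is wait, as set under deploy.labels in the configs above. services_to_wait_for is a hypothetical helper, and -skip itself does not exist in docker-stack-wait:

```shell
#!/bin/sh
# Hypothetical helper: print every service name from the first list that
# does not appear in the second list (one name per line in both lists).
services_to_wait_for() {
  all_services=$1
  skipped_services=$2
  if [ -z "$skipped_services" ]; then
    # Nothing to skip: wait for everything.
    printf '%s\n' "$all_services"
    return 0
  fi
  # -v inverts the match, -x matches whole lines only, -F treats names
  # literally; newline-separated names act as separate patterns.
  # grep exits 1 when every service is skipped, which is not an error here.
  printf '%s\n' "$all_services" | grep -vxF -e "$skipped_services" || true
}

# Possible usage against a real stack (label key "wait" is an assumption):
#   all=$(docker stack services --format '{{.Name}}' "$stack_name")
#   skip=$(docker stack services --filter label=wait=false --format '{{.Name}}' "$stack_name")
#   services_to_wait_for "$all" "$skip"
```

Matching whole lines matters here: skipping srv-1 must not also drop a service named srv-10.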
@sudo-bmitch replicated-job is supported now by docker stack deploy, but your script still runs into the timeout. So can I assume these jobs are still not supported?
@pschichtel see my comment above https://github.com/sudo-bmitch/docker-stack-wait/issues/19#issuecomment-1309058348
ah sorry I missed that. In that case I might work on this soon-ish. I'm currently ignoring the affected services as suggested, as it still works out fine in this setup timing-wise, but I'd prefer a proper solution.
I'll probably wait and see how these turn out:
- https://github.com/docker/cli/pull/4258
- https://github.com/docker/cli/pull/4259