docker-stack-wait times out when a stack contains a one-shot container

I have 7 services in my docker-compose.swarm.yml. All of them run as daemons except the storage-minio-client service, which acts as a one-shot container: it executes the command defined by its entrypoint and exits.

storage-minio-client:
    image: ${CI_REGISTRY}/service/storage/minio:client
    networks:
      - local
    deploy:
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.worker-stateless == 1

The image was built from this Dockerfile:

FROM bitnami/minio-client

COPY setup-buckets.sh /tmp/setup-buckets.sh

ENTRYPOINT ["/tmp/setup-buckets.sh"]

The setup-buckets.sh script creates the buckets and exits with code 0.

This kind of container causes docker-stack-wait to time out, even though all services were deployed successfully. Portainer also shows the complete status for the storage-minio-client service.

Logs from the script:

13:19:16  Status: Downloaded newer image for sudobmitch/docker-stack-wait:latest
13:19:17  Service storage-minio3 state: deployed
13:19:17  Service storage-minio4 state: deployed
13:19:17  Service storage-minio1 state: deployed
13:19:17  Service storage-minio-client state: replicating 0/1
13:19:17  Service storage-minio2 state: deployed
13:19:17  Service storage-nginx state: deployed
13:19:17  Service storage-php-web state: deployed
14:19:24  Error: Timeout exceeded

It looks like the script expects all services to be in the running state.
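A simplified model of why the wait stalls, assuming (this is not taken from the docker-stack-wait source) that a service counts as converged only once its number of running tasks reaches the desired replica count. Under that assumption, a one-shot task that has already reached Docker's complete state can never satisfy the check, so the status stays at "replicating 0/1" until the timeout. service_status is a hypothetical helper:

```shell
#!/bin/sh
# Simplified model of a replica-count check. This is NOT the real
# docker-stack-wait implementation; service_status is a hypothetical helper.
# Arguments: desired replica count, then one Docker task state per task.
service_status() {
  desired=$1
  shift
  running=0
  for state in "$@"; do
    # Only tasks that are currently running count towards convergence.
    if [ "$state" = "running" ]; then
      running=$((running + 1))
    fi
  done
  if [ "$running" -ge "$desired" ]; then
    echo "deployed"
  else
    echo "replicating $running/$desired"
  fi
}

# A long-running service converges...
service_status 1 running    # prints "deployed"
# ...but a one-shot task that exited with code 0 never does.
service_status 1 complete   # prints "replicating 0/1"
```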

P.S. I found a workaround: adding sleep 300 to my bash script keeps the storage-minio-client container running, so docker-stack-wait finishes successfully.

P.P.S. A similar issue was reported earlier.

Sep 09 '22 13:09 marden

I don't know a good way to differentiate between an expected exit, an unexpected exit, and a container that hasn't started yet. PRs are welcome, but they should gracefully handle the different scenarios.
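For a single task, Docker's swarm task states do map onto the three scenarios: complete means the task exited without an error code, failed means it exited with one (rejected means the node refused it), and the states before running mean it hasn't started yet. The sketch below shows only that per-task mapping; classify_task_state is a hypothetical helper, not part of docker-stack-wait, and handling these outcomes gracefully across restarts and updates is the harder part that remains open.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-stack-wait): map a Docker swarm
# task state to one of the three scenarios discussed in this issue.
classify_task_state() {
  case "$1" in
    complete)
      # Task exited without an error code: expected exit for a one-shot container.
      echo "expected-exit" ;;
    failed|rejected)
      # Task exited with an error code, or the node rejected it.
      echo "unexpected-exit" ;;
    new|allocated|pending|assigned|accepted|preparing|ready|starting)
      # The container has not started yet, so a waiter should keep waiting.
      echo "not-started" ;;
    running)
      echo "running" ;;
    *)
      # shutdown, orphaned, remove, or anything unrecognised.
      echo "unknown" ;;
  esac
}
```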

Ideally, docker stack deploy would support mode: replicated-job, but that hasn't happened yet. Until then, filtering out the services you aren't interested in tracking may be a better option. For example, if you set the label wait-service: "true" on the services you want to track, you can run:

docker-stack-wait.sh -f label=wait-service=true $stack_name

Sep 10 '22 14:09 sudo-bmitch

I know about the filter option, but it's not suitable in my case. docker-stack-wait is used in a CI/CD process that handles generic preparation for dozens of services (and the number keeps growing), and 99% of them would need a label to make the script wait for them. Every developer (especially newcomers) would have to remember to add it.

I think a better option is to introduce a new argument, ignore, so the script can skip services that carry a specific label. Then all I need to do is set a label on the few specific services that exit on purpose.
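A minimal sketch of the proposed skip behaviour. No such option exists in docker-stack-wait; filter_skipped is a hypothetical helper, and it assumes the label values have already been collected (for example with docker service inspect) into "service-name label-value" lines, using "-" for services without the label.

```shell
#!/bin/sh
# Sketch of the proposed "ignore"/"skip" option. This does NOT exist in
# docker-stack-wait; filter_skipped is a hypothetical helper.
# stdin:  one "service-name label-value" pair per line ("-" if the label is unset)
# $1:     the label value that marks a service as "do not wait"
# stdout: the names of services that should still be waited for
filter_skipped() {
  skip_value=$1
  while read -r name value; do
    if [ "$value" != "$skip_value" ]; then
      echo "$name"
    fi
  done
}

printf '%s\n' \
  "storage-minio1 -" \
  "storage-minio-client false" \
  "storage-nginx -" | filter_skipped false
# prints storage-minio1 and storage-nginx, one per line
```

With this approach only the one-shot services need a label; everything unlabelled is waited for by default.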

Sep 11 '22 20:09 marden

Any updates on this?

Nov 09 '22 12:11 marden

None here. I haven't seen any PRs that implement this while also differentiating between an expected exit, an unexpected exit, and a container that hasn't started yet.

Nov 09 '22 16:11 sudo-bmitch

Well, it's hard to reproduce the exact case.

As I mentioned before, a new argument (e.g. -ignore or -skip) that does the opposite of -f would help: it would tell the script not to wait for services marked with a given label.

Here is an example: I have 7 services and need to wait for 6 of them. Compare these two configs:

docker-stack-wait.sh -f label=wait=true $stack_name

services:
  srv-1:
      deploy:
        labels:
            wait: "true"
  srv-2:
      deploy:
        labels:
            wait: "true"
  srv-3:
      # don't need to wait for
  srv-4:
      deploy:
        labels:
            wait: "true"
  srv-5:
      deploy:
        labels:
            wait: "true"
  srv-6:
      deploy:
        labels:
            wait: "true"
  srv-7:
      deploy:
        labels:
            wait: "true"

Requires 6 labels.

docker-stack-wait.sh -skip label=wait=false $stack_name

services:
  srv-1:
      ...
  srv-2:
      ...
  srv-3:
      deploy:
        labels:
            wait: "false"  # mark only this service
  srv-4:
      ...
  srv-5:
      ...
  srv-6:
      ...
  srv-7:
      ...

Requires only 1 label.

Obviously, the second one is much easier to read and maintain.

Nov 11 '22 16:11 marden

@sudo-bmitch docker stack deploy now supports replicated-job, but your script still runs into the timeout. Can I assume these jobs are still not supported?
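A hedged sketch of what a job-aware check could look like, assuming a service in mode: replicated-job is treated as finished once the number of tasks in the complete state reaches the desired number of completions, and a failed or rejected task is reported immediately instead of waiting for the timeout. job_status is hypothetical and not part of docker-stack-wait; treating any failure as final is a simplification, since Swarm may retry a job task under its restart policy.

```shell
#!/bin/sh
# Hypothetical job-aware check (not part of docker-stack-wait).
# Arguments: desired number of completions, then one Docker task state per task.
job_status() {
  completions=$1
  shift
  done_count=0
  for state in "$@"; do
    case "$state" in
      complete)
        # Task exited without an error code: count it as a completion.
        done_count=$((done_count + 1)) ;;
      failed|rejected)
        # Simplification: report the job as failed on the first failed task.
        echo "failed"
        return 1 ;;
    esac
  done
  if [ "$done_count" -ge "$completions" ]; then
    echo "completed"
  else
    echo "running $done_count/$completions"
  fi
}
```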

May 25 '23 10:05 pschichtel

@pschichtel see my comment above https://github.com/sudo-bmitch/docker-stack-wait/issues/19#issuecomment-1309058348

May 25 '23 11:05 sudo-bmitch

Ah, sorry, I missed that. In that case I might work on this soon-ish. For now I'm filtering out the affected services as suggested, which still works fine timing-wise in this setup, but I'd prefer a proper solution.

May 25 '23 11:05 pschichtel

I'll probably wait and see how these turn out:

https://github.com/docker/cli/pull/4258
https://github.com/docker/cli/pull/4259

Jun 12 '23 15:06 pschichtel