Docker container not responding to anything
- [x] This is a bug report
- [ ] This is a feature request
- [ ] I searched existing issues before opening this one
**Expected behavior**

The container should keep responding on its published port.
**Actual behavior**

Sometimes the Docker containers on the system stop responding to anything, yet `docker ps` still shows them as up and running.
**Steps to reproduce the behavior**

- Create a custom image with `node:10.11.0` as the base image and any sample Node project that listens on port 4000:
```dockerfile
# Step 1
FROM node:10.11.0

# Step 2
LABEL version="1.0"

# Step 3
RUN mkdir -p /usr/src/core-api
WORKDIR /usr/src/core-api

# Step 4
COPY package*.json ./
COPY tsconfig*.json ./
COPY tslint*.json ./

# Step 5
RUN npm install pm2 -g

# Step 6
RUN cd /usr/src/core-api && npm install --production

# Step 7
COPY ./dist ./dist

# Step 8
EXPOSE 4000

# Step 9
CMD npm run start-docker
```
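For completeness, the image can then be built and pushed along these lines (a sketch; the tag matches the one used in the compose file, adjust for your registry):

```shell
# Build the image from the directory containing the Dockerfile above
docker build -t libsynadmin/libsynmp:core-api-staging-0.6.0.1 .

# Push the tagged image to the registry
docker push libsynadmin/libsynmp:core-api-staging-0.6.0.1
```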
- Push the custom image to the repository.
- Run the image with Docker Compose:
```yaml
version: '2'
services:
  core:
    image: libsynadmin/libsynmp:core-api-staging-0.6.0.1
    restart: always
    volumes:
      - /home/rspurohit/core-api/public/podcast-images:/usr/src/core-api/dist/libsyn-mp.core/src/public/podcast-images
      - /home/rspurohit/core-api/public/campaign-documents:/usr/src/core-api/dist/libsyn-mp.core/src/public/campaign-documents
      - /home/rspurohit/core-api/public/network-images:/usr/src/core-api/dist/libsyn-mp.core/src/public/network-images
      - /home/rspurohit/core-api/public/user-profile-images:/usr/src/core-api/dist/libsyn-mp.core/src/public/user-profile-images
      - /home/rspurohit/core-api/public/smart-proposals:/usr/src/core-api/dist/libsyn-mp.core/src/public/smart-proposals
      - /home/rspurohit/core-api/public/insertion-orders:/usr/src/core-api/dist/libsyn-mp.core/src/public/insertion-orders
      - /home/rspurohit/core-api/public/invoice:/usr/src/core-api/dist/libsyn-mp.core/src/public/invoice
      - /home/rspurohit/core-api/public/payslip:/usr/src/core-api/dist/libsyn-mp.core/src/public/payslip
    ports:
      - '4000:4000'
      - '3000:3000'
    network_mode: bridge
    environment:
      - NODE_ENV=development-local
```
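A minimal sketch of how we run and check the service, assuming the compose file above is in the current directory:

```shell
# Bring the service up in the background
docker-compose up -d

# When the bug occurs, this request hangs or times out even though
# `docker ps` still reports the container as Up
curl -m 10 http://localhost:4000
```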
- The container will be up and running, but sometimes it stops responding and we get no output when hitting http://localhost:4000, even though both the container and Docker appear to be up and running.
- There are 5 containers running on my system with the same configuration as above but different services, and all of them stop responding at the same time.
- Once I restart the Docker service, everything goes back to normal.
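The restart workaround amounts to the following (on a systemd host such as Ubuntu 16.04; restarting only the container does not help, the daemon itself has to be restarted):

```shell
# Restart the Docker engine; containers with restart: always come back up
sudo systemctl restart docker
```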
**Output of `docker version`:**
```
Client:
 Version:       18.03.1-ce
 API version:   1.37
 Go version:    go1.9.5
 Git commit:    9ee9f40
 Built:         Thu Apr 26 07:17:20 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false
```
**Output of `docker info`:**
```
Containers: 6
 Running: 5
 Paused: 0
 Stopped: 1
Images: 13
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-128-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.702GiB
Name: lsyn-bgqa
ID: RFOD:4VQN:N4DY:IBTV:FADZ:V6CW:6ZMB:FJL2:365D:43FI:WKNP:MI5W
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: libsynadmin
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
```
**Additional environment details (AWS, VirtualBox, physical, etc.)**

It is a VM with the following details:

```
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
```
The same happens regularly with different containers on my machine, with Docker version 18.09.2 and Ubuntu 18.04. The only solution is to stop the Docker service and then start it again.
We experienced the same: rare events (roughly once in 2 months of running) for one of the containers in a Swarm on RHEL 7, with Docker CE v20.10.6. The symptoms are:

- The container stops responding on any of its ports.
- Restarting the container or the Swarm doesn't help.
- Rebuilding the image doesn't help.
- Restarting the Docker service solves the issue.

This is quite scary.
The issue still exists. Restarting anything doesn't help. Has anyone found a solution for this?
Same issue here. Restarting the Docker container solves it for me, but it is scary.
We notice this with Docker version 20.10.14, build a224086, on Ubuntu 18.04.6 LTS.

Also on Docker version 20.10.8, build 3967b7d, on Ubuntu 18.04.3 LTS. Same symptoms as mentioned by @DKroot.
@GC-Elia Could you paste the output of `docker info`, dump a stack-trace log file (see the procedure here), and attach it here?
@akerouanton `docker info` was unresponsive, just like every other docker command except `docker ps`.

I will update with the stack trace the next time it happens (I have already restarted the service).
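For reference, the usual way to get that dump on Linux is to send SIGUSR1 to the daemon, which writes a goroutine stack dump without stopping dockerd (paths may vary by distribution):

```shell
# Ask dockerd to dump all goroutine stacks; the daemon keeps running
sudo kill -SIGUSR1 "$(pidof dockerd)"

# The dump is written under the daemon's run directory, and its exact
# location is also logged in the daemon log (journalctl -u docker)
ls /var/run/docker/goroutine-stacks-*.log
```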
Did you figure out what the issue might be? Do you know of a troubleshooting guide for Docker containers? In my case the process is running and even shows some CPU usage, but that particular container does not respond to any command; the other containers are accessible.
I have the same problem with Docker version 20.10.17, build 100c701, on `Linux xxx 5.15.0-41-generic #44~20.04.1-Ubuntu SMP Fri Jun 24 13:27:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux`.
Can a deadlock within the container cause this? My container is completely unresponsive: no stop, kill, inspect, or logs.
Hello, we're also seeing this in an OpenStack environment under a kolla-ansible deployment, with Docker 20.10.18 on CentOS 7 and a 5.4.211-1.el7.elrepo.x86_64 kernel.

The hanging container is an OVS vswitchd container:

```
1e6ef5c89390   kolla/centos-source-openvswitch-vswitchd:train   "dumb-init --single-…"   6 months ago   Up 6 months   openvswitch_vswitchd
```

`docker ps` and `docker info` work, but `stats`, `restart`, `exec` etc. do not. There is no mention in dmesg or journalctl of anything to do with this hang, for either Docker or the hanging container.

The processes are still working, and we will hold off on dumping a trace until we manage to empty the node of instances, at which point we'll dump a stack trace and attach it as well.
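In the meantime, one way to look at such a hung container from the host side is via its main PID (a sketch; `docker inspect` may itself hang in this state, in which case the PID can be found with `ps`/`pstree` instead):

```shell
CID=1e6ef5c89390   # the hanging container from `docker ps`

# -f keeps the output small; resolve the container's init PID on the host
PID="$(docker inspect -f '{{.State.Pid}}' "$CID")"

# A process stuck in D state (uninterruptible sleep) usually points at
# a blocked kernel call rather than the application itself
ps -o pid,stat,wchan:32,cmd -p "$PID"
```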
Attached `docker info` and the stack-trace dump: docker-info.txt, goroutine-stacks-2023-05-25T084842Z.log
After first trying to reload and then restart the Docker service, it hung on the service restart and we got new logs from containerd:

```
May 25 08:56:09 compute16 dockerd: time="2023-05-25T08:56:09.666961806Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=1e6ef5c89390002ae65864a21a5dc0fe60341a352b5921ca40921a85ca23ea93
May 25 08:56:11 compute16 containerd: time="2023-05-25T08:56:11.674275056Z" level=error msg="get state for 1e6ef5c89390002ae65864a21a5dc0fe60341a352b5921ca40921a85ca23ea93" error="context deadline exceeded: unknown"
May 25 08:56:11 compute16 containerd: time="2023-05-25T08:56:11.676327116Z" level=warning msg="unknown status" status=0
```
Only after killing the dockerd PID could we get the container and the Docker service fully responding again.
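In other words, the recovery sequence looked roughly like this (note that without live-restore enabled, killing dockerd also takes down the running containers):

```shell
# A plain restart hangs on the stuck container
sudo systemctl restart docker     # hangs / times out

# Force-kill the daemon process, then start the service again
sudo kill -9 "$(pidof dockerd)"
sudo systemctl start docker
```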
Same experience with Docker 23.0.1:

```
Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb 9 19:51:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb 9 19:48:42 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.16
  GitCommit:        31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

System:

```
Linux ... 5.4.17-2136.315.5.el7uek.x86_64 #2 SMP Wed Dec 21 19:57:57 PST 2022 x86_64 x86_64 x86_64 GNU/Linux
```
Some people reported here that restarting Docker helped; that was not the case for me. I had to restart the whole machine, and after the restart the container was still there with status `Dead`.
The inspection of the dead container shows the following:

```json
"State": {
    "Status": "dead",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": true,
    "Pid": 0,
    "ExitCode": 255,
    "Error": "",
    "StartedAt": "2023-07-18T06:52:21.535364308Z",
    "FinishedAt": "2023-07-31T11:19:59.586319264+02:00",
    "Health": {
        "Status": "unhealthy",
        "FailingStreak": 71,
        "Log": [
            {
                "Start": "2023-07-31T11:10:12.685367803+02:00",
                "End": "2023-07-31T11:10:53.982444816+02:00",
                "ExitCode": -1,
                "Output": "timed out starting health check for container 12dfd695fa58e71ebda2ea12b832a616bb10dbe84aa1e4e5fd72bbb6359833d3"
            },
            {
                "Start": "2023-07-31T11:13:25.124506079+02:00",
                "End": "2023-07-31T11:13:50.756552511+02:00",
                "ExitCode": -1,
                "Output": "cannot exec in a stopped state: unknown"
            },
            {
                "Start": "2023-07-31T11:14:20.915017447+02:00",
                "End": "2023-07-31T11:14:50.756872688+02:00",
                "ExitCode": -1,
                "Output": "cannot exec in a stopped state: unknown"
            },
            {
                "Start": "2023-07-31T11:15:33.030246325+02:00",
                "End": "2023-07-31T11:15:50.758445901+02:00",
                "ExitCode": -1,
                "Output": "cannot exec in a stopped state: unknown"
            },
            {
                "Start": "2023-07-31T11:16:20.799750496+02:00",
                "End": "2023-07-31T11:16:50.800662276+02:00",
                "ExitCode": -1,
                "Output": "timed out starting health check for container 12dfd695fa58e71ebda2ea12b832a616bb10dbe84aa1e4e5fd72bbb6359833d3"
            }
        ]
    }
}
```
Note the health-log entries with output "cannot exec in a stopped state: unknown": the container was not stopped; its state was running (while unhealthy). Maybe this could help?
The dead container can be removed like any other with `docker rm`.
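For anyone hitting the same state, pulling the health history and cleaning up can be done along these lines (the container ID is the one from the inspect output above; any unique prefix works):

```shell
# Extract just the health-check history from the inspect output
docker inspect -f '{{json .State.Health}}' 12dfd695fa58 | python3 -m json.tool

# A container stuck in the Dead state can be removed like any other
docker rm 12dfd695fa58
```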