runc cannot wait for container processes to exit when the container shares the host PID namespace
Description
1. docker run with --pid=host
2. A process in the container other than the init process is in D (uninterruptible sleep) state
3. docker rm -f $containerdID
Steps to reproduce the issue
Describe the results you received and expected
I received:
1. containerd-shim and the init process were reaped
2. The container's cgroup was left behind
What version of runc are you using?
[root@localhost ~]# runc --version
runc version 1.1.3
commit: 02a436f4f2efd8c5a2ec5c4ed3d196242d4edb77
spec: 1.0.2-dev
go: go1.17.3
libseccomp: 2.5.3
Host OS information
No response
Host kernel information
No response
I think I figured out why the cgroup is left behind. When a container runs with --pid=host, runc delete handles its processes in the signalAllProcesses function. Because one of the container's processes (not init) is in D state, that process does not exit. But in signalAllProcesses, p.wait has no effect: it does not wait for all of the processes to exit. In the end, the init process exits but the D-state process does not, and the containerd-shim process exits. The cgroup cleanup fails, and there is no further chance to clean it up.
So how should we make sure that all processes in the container have exited?
We have recently made some changes in that area (in particular, see https://github.com/opencontainers/runc/pull/4102). Plus, you are using a somewhat old version of runc (1.1.3); the latest one is 1.1.10.
I suggest you try a version compiled from HEAD (this is a future 1.2.0), and let us know if it fixes your problem.
Another thing is, a process can only wait(2) for its own children. Thus, say, runc delete or runc kill cannot wait for any container processes, as they are not children of that instance of runc.
And, if the process can't be killed because it is stuck in D state, there's nothing runc can do (except for returning an error which I think is happening after #4102).