runc

execfifo may block forever if containerd is killed in an abnormal scenario

Open kamizjw opened this issue 3 years ago • 6 comments

Openat blocks until the other end of the FIFO is opened, but in some abnormal scenarios (e.g. containerd is killed), Openat may block forever.

Signed-off-by: zhongjiawei [email protected]

kamizjw, Sep 06 '22 11:09

How can we test this?

AkihiroSuda, Sep 07 '22 10:09

How can we test this?

kamizjw, Sep 13 '22 01:09

OK, so this is a breaking change.

Currently, runc start can be called at any time after runc create. We don't know if there are any users that rely on the current behavior.

Overall, I think something like this should be done, but with a documentation change and a way to revert to the old behavior, or by keeping the old behavior as the default and making the new one optional (say, by adding a --timeout flag to runc create).

WDYT, @AkihiroSuda @cyphar?

kolyshkin, Sep 14 '22 21:09

Adding a timeout option is a good idea for any scenario.

kamizjw, Sep 15 '22 01:09

The timeout can be an option, but it might be hard to integrate with existing systems. Currently, many scenarios already have a timeout on the RPC call. IMO, if we can make sure there is always a record (the runC state), containerd, CRI-O, and other components can clean it up. Just my two cents.

fuweid, Sep 15 '22 02:09

NACK. The semantics of runc create and runc start are defined by the spec. There is no mechanism for us to have a timeout (nor should there be one built-in even if it's configurable and opt-in IMHO -- it is trivial for the runc user to implement a timeout by calling runc start after the desired timeout period has elapsed). It seems like there is a containerd bug which this is attempting to fix -- I humbly suggest fixing the issue in containerd.

If containerd wants to avoid this entirely, they can use runc run, but I suspect they're using the create+start mechanism the way it is intended (to perform various operations on the container before the main process is properly started). As such, it is possible for some operation to take longer than expected, and a timeout would result in the main process starting before the container is ready.

cyphar, Sep 19 '22 05:09