runc icon indicating copy to clipboard operation
runc copied to clipboard

update state earlier to avoid cgroup leak when process failed

Open kamizjw opened this issue 3 years ago • 6 comments

update state earlier to avoid cgroup leak when process failed if process stuck in somewhere. upper caller like containerd may have a timeout for process launching.

process will be killed after this timeout, and then call runc delete to retrieve its resource like cgroup and perform poststop hook.

if process got stuck right before updating state, and after cgroup applied, like prestart-hook. In such case, runc delete xxx will do nothing because state file is missing, runc is not aware of this container. so process cgroup will stay and never get removed.

Signed-off-by: zhongjiawei [email protected]

kamizjw avatar Sep 06 '22 12:09 kamizjw

@kamizjw thank you for your PR. Do you think it's possible to create a test case for this issue?

kolyshkin avatar Sep 06 '22 19:09 kolyshkin

@kamizjw thank you for your PR. Do you think it's possible to create a test case for this issue?

Does constructing a scenario verification that exits unexpectedly cause leakage?

kamizjw avatar Sep 07 '22 01:09 kamizjw

@kamizjw thank you for your PR. Do you think it's possible to create a test case for this issue?

Does constructing a scenario verification that exits unexpectedly cause leakage?

Something like this. Maybe you can use prestart-hook to do that.

Having a test case as part of the PR would be very nice.

kolyshkin avatar Sep 08 '22 01:09 kolyshkin

@kamizjw thank you for your PR. Do you think it's possible to create a test case for this issue?

Does constructing a scenario verification that exits unexpectedly cause leakage?

Something like this. Maybe you can use prestart-hook to do that.

Having a test case as part of the PR would be very nice.

I'm not good at writing go test cases, can you help me write one?thank you very much

kamizjw avatar Sep 13 '22 01:09 kamizjw

I'm not good at writing go test cases, can you help me write one?thank you very much

Write any kind of reproducer for the issue you're solving with this patch. This can be a script, or a text description of what steps to take. Once we have it, I'll help you to convert a reproducer to the test case.

kolyshkin avatar Sep 14 '22 19:09 kolyshkin

I'm not good at writing go test cases, can you help me write one?thank you very much

Write any kind of reproducer for the issue you're solving with this patch. This can be a script, or a text description of what steps to take. Once we have it, I'll help you to convert a reproducer to the test case.

1.Prepare a script to execute in the prestart-hook phase, which takes a while to execute 2.Configure the execution of this script under the prestart option of the configuration file 3.Make sure that the runc delete command is executed when runc runs the prestart-hook

kamizjw avatar Sep 15 '22 01:09 kamizjw

update state earlier to avoid cgroup leak when process failed if process stuck in somewhere. upper caller like containerd may have a timeout for process launching.

Perhaps the best approach to avoid this situation is to specify the hook timeout for the prestart hook, and make sure it is less than the internal containerd timeout.

kolyshkin avatar Sep 29 '22 01:09 kolyshkin

hook timeout

update state earlier to avoid cgroup leak when process failed if process stuck in somewhere. upper caller like containerd may have a timeout for process launching.

Perhaps the best approach to avoid this situation is to specify the hook timeout for the prestart hook, and make sure it is less than the internal containerd timeout.

this needs to consider different application scenarios, and updating state.json immediately is the most concise way

kamizjw avatar Sep 29 '22 02:09 kamizjw