Add container event support to containerd
This PR adds container event support (https://github.com/kubernetes/enhancements/issues/3386) to containerd, by implementing GetContainerEvents() CRI API that supports single-caller and returns all container events (i.e. not filtered by container ID or other filters) to the caller.
Hi @ruiwen-zhao. Thanks for your PR.
I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Some CI checks failed. Most of the failures seem irrelevant:
go: gopkg.in/[email protected]: unrecognized import path "gopkg.in/yaml.v2": reading https://gopkg.in/yaml.v2?go-get=1: 502 Bad Gateway
server response: Cannot obtain refs from GitHub: error reading from GitHub: net/http: request canceled (Client.Timeout exceeded while reading body)
make: *** [integration] Error 1
Makefile:[21](https://github.com/containerd/containerd/runs/7813786735?check_suite_focus=true#step:9:22)3: recipe for target 'integration' failed
One failure on the new test added TestContainerEvents:
Error Trace: /home/runner/work/containerd/containerd/container_event_test.go:49
Error: Received unexpected error:
assertContainerEventResponse: wrong sandbox name. Expected sandbox_foobar2, got sandbox
Test: TestContainerEvents
Looking.
btw I am getting
+ run_crictl
+ /usr/local/bin/crictl --runtime-endpoint=unix:///run/containerd-test/containerd.sock info
FATA[0000] getting status of runtime failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
when running sudo make cri-integration. Looking into it as well.
Update: adding CONTAINERD_RUNTIME solves this issue ( sudo FOCUS=TestContainerEvents CONTAINERD_RUNTIME=io.containerd.runtime.v1.linux make cri-integration), but now I am seeing
time="2022-08-12T22:24:22.752372163Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:sandbox_foobar2,Uid:52fdfc072182654f163f5f0f9a621d729566c74d10037c4d7bbb0407d1e2c649,Namespace:events-81855ad8681d0d86d1e91e00167939cb6694d2c422acd208a0072939487f6999,Attempt:0,} failed, error" error="failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown"
Still looking
Update: adding CONTAINERD_RUNTIME solves this issue ( sudo FOCUS=TestContainerEvents CONTAINERD_RUNTIME=io.containerd.runtime.v1.linux make cri-integration)
Can you try io.containerd.runc.v2 (this should be the default) and io.containerd.runc.v1 as well?
, but now I am seeing
"failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown"
Are you perhaps using a machine with cgroups v2 configured? I believe the v1 shim depends on cgroups v1.
Can you try
io.containerd.runc.v2(this should be the default) andio.containerd.runc.v1as well?
Thanks Samuel! sudo CONTAINERD_RUNTIME=io.containerd.runc.v2 make cri-integration works for me.
Also I have figured out the previous failure:
Error Trace: /home/runner/work/containerd/containerd/container_event_test.go:49
Error: Received unexpected error:
assertContainerEventResponse: wrong sandbox name. Expected sandbox_foobar2, got sandbox
Test: TestContainerEvents
When running all tests together, all tests will make containerd generate events and send them to the channel. The events will be stuck in the channel because no one is reading from it. When container_event_test starts and reads from the channel, it will first read the events emitted by previous tests. So I added some logic to drain the event channel before asserting the event content.
/retest
/retest
Rebased the PR against main branch, which has CRI changes now.
I also manually tested this change against https://github.com/harche/kubernetes/tree/evented_pleg_experimental, by deploying a simple pod, and can verify it works.
@ruiwen-zhao how goes.. needs a rebase.
Needs rebase.. and needs changes to pick up the merged pr from https://github.com/kubernetes/kubernetes/pull/111384
Did the rebase.
For cri-api change, the v1alpha2 API has been removed in https://github.com/kubernetes/cri-api/commit/cd17af26e578534d864885e3c0f6f576119c3ad9, before the evented pleg related CRI change is merged. So I guess we need to remove the v1alpha2 dependency from containerd repo first.
Created https://github.com/containerd/containerd/pull/7656 to remove v1alpha2 dependency.
Uh so sbserver tests are part of CI testing, and they are failing:
Test: TestContainerEvents
asm_amd64.s:1594:
Error Trace: /home/runner/work/containerd/containerd/container_event_test.go:112
/home/runner/work/containerd/containerd/asm_amd64.s:1594
Error: Received unexpected error:
rpc error: code = Unimplemented desc = method GetContainerEvents not implemented
So i cannot leave out the sandbox server implementation.
The three windows failures are all
mv: cannot remove 'C:/Windows/Temp/test-integration/containerd.log': Device or resource busy
The linux failure is
convert_test.go:50: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/6c/6c41b8874ee0854485419a5a94634aa66bc23a57c48fd903878825c0962af39b/data?verify=1669158604-LaKrXNpdRtYaT1TKeXDUH5wlOUg%3D": read tcp 10.1.0.101:40942->104.18.121.25:443: read: connection reset by peer
I dont think these are related to this change, so if I may:
/retest
I've kicked off a rerun.
waiting for confirmation that https://github.com/kubernetes/kubernetes/pull/114351 is merged into a k/k v1.26 release
Updated the code to skip sending events with nil PodSandboxStatus. See the TODO issue (https://github.com/containerd/containerd/issues/7785) for context and next step.
samuelkarp requested review from mikebrow
I hit the wrong button there, sorry!
@fuweid and @samuelkarp expressed interest in reviewing