containerd icon indicating copy to clipboard operation
containerd copied to clipboard

Add container event support to containerd

Open ruiwen-zhao opened this issue 3 years ago • 9 comments

This PR adds container event support (https://github.com/kubernetes/enhancements/issues/3386) to containerd, by implementing GetContainerEvents() CRI API that supports single-caller and returns all container events (i.e. not filtered by container ID or other filters) to the caller.

ruiwen-zhao avatar Jun 17 '22 18:06 ruiwen-zhao

Hi @ruiwen-zhao. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 17 '22 18:06 k8s-ci-robot

/ok-to-test

samuelkarp avatar Jun 17 '22 18:06 samuelkarp

Some CI checks failed. Most of the failures seem irrelevant:

go: gopkg.in/[email protected]: unrecognized import path "gopkg.in/yaml.v2": reading https://gopkg.in/yaml.v2?go-get=1: 502 Bad Gateway
	server response: Cannot obtain refs from GitHub: error reading from GitHub: net/http: request canceled (Client.Timeout exceeded while reading body)
make: *** [integration] Error 1
Makefile:[21](https://github.com/containerd/containerd/runs/7813786735?check_suite_focus=true#step:9:22)3: recipe for target 'integration' failed

One failure on the new test added TestContainerEvents:

        	Error Trace:	/home/runner/work/containerd/containerd/container_event_test.go:49
        	Error:      	Received unexpected error:
        	            	assertContainerEventResponse: wrong sandbox name. Expected sandbox_foobar2, got sandbox
        	Test:       	TestContainerEvents

Looking.

ruiwen-zhao avatar Aug 12 '22 21:08 ruiwen-zhao

btw I am getting

+ run_crictl
+ /usr/local/bin/crictl --runtime-endpoint=unix:///run/containerd-test/containerd.sock info
FATA[0000] getting status of runtime failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService

when running sudo make cri-integration. Looking into it as well.

Update: adding CONTAINERD_RUNTIME solves this issue ( sudo FOCUS=TestContainerEvents CONTAINERD_RUNTIME=io.containerd.runtime.v1.linux make cri-integration), but now I am seeing

time="2022-08-12T22:24:22.752372163Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:sandbox_foobar2,Uid:52fdfc072182654f163f5f0f9a621d729566c74d10037c4d7bbb0407d1e2c649,Namespace:events-81855ad8681d0d86d1e91e00167939cb6694d2c422acd208a0072939487f6999,Attempt:0,} failed, error" error="failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown"

Still looking

ruiwen-zhao avatar Aug 12 '22 21:08 ruiwen-zhao

Update: adding CONTAINERD_RUNTIME solves this issue ( sudo FOCUS=TestContainerEvents CONTAINERD_RUNTIME=io.containerd.runtime.v1.linux make cri-integration)

Can you try io.containerd.runc.v2 (this should be the default) and io.containerd.runc.v1 as well?

, but now I am seeing "failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown"

Are you perhaps using a machine with cgroups v2 configured? I believe the v1 shim depends on cgroups v1.

samuelkarp avatar Aug 12 '22 22:08 samuelkarp

Can you try io.containerd.runc.v2 (this should be the default) and io.containerd.runc.v1 as well?

Thanks Samuel! sudo CONTAINERD_RUNTIME=io.containerd.runc.v2 make cri-integration works for me.

Also I have figured out the previous failure:

Error Trace:	/home/runner/work/containerd/containerd/container_event_test.go:49
        	Error:      	Received unexpected error:
        	            	assertContainerEventResponse: wrong sandbox name. Expected sandbox_foobar2, got sandbox
        	Test:       	TestContainerEvents

When running all tests together, all tests will make containerd generate events and send them to the channel. The events will be stuck in the channel because no one is reading from it. When container_event_test starts and reads from the channel, it will first read the events emitted by previous tests. So I added some logic to drain the event channel before asserting the event content.

ruiwen-zhao avatar Aug 22 '22 22:08 ruiwen-zhao

/retest

ruiwen-zhao avatar Aug 23 '22 02:08 ruiwen-zhao

/retest

ruiwen-zhao avatar Aug 23 '22 18:08 ruiwen-zhao

Rebased the PR against main branch, which has CRI changes now.

I also manually tested this change against https://github.com/harche/kubernetes/tree/evented_pleg_experimental, by deploying a simple pod, and can verify it works.

ruiwen-zhao avatar Sep 13 '22 21:09 ruiwen-zhao

@ruiwen-zhao how goes.. needs a rebase.

mikebrow avatar Oct 17 '22 16:10 mikebrow

Needs rebase.. and needs changes to pick up the merged pr from https://github.com/kubernetes/kubernetes/pull/111384

mikebrow avatar Nov 10 '22 15:11 mikebrow

Did the rebase.

For cri-api change, the v1alpha2 API has been removed in https://github.com/kubernetes/cri-api/commit/cd17af26e578534d864885e3c0f6f576119c3ad9, before the evented pleg related CRI change is merged. So I guess we need to remove the v1alpha2 dependency from containerd repo first.

Created https://github.com/containerd/containerd/pull/7656 to remove v1alpha2 dependency.

ruiwen-zhao avatar Nov 11 '22 00:11 ruiwen-zhao

Uh so sbserver tests are part of CI testing, and they are failing:

        	Test:       	TestContainerEvents
    asm_amd64.s:1594: 
        	Error Trace:	/home/runner/work/containerd/containerd/container_event_test.go:112
        	            				/home/runner/work/containerd/containerd/asm_amd64.s:1594
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unimplemented desc = method GetContainerEvents not implemented

So i cannot leave out the sandbox server implementation.

ruiwen-zhao avatar Nov 21 '22 23:11 ruiwen-zhao

The three windows failures are all

mv: cannot remove 'C:/Windows/Temp/test-integration/containerd.log': Device or resource busy

The linux failure is

convert_test.go:50: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/6c/6c41b8874ee0854485419a5a94634aa66bc23a57c48fd903878825c0962af39b/data?verify=1669158604-LaKrXNpdRtYaT1TKeXDUH5wlOUg%3D": read tcp 10.1.0.101:40942->104.18.121.25:443: read: connection reset by peer

I dont think these are related to this change, so if I may:

/retest

ruiwen-zhao avatar Nov 22 '22 23:11 ruiwen-zhao

I've kicked off a rerun.

samuelkarp avatar Nov 23 '22 00:11 samuelkarp

waiting for confirmation that https://github.com/kubernetes/kubernetes/pull/114351 is merged into a k/k v1.26 release

mikebrow avatar Dec 08 '22 18:12 mikebrow

Updated the code to skip sending events with nil PodSandboxStatus. See the TODO issue (https://github.com/containerd/containerd/issues/7785) for context and next step.

ruiwen-zhao avatar Dec 08 '22 19:12 ruiwen-zhao

samuelkarp requested review from mikebrow

I hit the wrong button there, sorry!

samuelkarp avatar Dec 08 '22 22:12 samuelkarp

@fuweid and @samuelkarp expressed interest in reviewing

mikebrow avatar Dec 08 '22 23:12 mikebrow