
Fix `ci` parallel test issue

Open dierbei opened this issue 2 years ago • 11 comments

The current test looks like the following: https://github.com/containerd/runwasi/blob/7b2adf09dfc9d1ed99988bc83c71a4c58f236344/Makefile#L72-L82 https://github.com/containerd/runwasi/blob/7b2adf09dfc9d1ed99988bc83c71a4c58f236344/crates/containerd-shim-wasmedge/src/tests.rs#L71-L73

The use of --test-threads=1 and #[serial] ensures that tests do not interfere with each other.

We would like to remove --test-threads=1 and #[serial], and still have CI pass successfully.

dierbei avatar Sep 13 '23 08:09 dierbei

The main reason why the tests can't run in parallel is because when we create a container we modify the global state of the process (things like changing the process current directory).

This means that we can't run them in parallel if they all run in the same process.
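To make the point about global state concrete, here is a minimal sketch (not runwasi code; the helper name `cwd_seen_after_thread_chdir` is made up for illustration). It changes the current directory from one thread and shows that another thread in the same process observes the change, which is what would happen between tests sharing a `cargo test` process.

```rust
use std::env;
use std::path::PathBuf;
use std::thread;

/// Hypothetical helper: changes the current directory from a spawned thread
/// and returns the directory the *calling* thread observes afterwards.
fn cwd_seen_after_thread_chdir(target: PathBuf) -> PathBuf {
    thread::spawn(move || {
        // The working directory belongs to the process, not to this thread.
        env::set_current_dir(&target).expect("chdir failed");
    })
    .join()
    .expect("thread panicked");
    env::current_dir().expect("cannot read cwd")
}

fn main() {
    let original = env::current_dir().expect("cannot read cwd");
    // Canonicalize so the comparison is not confused by symlinks.
    let target = env::temp_dir().canonicalize().expect("cannot resolve temp dir");

    let seen = cwd_seen_after_thread_chdir(target.clone());
    // The calling thread never called set_current_dir, yet its cwd moved too.
    assert_eq!(seen, target);
    println!("cwd moved from {} to {}", original.display(), seen.display());

    // Restore the original directory before exiting.
    env::set_current_dir(original).expect("cannot restore cwd");
}
```

Any test that relies on the current directory while another test changes it would see this kind of interference.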

I am happy with them having to run in serial. The main issue is that without --test-threads=1 they sporadically fail in CI, while I would have expected that using #[serial] would be enough.

If you really want them to run in parallel, you would have to make sure that each test runs in its own process. I couldn't find how to do that with cargo test, but it's possible with cargo nextest. Even then, I think there would be some extra work to get the tests passing.
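As a rough illustration of why one process per test helps, the sketch below starts a child process in a different directory and checks that the parent's current directory is unaffected. It only demonstrates process isolation and is not how nextest is implemented; the helper name `child_cwd` is made up, and the example assumes a Unix-like system with a `pwd` binary on the PATH.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: runs `pwd` in a child process started in `dir`
/// and returns what the child printed.
fn child_cwd(dir: &Path) -> String {
    let out = Command::new("pwd")
        .current_dir(dir) // only affects the child process
        .output()
        .expect("failed to spawn pwd");
    String::from_utf8_lossy(&out.stdout).trim().to_string()
}

fn main() {
    let before = env::current_dir().expect("cannot read cwd");
    let target = env::temp_dir().canonicalize().expect("cannot resolve temp dir");

    let reported = child_cwd(&target);
    // The child ran in `target`...
    assert_eq!(Path::new(&reported), target.as_path());
    // ...while the parent's own working directory did not change.
    assert_eq!(env::current_dir().expect("cannot read cwd"), before);
    println!("child ran in {reported}, parent stayed in {}", before.display());
}
```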

I didn't go further with nextest because:

  • when running tests with --no-capture (to see their logs), nextest runs the tests in parallel anyway.
  • it would bring in an extra dependency
  • the tests in serial don't take very long, I don't know if it's worth the effort.

I would still like to understand what causes the sporadic failures in CI when running without --test-threads=1. Since we are using #[serial], the tests should be running in serial anyway, so I would like to understand how --test-threads interacts with our code.
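One thing that may be worth checking, offered as an assumption rather than a diagnosis: #[serial] from the serial_test crate only serializes tests that carry the attribute against each other, so any test without it can still run concurrently when more than one test thread is available. The sketch below mimics that with a plain mutex instead of serial_test; `max_overlap` is a made-up helper that reports how many workers were inside the "test body" at the same time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `workers` threads and returns the highest number
/// that were inside the simulated test body at once. With `take_lock` each
/// worker holds a shared mutex, roughly what a serial attribute provides.
fn max_overlap(workers: usize, take_lock: bool) -> usize {
    let lock = Arc::new(Mutex::new(()));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (lock, active, peak) = (lock.clone(), active.clone(), peak.clone());
            thread::spawn(move || {
                // Workers that skip the lock behave like tests without the attribute.
                let _guard = if take_lock { Some(lock.lock().unwrap()) } else { None };
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(20)); // simulated test body
                active.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    // With the lock, at most one worker is ever inside the body.
    assert_eq!(max_overlap(4, true), 1);
    // Without it, overlap depends on scheduling, so the value is only printed.
    println!("peak without lock: {}", max_overlap(4, false));
}
```

If every test in the crate already has #[serial], this explanation does not apply and the cause lies elsewhere.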

jprendes avatar Sep 13 '23 08:09 jprendes

@jprendes In my testing, the tests fail once --test-threads=1 is removed.

#[serial] alone does not seem to guarantee that the tests pass.

Is this the case for you?

dierbei avatar Sep 13 '23 09:09 dierbei

Since I'm new to Container Runtime, I'd like to understand it step by step.

As for timing, I will work on it in my spare time.

dierbei avatar Sep 13 '23 09:09 dierbei

I see that in CI, but I can't reproduce locally. Can you reproduce locally?

jprendes avatar Sep 13 '23 09:09 jprendes

It happens to me in ci too.

dierbei avatar Sep 13 '23 10:09 dierbei

I also came across it during ci.

So far it looks like the tests conflict when run in parallel. I'm not sure exactly how yet; I still need to test it.

dierbei avatar Sep 14 '23 00:09 dierbei

@jprendes Here are some errors I've encountered that I don't quite understand at the moment.

[2023-09-14T03:35:33Z ERROR libcontainer::process::container_main_process] failed to close intermediate process receiver: failed unix syscalls
[2023-09-14T03:35:33Z DEBUG libcontainer::capabilities] reset all caps
[2023-09-14T03:35:33Z ERROR libcontainer::container::builder_impl] failed to run container process err=Channel(BaseChannelError(Nix(EBADF)))
[2023-09-14T03:35:33Z DEBUG libcontainer::capabilities] dropping bounding capabilities to Some({Kill, AuditWrite, NetBindService})
[2023-09-14T03:48:38Z DEBUG libcontainer::container::container] Save container status: Container { state: State { oci_version: "v1.0.2", id: "mp9yBu9u", status: Stopped, pid: Some(3386203), bundle: "/tmp/.tmp9yBu9u", annotations: Some({}), created: Some(2023-09-14T03:48:18.307455081Z), creator: Some(0), use_systemd: false, clean_up_intel_rdt_subdirectory: Some(false) }, root: "/tmp/.tmp9yBu9u/runwasi/test_namespace/mp9yBu9u" } in "/tmp/.tmp9yBu9u/runwasi/test_namespace/mp9yBu9u"
Error: error waiting for module to finish: timed out waiting on channel

dierbei avatar Sep 14 '23 03:09 dierbei

I thought about it: maybe we will use K8s someday,

and then we will also run into some parallelism. What do you think?

dierbei avatar Sep 14 '23 03:09 dierbei

@jprendes There is one thing that confuses me when I use vscode to debug the unit tests.

The code below never seems to be reached. Do you know why that is?

image

https://github.com/containerd/runwasi/blob/f9f3d22606a2712f0fd9ee07366cc4a8d2d2a023/crates/containerd-shim-wasm/src/container/executor.rs#L43C1-L65C2

dierbei avatar Sep 14 '23 09:09 dierbei

hm... I haven't tried, but debugging here could be a bit tricky, as this code runs in a different process than the one the debugger starts. The initial process clones itself a few times as youki does the setup of the container.

Is there a flag you could use in gdb/lldb to tell it to also debug child processes?

jprendes avatar Sep 14 '23 09:09 jprendes

I've tried giving each unit test a random program name, but I'm still having some problems.

        // Skip the first 7 bytes of the temp dir path (e.g. "/tmp/.t") and use the rest as the id.
        let part = &dir.to_string_lossy().to_string()[7..];
        let instance = WasiInstance::new(part.to_string(), Some(&cfg));

I'll think about what else I can do.
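For reference, one alternative way to get a distinct name per test is sketched below. It is not runwasi code: `unique_instance_id` and the `"test"` prefix are made up for illustration. It combines the process id with an atomic counter, so ids differ between tests in the same process and between test processes on the same host. Unique names alone would not remove conflicts on process-wide state such as the current directory.

```rust
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter; each call to the helper takes the next value.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: builds an id from a prefix, the process id,
/// and a per-process counter.
fn unique_instance_id(prefix: &str) -> String {
    let n = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}-{}-{n}", process::id())
}

fn main() {
    let a = unique_instance_id("test");
    let b = unique_instance_id("test");
    // Two calls in the same process never return the same id.
    assert_ne!(a, b);
    println!("{a} {b}");
}
```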

dierbei avatar Sep 14 '23 11:09 dierbei