runtime icon indicating copy to clipboard operation
runtime copied to clipboard

cri-o tests hangs randomly

Open jcvenegas opened this issue 8 years ago • 1 comments

Some crio-tests has beeing failing randomly in our CI leanding to a timeout.

Examples: http://cc-jenkins-ci.westus2.cloudapp.azure.com/job/clear-containers-runtime-azure-ubuntu-17-04/6/console

00:15:36.557 ok 11 ctr execsync conflicting with conmon flags parsing
00:20:36.557 Build timed out (after 5 minutes). Marking the build as aborted.
00:20:36.565 Build was aborted
00:20:36.627 Setting status of c1314eec2006a3eb85deae4768cc6739d0962d51 to FAILURE with url http://cc-jenkins-ci.westus2.cloudapp.azure.com/job/clear-containers-runtime-azure-ubuntu-17-04/6/ and message: 'Build finished. '
00:20:36.627 Using context: jenkins-ci-ubuntu-17-04
00:20:36.841 Finished: ABORTED

https://semaphoreci.com/clearcontainers/agent/branches/pull-request-98/builds/7

ok 14 ctr execsync failure
ok 15 ctr execsync exit code
ok 16 ctr execsync std{out,err}
execution expired

https://semaphoreci.com/clearcontainers/proxy/branches/master/builds/10

# time="2017-09-01 20:29:28.215535045Z" level=debug msg="unmounted container "3eaa48f6ff38c43be4aa94426054a9dce713c0b9e6412db96cd36dee4898543c"" 
# time="2017-09-01 20:29:28.265646348Z" level=debug msg="unmounted container "13b7967c8be2761ec02ccd5f63c1a10544445efdcfad804209a681256c729eee"" 
# time="2017-09-01 20:29:28.268698737Z" level=debug msg="RemovePodSandboxResponse &RemovePodSandboxResponse{}" 
# 13b7967c8be2761ec02ccd5f63c1a10544445efdcfad804209a681256c729eee
# time="2017-09-01 20:29:28.318542867Z" level=debug msg="StopPodSandboxRequest &StopPodSandboxRequest{PodSandboxId:86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1,}" 
# time="2017-09-01 20:29:28.318586296Z" level=info msg="Got netns path /var/run/netns/cni-5f297bf9-70c7-52d3-0452-dc812c53c1ef" 
# time="2017-09-01 20:29:28.318591222Z" level=info msg="Using netns path redhat.test.crio" 
# time="2017-09-01 20:29:28.318595673Z" level=info msg="About to run with conf.Network.Type=bridge" 
# time="2017-09-01 20:29:28.414921036Z" level=debug msg="event: "/var/run/crio/exits/544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650.09Q45Y": RENAME" 
# time="2017-09-01 20:29:28.414953919Z" level=debug msg="event: "/var/run/crio/exits/544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650": CREATE" 
# time="2017-09-01 20:29:28.414965809Z" level=debug msg="container or sandbox exited: 544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650" 
# time="2017-09-01 20:29:28.414980279Z" level=debug msg="container exited and found: 544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650" 
# time="2017-09-01 20:29:28.438641009Z" level=debug msg="unmounted container "544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650"" 
# time="2017-09-01 20:29:28.495017409Z" level=debug msg="event: "/var/run/crio/exits/86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1.SVZ25Y": RENAME" 
# time="2017-09-01 20:29:28.495051718Z" level=debug msg="event: "/var/run/crio/exits/86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1": CREATE" 
# time="2017-09-01 20:29:28.495056648Z" level=debug msg="container or sandbox exited: 86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1" 
# time="2017-09-01 20:29:28.495072782Z" level=debug msg="sandbox exited and found: 86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1" 
# time="2017-09-01 20:29:28.607879609Z" level=debug msg="unmounted container "86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1"" 
# time="2017-09-01 20:29:28.607903021Z" level=debug msg="StopPodSandboxResponse 86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1: &StopPodSandboxResponse{}" 
# 86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1
# time="2017-09-01 20:29:28.657452445Z" level=debug msg="RemovePodSandboxRequest &RemovePodSandboxRequest{PodSandboxId:86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1,}" 
# removing the pod sandbox failed: rpc error: code = 2 desc = failed to delete container k8s_podsandbox1-redis_pod2_redhat.test.crio_redhat-test-crio_0 in pod sandbox 86b4752ecb54e8f28542b5c16945764ab6f657ba51623d674e3381a3fcc507f1: `/usr/local/bin/cc-runtime delete --force 544144978e2b1de30e58613816e28faa849a6fc5e4856316d9bdee60aeee5650` failed: Container ID does not exist
#   (exit status 1)
# rm: cannot remove ‘/tmp/tmp.cAspKH0YLV/crio/aufs/mnt/c681b76e20dc5a5e0b284571748a2a9a9ba28d71d32daa18d6522cfc2f4efdad’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.cAspKH0YLV/crio/aufs/diff/c681b76e20dc5a5e0b284571748a2a9a9ba28d71d32daa18d6522cfc2f4efdad’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.cAspKH0YLV/crio-run/aufs-containers/06d2d3bea8f39b3995a2c5617183d51749c66d16b3ff867d78041b765970ba99/userdata/shm’: Device or resource busy
ok 9 ctr list label filtering
....
execution expired

https://semaphoreci.com/james-o-hunt/runtime/branches/update-dev-guide/builds/1

# time="2017-09-04 08:53:55.977124242Z" level=debug msg="container "5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9" has work directory "/tmp/tmp.SYVduqRwqV/crio/aufs-containers/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9/userdata"" 
# time="2017-09-04 08:53:55.977167031Z" level=debug msg="container "5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9" has run directory "/tmp/tmp.SYVduqRwqV/crio-run/aufs-containers/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9/userdata"" 
# time="2017-09-04 08:53:55.978703948Z" level=debug msg="mounted container "5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9" at "/tmp/tmp.SYVduqRwqV/crio/aufs/mnt/b9f9bce0e4ed24f925db42d68ad823354ecbc39d1bbeaf51036a9fe42871c35d"" 
# time="2017-09-04 08:53:55.978753675Z" level=debug msg="OCI process args [docker-entrypoint.sh docker-entrypoint.sh redis-server]" 
# time="2017-09-04 08:53:55.978768159Z" level=debug msg="CONTAINER USER: " 
# time="2017-09-04 08:53:55.978905935Z" level=debug msg="UID: 0, GID: 0, Groups: [0 1 2 3 4 6 10 11 20 26 27]" 
# time="2017-09-04 08:53:55.979299782Z" level=debug msg="running conmon: /home/runner/workspace/src/github.com/kubernetes-incubator/cri-o/conmon/conmon" args=[-c 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9 -u 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9 -r /usr/local/bin/cc-runtime -b /tmp/tmp.SYVduqRwqV/crio-run/aufs-containers/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9/userdata -p /tmp/tmp.SYVduqRwqV/crio-run/aufs-containers/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9/userdata/pidfile -l /var/log/crio/pods/5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9.log --exit-dir /var/run/crio/exits] 
# time="2017-09-04 08:53:55.991012812Z" level=debug msg="Received container pid: 15600" 
# time="2017-09-04 08:53:55.998568386Z" level=debug msg="CreateContainerResponse: &CreateContainerResponse{ContainerId:5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9,}" 
# 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9
# time="2017-09-04 08:53:56.053205290Z" level=debug msg="StartContainerRequest &StartContainerRequest{ContainerId:5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9,}" 
# time="2017-09-04 08:53:56.154114879Z" level=debug msg="StartContainerResponse &StartContainerResponse{}" 
# 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9
# time="2017-09-04 08:53:56.226971951Z" level=debug msg="ExecSyncRequest &ExecSyncRequest{ContainerId:5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9,Cmd:[echo HELLO],Timeout:0,}" 
# time="2017-09-04 08:53:56.303692136Z" level=info msg="Received container exit code: 0, message: " 
# time="2017-09-04 08:53:56.303806019Z" level=debug msg="ExecSyncResponse: &ExecSyncResponse{Stdout:[72 69 76 76 79 10],Stderr:[],ExitCode:0,}" 
# Stdout:
# HELLO
# 
# Stderr:
# 
# Exit code: 0
# time="2017-09-04 08:53:56.359246281Z" level=debug msg="ExecSyncRequest &ExecSyncRequest{ContainerId:5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9,Cmd:[sleep 10],Timeout:1,}" 
# time="2017-09-04 08:53:58.005346118Z" level=info msg="Received container exit code: -1, message: command timed out" 
# execing command in container failed: rpc error: code = 2 desc = command error: command timed out, stdout: , stderr: , exit code -1
# time="2017-09-04 08:53:58.059741347Z" level=debug msg="StopPodSandboxRequest &StopPodSandboxRequest{PodSandboxId:5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6,}" 
# time="2017-09-04 08:53:58.059792089Z" level=info msg="Got netns path /var/run/netns/cni-91a0d9c1-3a3d-7fd7-3f1b-4d5a076bb0e3" 
# time="2017-09-04 08:53:58.059797141Z" level=info msg="Using netns path redhat.test.crio" 
# time="2017-09-04 08:53:58.059802007Z" level=info msg="About to run with conf.Network.Type=bridge" 
# time="2017-09-04 08:53:58.599426146Z" level=debug msg="event: "/var/run/crio/exits/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9.02RO5Y": CREATE" 
# time="2017-09-04 08:53:58.599449269Z" level=debug msg="container or sandbox exited: 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9.02RO5Y" 
# time="2017-09-04 08:53:58.599660840Z" level=debug msg="event: "/var/run/crio/exits/5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6.X4RO5Y": RENAME" 
# time="2017-09-04 08:53:58.599693987Z" level=debug msg="event: "/var/run/crio/exits/5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6": CREATE" 
# time="2017-09-04 08:53:58.599702770Z" level=debug msg="container or sandbox exited: 5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6" 
# time="2017-09-04 08:53:58.599730785Z" level=debug msg="sandbox exited and found: 5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6" 
# time="2017-09-04 08:53:58.615029824Z" level=debug msg="event: "/var/run/crio/exits/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9.02RO5Y": RENAME" 
# time="2017-09-04 08:53:58.615180126Z" level=debug msg="event: "/var/run/crio/exits/5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9": CREATE" 
# time="2017-09-04 08:53:58.615193743Z" level=debug msg="container or sandbox exited: 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9" 
# time="2017-09-04 08:53:58.615223380Z" level=debug msg="container exited and found: 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9" 
# time="2017-09-04 08:53:58.664182227Z" level=debug msg="unmounted container "5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9"" 
# time="2017-09-04 08:53:58.713855405Z" level=debug msg="unmounted container "5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6"" 
# time="2017-09-04 08:53:58.713882851Z" level=debug msg="StopPodSandboxResponse 5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6: &StopPodSandboxResponse{}" 
# 5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6
# time="2017-09-04 08:53:58.764366082Z" level=debug msg="RemovePodSandboxRequest &RemovePodSandboxRequest{PodSandboxId:5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6,}" 
# removing the pod sandbox failed: rpc error: code = 2 desc = failed to delete container k8s_podsandbox1-redis_podsandbox1_redhat.test.crio_redhat-test-crio_0 in pod sandbox 5b93ffad586b45c5cfda64534b557900e7293586508dd70bb7cf64408cf1b3a6: `/usr/local/bin/cc-runtime delete --force 5060443c46b81af744c96f57310151adc38700dc4cd5e67d1649af2249b21cb9` failed: Container ID does not exist
#   (exit status 1)
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/9a20900e83a8128eb5ff6aecf5e7ea1413f41dd3aed33296f9bd912c689d94a9’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/2059d805c90e034cb773d9722232ef018a72143dd31113b470fb876baeccd700’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/b9f9bce0e4ed24f925db42d68ad823354ecbc39d1bbeaf51036a9fe42871c35d’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/506cca7d37fc37f0140102c4f93d6c2d8f18639a243e3dcb1699bae18e6d16cb’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/9b0fd8e044dc410955882c4c52ef03a6aae081d9236477f1075e61f0265fe90d’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/66417f5f79db00657dc3397ddd812bb65d0a7c7ed906884afb5d5e6ec264455b’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/a3bf1306518a32448efedb8a511a5f8b378ad8ede3d04c9c0f4e4dc458f1bbcc’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/4400655bee3bb180ec2d8a715ad6835330352206b53ed2ff12bf0522bc07ecdf’: Device or resource busy
# rm: cannot remove ‘/tmp/tmp.SYVduqRwqV/crio/aufs/diff/493953455f301f6609dca0a2b0a70cd9f49d2e87c5bcabbdaa40877e00cab69d’: Device or resource busy
ok 13 ctr device add
ok 14 ctr execsync failure
ok 15 ctr execsync exit code
ok 16 ctr execsync std{out,err}
execution expired

jcvenegas avatar Sep 05 '17 13:09 jcvenegas

This is causing a lot of lost time due to having to re-run the CI (30-40 minutes each run).

jodh-intel avatar Sep 20 '17 15:09 jodh-intel