No healthy node available in the cluster. -- Issue with swarm
We are using swarm 1.2.0.
Steps to reproduce:
1. Get 2 or 3 bare-metal CentOS 7 nodes.
2. Run the net_demo_installer script.
3. Follow https://github.com/contiv/netplugin/blob/master/test/systemtests/How-to-Run.md to trigger our system tests.
Error log:
INFO[0428] Starting netmaster on swarm-baremetal-node1
INFO[0430] Starting netmaster on swarm-baremetal-node2
INFO[0437] Starting a container running "sleep 60m" on swarm-baremetal-node2
INFO[0437] Starting a container running "sleep 60m" on swarm-baremetal-node1
INFO[0437] cmd "docker run -itd --name=private-srv0-0-1 --net=private-srv0-0 contiv/alpine sleep 60m" failed: output below
INFO[0437] docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.
INFO[0437] cmd "docker run -itd --name=private-srv0-1-0 --net=private-srv0-1 contiv/alpine sleep 60m" failed: output below
INFO[0437] docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.
ERRO[0437] Container id "docker: Error response from daemon: No healthy node available in the cluster.\nSee 'docker run --help'." is invalid
ERRO[0437] Container id "docker: Error response from daemon: No healthy node available in the cluster.\nSee 'docker run --help'." is invalid
INFO[0439] ============================= systemtestSuite.TestPolicyBasicVXLAN completed ==========================
----------------------------------------------------------------------
FAIL: policy_test.go:13: systemtestSuite.TestPolicyBasicVXLAN
policy_test.go:14:
s.testPolicyBasic(c, "vxlan")
policy_test.go:89:
c.Assert(err, IsNil)
... value *ssh.ExitError = &ssh.ExitError{Waitmsg:ssh.Waitmsg{status:127, signal:"", msg:"", lang:""}} ("Process exited with: 127. Reason was: ()")
INFO[0439] Cleaning up containers on swarm-baremetal-node1
INFO[0439] Cleaning up containers on swarm-baremetal-node2
INFO[0440] Checking for errors on swarm-baremetal-node1
ERRO[0440] Errors in logfiles on swarm-baremetal-node1:
grep: /tmp/net*: No such file or directory
==========================
Netplugin Log : https://gist.github.com/gaurav-dalvi/7389da18f09677949707f825e3e17216
Netmaster Log : https://gist.github.com/gaurav-dalvi/d12556cd1a0aa145b323d5f7f6edd085
Duplicate of https://github.com/contiv/netplugin/issues/652?
https://gist.github.com/gaurav-dalvi/e308414fad9b29baeae5fd0bd21d4ab3
@jojimt @gkvijay: Could you please take a look? I have been seeing this issue for a long time now. It happens only on bare-metal / VM testing, not on Vagrant VMs.
Docker swarm 1.2.5
Netmaster logs: https://gist.github.com/gaurav-dalvi/5d51641ba4e43aad7d1ae1002ed8c3d4
Please give the docker version and the output of 'docker info' from swarm.
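For reference, a minimal sketch of how that information could be collected, assuming the swarm manager is reachable over TCP. The helper name `collect_swarm_diagnostics` and the endpoint argument are placeholders for illustration, not part of the test setup described here:

```shell
#!/bin/sh
# DOCKER can be overridden for a dry run; it defaults to the real client.
DOCKER="${DOCKER:-docker}"

# collect_swarm_diagnostics <swarm-endpoint>
# <swarm-endpoint> is a placeholder such as tcp://<manager-ip>:<port>.
collect_swarm_diagnostics() {
    swarm_endpoint="$1"

    # Client and engine versions as seen from this host.
    echo "== docker version =="
    "$DOCKER" version

    # Pointing the client at the swarm manager makes 'info' report the
    # cluster view (the nodes swarm knows about) instead of the local engine.
    echo "== docker info (via swarm manager) =="
    "$DOCKER" -H "$swarm_endpoint" info
}
```

It could be called as `collect_swarm_diagnostics tcp://<manager-ip>:<port> > swarm-diag.txt`, with the resulting file attached to the issue. The node list in the `docker info` section is the part to check when swarm reports no healthy node.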
It's Docker 1.11.
The Docker swarm output seemed to be fine. I don't have that testbed available to give it to you.
Any update on this one, @gkvijay?
@gaurav-dalvi Please close this issue if you are no longer seeing it.
I was getting the same issue until I redeployed the swarm and increased the number of agents to 2. After that, it worked for me!