IP allocation was seeded by different peers error when joining nodes
What you expected to happen?
When joining nodes to a cluster while weave is running as a Daemonset, the new weave pod should connect to other nodes.
What happened?
INFO: 2021/07/20 19:58:46.261622 ->[172.31.5.186:6783|a6:66:22:93:29:51(ip-172-31-5-186)]: connection shutting down due to error: IP allocation was seeded by different peers (received: [a6:66:22:93:29:51(ip-172-31-5-186)], ours: [8a:d2:32:87:dc:db(ip-172-31-6-167)])
How to reproduce it?
Use weave 2.7.1+ installed as a DaemonSet then join a new node with kubeadm. It happens sometimes but not always. It never happens on 2.6.5.
Anything else we need to know?
Versions:
$ weave version
2.8.1
$ containerd version
1.4.4
$ uname -a
Linux ip-172-31-5-186 5.4.0-1041-aws #43-Ubuntu SMP Fri Mar 19 22:06:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.10", GitCommit:"98d5dc5d36d34a7ee13368a7893dcb400ec4e566", GitTreeState:"clean", BuildDate:"2021-04-15T03:28:42Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.10", GitCommit:"98d5dc5d36d34a7ee13368a7893dcb400ec4e566", GitTreeState:"clean", BuildDate:"2021-04-15T03:20:25Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Logs:
$ docker logs weave
or, if using Kubernetes:
$ kubectl logs -n kube-system <weave-net-pod> weave
INFO: 2021/07/20 19:58:44.533248 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/20 log-level:info metrics-addr:0.0.0.0:6782 name:8a:d2:32:87:dc:db nickname:ip-172-31-6-167 no-dns:true no-masq-local:true port:6783]
INFO: 2021/07/20 19:58:44.533303 weave 2.8.1
INFO: 2021/07/20 19:58:45.716752 Bridge type is bridged_fastdp
INFO: 2021/07/20 19:58:45.716782 Communication between peers via untrusted networks is encrypted.
INFO: 2021/07/20 19:58:46.006785 Our name is 8a:d2:32:87:dc:db(ip-172-31-6-167)
INFO: 2021/07/20 19:58:46.006842 Launch detected - using supplied peer list: [172.31.5.186]
INFO: 2021/07/20 19:58:46.006867 Using "no-masq-local" LocalRangeTracker
INFO: 2021/07/20 19:58:46.006873 Checking for pre-existing addresses on weave bridge
INFO: 2021/07/20 19:58:46.007890 [allocator 8a:d2:32:87:dc:db] No valid persisted data
INFO: 2021/07/20 19:58:46.013141 [allocator 8a:d2:32:87:dc:db] Initialising via deferred consensus
INFO: 2021/07/20 19:58:46.013209 Sniffing traffic on datapath (via ODP)
INFO: 2021/07/20 19:58:46.013668 ->[172.31.5.186:6783] attempting connection
INFO: 2021/07/20 19:58:46.016169 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2021/07/20 19:58:46.016204 Listening for metrics requests on 0.0.0.0:6782
INFO: 2021/07/20 19:58:46.017452 ->[172.31.5.186:6783|a6:66:22:93:29:51(ip-172-31-5-186)]: connection ready; using protocol version 2
INFO: 2021/07/20 19:58:46.017648 overlay_switch ->[a6:66:22:93:29:51(ip-172-31-5-186)] using fastdp
INFO: 2021/07/20 19:58:46.017685 ->[172.31.5.186:6783|a6:66:22:93:29:51(ip-172-31-5-186)]: connection added (new peer)
INFO: 2021/07/20 19:58:46.017720 Setting up IPsec between 8a:d2:32:87:dc:db(ip-172-31-6-167) and a6:66:22:93:29:51(ip-172-31-5-186)
INFO: 2021/07/20 19:58:46.017804 ipsec: InitSALocal: 172.31.5.186 -> 172.31.6.167 :6784 0x7328eb30
INFO: 2021/07/20 19:58:46.244095 adding entry 10.32.0.0/20 to weaver-no-masq-local of 0
INFO: 2021/07/20 19:58:46.244124 added entry 10.32.0.0/20 to weaver-no-masq-local of 0
INFO: 2021/07/20 19:58:46.261622 ->[172.31.5.186:6783|a6:66:22:93:29:51(ip-172-31-5-186)]: connection shutting down due to error: IP allocation was seeded by different peers (received: [a6:66:22:93:29:51(ip-172-31-5-186)], ours: [8a:d2:32:87:dc:db(ip-172-31-6-167)])
INFO: 2021/07/20 19:58:46.261707 ->[172.31.5.186:6783|a6:66:22:93:29:51(ip-172-31-5-186)]: connection deleted
$ kubectl get events
22m Normal Scheduled pod/weave-net-wpt77 Successfully assigned kube-system/weave-net-wpt77 to ip-172-31-6-167
22m Warning FailedMount pod/weave-net-wpt77 MountVolume.SetUp failed for volume "weave-net-token-kt2w5" : failed to sync secret cache: timed out waiting for the condition
Adding logs from the first weave pod in the daemonset
$ kubectl -n kube-system get pods -l name=weave-net
NAME READY STATUS RESTARTS AGE
weave-net-wpt77 2/2 Running 0 37m
weave-net-zrcnp 2/2 Running 1 52m
$ kubectl -n kube-system logs weave-net-zrcnp -c weave
DEBU: 2021/07/20 19:42:59.239039 [kube-peers] Checking peer "a6:66:22:93:29:51" against list &{[]}
Peer not in list; removing persisted data
INFO: 2021/07/20 19:42:59.330218 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/20 log-level:info metrics-addr:0.0.0.0:6782 name:a6:66:22:93:29:51 nickname:ip-172-31-5-186 no-dns:true no-masq-local:true port:6783]
INFO: 2021/07/20 19:42:59.330256 weave 2.8.1
INFO: 2021/07/20 19:43:00.189071 Bridge type is bridged_fastdp
INFO: 2021/07/20 19:43:00.189094 Communication between peers via untrusted networks is encrypted.
INFO: 2021/07/20 19:43:00.412553 Our name is a6:66:22:93:29:51(ip-172-31-5-186)
INFO: 2021/07/20 19:43:00.412593 Launch detected - using supplied peer list: []
INFO: 2021/07/20 19:43:00.412617 Using "no-masq-local" LocalRangeTracker
INFO: 2021/07/20 19:43:00.412623 Checking for pre-existing addresses on weave bridge
INFO: 2021/07/20 19:43:00.413570 [allocator a6:66:22:93:29:51] No valid persisted data
INFO: 2021/07/20 19:43:00.430737 [allocator a6:66:22:93:29:51] Initialising via deferred consensus
INFO: 2021/07/20 19:43:00.430821 Sniffing traffic on datapath (via ODP)
INFO: 2021/07/20 19:43:00.434519 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2021/07/20 19:43:00.434555 Listening for metrics requests on 0.0.0.0:6782
INFO: 2021/07/20 19:43:01.403816 [kube-peers] Added myself to peer list &{[{a6:66:22:93:29:51 ip-172-31-5-186}]}
DEBU: 2021/07/20 19:43:01.412212 [kube-peers] Nodes that have disappeared: map[]
INFO: 2021/07/20 19:43:01.423916 Assuming quorum size of 1
INFO: 2021/07/20 19:43:01.424012 adding entry 10.32.0.0/20 to weaver-no-masq-local of 0
INFO: 2021/07/20 19:43:01.424035 added entry 10.32.0.0/20 to weaver-no-masq-local of 0
10.32.0.1
DEBU: 2021/07/20 19:43:01.548065 registering for updates for node delete events
WARN: 2021/07/20 19:43:17.002603 [allocator]: Delete: no addresses for 6228433a122cae07ecadcdd0c15a0d756647bee66df0280b163bca9fc9adede9
WARN: 2021/07/20 19:43:17.061381 [allocator]: Delete: no addresses for ade7f32d47e420177d611ccb26b64744e3258e8443452ce0ab2c5c0c33d4eb84
WARN: 2021/07/20 19:43:18.414037 [allocator]: Delete: no addresses for 6228433a122cae07ecadcdd0c15a0d756647bee66df0280b163bca9fc9adede9
WARN: 2021/07/20 19:43:18.429535 [allocator]: Delete: no addresses for ade7f32d47e420177d611ccb26b64744e3258e8443452ce0ab2c5c0c33d4eb84
WARN: 2021/07/20 19:43:30.178501 [allocator]: Delete: no addresses for ddc56d0ae7e2c549b8d8a75f0cf5033479510350868b2309eade68b4b39c261c
WARN: 2021/07/20 19:43:30.306579 [allocator]: Delete: no addresses for ddc56d0ae7e2c549b8d8a75f0cf5033479510350868b2309eade68b4b39c261c
WARN: 2021/07/20 19:43:46.149578 [allocator]: Delete: no addresses for 6228433a122cae07ecadcdd0c15a0d756647bee66df0280b163bca9fc9adede9
WARN: 2021/07/20 19:43:46.341428 [allocator]: Delete: no addresses for ddc56d0ae7e2c549b8d8a75f0cf5033479510350868b2309eade68b4b39c261c
WARN: 2021/07/20 19:43:46.522150 [allocator]: Delete: no addresses for ade7f32d47e420177d611ccb26b64744e3258e8443452ce0ab2c5c0c33d4eb84
WARN: 2021/07/20 19:44:20.223305 [allocator]: Delete: no addresses for d8391bd1ae52207565b5e7988076d67ebd987c8b3854f4ccae722dcfd027de37
WARN: 2021/07/20 19:44:58.455047 [allocator]: Delete: no addresses for c855331c281c9f384687f998fa97d2a8698d4fa8ebf0b8e79fd765fe1e6012b7
WARN: 2021/07/20 19:45:02.294243 [allocator]: Delete: no addresses for 06fcd42a50aa2996f68edd9f0f09a2aaf8c307fe05217db36779cee9fcc30529
WARN: 2021/07/20 19:45:02.310922 [allocator]: Delete: no addresses for 350becfc49098128281097bdb8e57608510380eb39dc3dba90535145603812bc
WARN: 2021/07/20 19:45:46.725664 [allocator]: Delete: no addresses for 06fcd42a50aa2996f68edd9f0f09a2aaf8c307fe05217db36779cee9fcc30529
WARN: 2021/07/20 19:45:46.939546 [allocator]: Delete: no addresses for 350becfc49098128281097bdb8e57608510380eb39dc3dba90535145603812bc
INFO: 2021/07/20 19:51:53.047391 ->[172.31.6.167:47596] connection accepted
INFO: 2021/07/20 19:51:53.048531 ->[172.31.6.167:47596] connection shutting down due to error during handshake: failed to receive remote protocol header: EOF
INFO: 2021/07/20 19:58:45.999242 ->[172.31.6.167:36989] connection accepted
INFO: 2021/07/20 19:58:46.001957 ->[172.31.6.167:36989|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection ready; using protocol version 2
INFO: 2021/07/20 19:58:46.003011 overlay_switch ->[8a:d2:32:87:dc:db(ip-172-31-6-167)] using fastdp
INFO: 2021/07/20 19:58:46.003061 ->[172.31.6.167:36989|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection added (new peer)
INFO: 2021/07/20 19:58:46.003426 Setting up IPsec between a6:66:22:93:29:51(ip-172-31-5-186) and 8a:d2:32:87:dc:db(ip-172-31-6-167)
INFO: 2021/07/20 19:58:46.003798 ipsec: InitSALocal: 172.31.6.167 -> 172.31.5.186 :6784 0x96066430
INFO: 2021/07/20 19:58:46.128415 fastdp ->[172.31.6.167:6784|8a:d2:32:87:dc:db(ip-172-31-6-167)]: IPSec init SA remote
INFO: 2021/07/20 19:58:46.128448 ipsec: InitSARemote: 172.31.5.186 -> 172.31.6.167 :6784 0x7328eb30
INFO: 2021/07/20 19:58:46.131741 overlay_switch ->[8a:d2:32:87:dc:db(ip-172-31-6-167)] using sleeve
INFO: 2021/07/20 19:58:46.131776 ->[172.31.6.167:36989|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection fully established
INFO: 2021/07/20 19:58:46.132637 sleeve ->[172.31.6.167:6783|8a:d2:32:87:dc:db(ip-172-31-6-167)]: Effective MTU verified at 8915
INFO: 2021/07/20 19:58:46.246786 ->[172.31.6.167:36989|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection shutting down due to error: read tcp 172.31.5.186:6783->172.31.6.167:36989: read: connection reset by peer
INFO: 2021/07/20 19:58:46.246857 ->[172.31.6.167:36989|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection deleted
INFO: 2021/07/20 19:58:46.246899 Removed unreachable peer 8a:d2:32:87:dc:db(ip-172-31-6-167)
INFO: 2021/07/20 19:58:46.246961 Destroying IPsec between a6:66:22:93:29:51(ip-172-31-5-186) and 8a:d2:32:87:dc:db(ip-172-31-6-167)
INFO: 2021/07/20 19:58:46.246975 ipsec: destroy: in 172.31.6.167 -> 172.31.5.186 0x96066430
INFO: 2021/07/20 19:58:46.320697 ipsec: destroy: out 172.31.5.186 -> 172.31.6.167 0x7328eb30
INFO: 2021/07/20 19:58:48.548774 ->[172.31.6.167:45403] connection accepted
INFO: 2021/07/20 19:58:48.549796 ->[172.31.6.167:45403|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection ready; using protocol version 2
INFO: 2021/07/20 19:58:48.550343 overlay_switch ->[8a:d2:32:87:dc:db(ip-172-31-6-167)] using fastdp
INFO: 2021/07/20 19:58:48.550463 ->[172.31.6.167:45403|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection added (new peer)
INFO: 2021/07/20 19:58:48.550504 Setting up IPsec between a6:66:22:93:29:51(ip-172-31-5-186) and 8a:d2:32:87:dc:db(ip-172-31-6-167)
INFO: 2021/07/20 19:58:48.550579 ipsec: InitSALocal: 172.31.6.167 -> 172.31.5.186 :6784 0x3e355f40
INFO: 2021/07/20 19:58:48.650979 ->[172.31.6.167:45403|8a:d2:32:87:dc:db(ip-172-31-6-167)]: connection shutting down due to error: IP allocation was seeded by different peers (received: [8a:d2:32:87:dc:db(ip-172-31-6-167)], ours: [a6:66:22:93:29:51(ip-172-31-5-186)])
...
To add a little more color, this issue happens infrequently.
I've spent some time digging through the code and logs. it is difficult to tell what is going wrong as the logs are sparse and I am not too familiar with the project.
I've tried to reproduce this issue 4 more times since with debug logs enabled with no luck. I will continue to try.
Adding some more detail. Debug logs were captured from a server that was reused after weave reset. Rebooting the server after the reset and before re-installing fixes the issue. A restart is not required with weave 2.6.5.