Peer add is failing with backend network
Observed behavior
Peer add fails when using the backend network.
Expected/desired behavior
Peer add should succeed.
Details on how to reproduce (minimal and precise)
- Set up a 3-node cluster with external etcd. One machine (say n1) has 2 NICs/IPs.
- From node n2, peer add n1 using one IP (i.e. 10.70.35.80). Peer add fails.
[root@dhcp35-122 ~]# glustercli peer add 10.70.35.80
Peer add failed
Response headers:
X-Request-Id: 0af89bd5-1d07-49f7-89dd-f86c652d956a
X-Gluster-Cluster-Id: 10f3fb83-326a-4e2a-97f1-7c6a5c9537f6
X-Gluster-Peer-Id: 5934c470-a583-42e4-a285-58ca93db53d4
Response body:
failed to send join cluster request
- Now, from node n2, peer add n1 using the other IP (i.e. 10.70.35.121). Peer add succeeds.
[root@dhcp35-122 ~]# glustercli peer add 10.70.35.121
Peer add successful
+--------------------------------------+-----------------------------------+--------------------+--------------------+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES |
+--------------------------------------+-----------------------------------+--------------------+--------------------+
| 7446bf45-00ae-4407-a42f-230330d956ae | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007 | 10.70.35.121:24008 |
| | | 10.70.35.121:24007 | |
| | | 10.70.35.80:24007 | |
+--------------------------------------+-----------------------------------+--------------------+--------------------+
Information about the environment:
- Glusterd2 version used (e.g. v4.1.0 or master):
[root@dhcp35-122 ~]# glusterd2 --version
glusterd version: v6.0-dev.28.git1b19aeb
git SHA: 1b19aeb
go version: go1.9.4
go OS/arch: linux/amd64
- Operating system used:
[root@dhcp35-229 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
- Glusterd2 compiled from sources, as a package (rpm/deb), or container:
- Using External ETCD: (yes/no, if yes ETCD version):
yes, etcdmain: etcd Version: 3.3.8
- If container, which container image:
- Using kubernetes, openshift, or direct install:
- If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside:
Other useful information
[root@dhcp35-122 ~]# cat /etc/glusterd2/glusterd2.toml
localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
#restauth should be set to false to disable REST authentication in glusterd2
restauth = false
etcdendpoints = "http://10.70.35.10:2379"
noembed = true
Log:
time="2018-11-16 13:30:55.198246" level=info msg="peer disconnected from store" id=3ca95c6f-80fc-4964-832d-5439ee6765dd source="[liveness.go:51:events.(*livenessWatcher).Watch]"
time="2018-11-16 13:34:02.753821" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" remote="10.70.35.80:24008" rpc=PeerService.Join source="[peer-rpc-clnt.go:47:peers.(*peerSvcClnt).JoinCluster]"
time="2018-11-16 13:34:02.753962" level=error msg="sending Join request failed" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" peer="10.70.35.80:24008" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a source="[addpeer.go:82:peers.addPeerHandler]"
time="2018-11-16 13:34:02.754089" level=info msg="127.0.0.1 - - [16/Nov/2018:13:34:02 +0530] \"POST /v1/peers HTTP/1.1\" 500 72" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a
@atinmu Can you decide the priority for this issue? Does this need to be taken up now?
Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused"" peer="10.70.35.80:24008"
says that the connection to port 24008 was refused.
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
and from the config, I see that glusterd2 is listening on all interfaces.
If you want glusterd2 to listen on only one of the NICs, you need to change the configuration file:
peeraddress = "<IP>:24008"
clientaddress = "<IP>:24007"
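For reference, Go's `net` package interprets such listen addresses as follows: an empty host (":24008") and an unspecified IP ("0.0.0.0:24008") both mean all local interfaces, while an explicit IP binds only that address. The helper below is a hypothetical sketch of that rule, not glusterd2 code. If glusterd2 passed `peeraddress` to `net.Listen` unchanged, ":24008" and "0.0.0.0:24008" would behave identically; the refused connection with ":24008" in this report suggests glusterd2 handles the value differently, which this sketch does not model.

```go
package main

import (
	"fmt"
	"net"
)

// bindsAllInterfaces reports whether a host:port listen address, as passed
// to Go's net.Listen, would bind every local interface. An empty host or an
// unspecified IP (0.0.0.0 or ::) means "all interfaces".
func bindsAllInterfaces(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	if host == "" {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsUnspecified(), nil
}

func main() {
	for _, a := range []string{":24008", "0.0.0.0:24008", "10.70.35.80:24008"} {
		all, err := bindsAllInterfaces(a)
		fmt.Printf("%-20s all interfaces: %v (err: %v)\n", a, all, err)
	}
}
```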
@Akarsha-rai Can you paste the glustercli peer list output, and some more info on the scenario you are trying out?
@Akarsha-rai If you want the gRPC server to listen on all IPs, you can set peeraddress = "0.0.0.0:24008" in the config file
I think we should mention this in the docs so that it does not confuse users. @Akarsha-rai Can you verify this?
I tried setting peeraddress = "0.0.0.0:24008" in the config file and was able to peer add using the backend network.
But I faced a few issues:
- Suppose node n1 has 2 IPs (a & b). When I add 'b' from node n2, peer add succeeds. Later, when I try to add 'a', peer add fails with the error "peer is part of another cluster". Shouldn't it fail with the error "Peer exists with given addresses"?
- If node n1 has 2 IPs (a & b), when I tried to add 'b' from node n1 itself, peer add was successful, and n1 now appears twice in the peer list (same name and PID, different IDs):
[root@dhcp35-121 ~]# glustercli peer add 10.70.35.80
Peer add successful
+--------------------------------------+-----------------------------------+--------------------+-------------------+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES |
+--------------------------------------+-----------------------------------+--------------------+-------------------+
| 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007 | 10.70.35.80:24008 |
| | | 10.70.35.121:24007 | 0.0.0.0:24008 |
| | | 10.70.35.80:24007 | |
+--------------------------------------+-----------------------------------+--------------------+-------------------+
[root@dhcp35-121 ~]# glustercli peer status
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES | ONLINE | PID |
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
| 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007 | 10.70.35.80:24008 | yes | 10274 |
| | | 10.70.35.121:24007 | 0.0.0.0:24008 | | |
| | | 10.70.35.80:24007 | | | |
| f0eb23bb-5447-48da-bfd8-0b255ecf6f84 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007 | 0.0.0.0:24008 | yes | 10274 |
| | | 10.70.35.121:24007 | | | |
| | | 10.70.35.80:24007 | | | |
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
I think checking client addresses as well as peer addresses before adding the peer should solve this problem. @aravindavk, any suggestions?
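The proposed check could look roughly like this: before joining, compare the host part of the address being added against every client and peer address already recorded for existing peers, skipping wildcard and loopback entries (0.0.0.0, 127.0.0.1) that are not unique to a node and therefore cannot identify a specific peer. A minimal Go sketch; the `Peer` type and `findExistingPeer` are hypothetical, not glusterd2's actual types or store API, and matching is by host only (ports are ignored):

```go
package main

import (
	"fmt"
	"net"
)

// Peer is a hypothetical, trimmed-down view of a cluster member, holding the
// client and peer addresses shown by `glustercli peer list`.
type Peer struct {
	ID              string
	ClientAddresses []string
	PeerAddresses   []string
}

// hostOf returns the host part of a host:port string, or the input unchanged
// if it carries no port (e.g. "10.70.35.80" as typed to glustercli).
func hostOf(addr string) string {
	if h, _, err := net.SplitHostPort(addr); err == nil {
		return h
	}
	return addr
}

// isAmbiguous reports whether host is empty, unspecified (0.0.0.0, ::) or
// loopback: such addresses are not unique to one node and are skipped.
func isAmbiguous(host string) bool {
	if host == "" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && (ip.IsUnspecified() || ip.IsLoopback())
}

// findExistingPeer returns the ID of an existing peer that already lists the
// host of newAddr among its client or peer addresses.
func findExistingPeer(peers []Peer, newAddr string) (string, bool) {
	want := hostOf(newAddr)
	for _, p := range peers {
		for _, list := range [][]string{p.ClientAddresses, p.PeerAddresses} {
			for _, a := range list {
				h := hostOf(a)
				if !isAmbiguous(h) && h == want {
					return p.ID, true
				}
			}
		}
	}
	return "", false
}

func main() {
	// n1's own entry, taken from the peer status output above.
	peers := []Peer{{
		ID:              "f0eb23bb-5447-48da-bfd8-0b255ecf6f84",
		ClientAddresses: []string{"127.0.0.1:24007", "10.70.35.121:24007", "10.70.35.80:24007"},
		PeerAddresses:   []string{"0.0.0.0:24008"},
	}}
	if id, ok := findExistingPeer(peers, "10.70.35.80"); ok {
		fmt.Println("Peer exists with given addresses:", id)
	}
}
```

With that entry, adding 10.70.35.80 would be rejected as an existing peer because 10.70.35.80:24007 is already one of its client addresses, even though its only peer address is the non-routable 0.0.0.0:24008.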
@aravindavk Is this a valid scenario in an opinionated GCS cluster?
Not applicable in a GCS setup; both client and peer addresses are the same in a GCS setup.