Peer add is failing with backend network

Open Akarsha-rai opened this issue 7 years ago • 8 comments

Observed behavior

Peer add is failing with backend network

Expected/desired behavior

Peer add should succeed.

Details on how to reproduce (minimal and precise)

  1. Set up a 3-node cluster with external etcd. One machine (say n1) has 2 NICs/IPs.
  2. Peer add n1 using one of its IPs (i.e. 10.70.35.80) from node n2. Peer add fails.
[root@dhcp35-122 ~]# glustercli peer add 10.70.35.80
Peer add failed

Response headers:
X-Request-Id: 0af89bd5-1d07-49f7-89dd-f86c652d956a
X-Gluster-Cluster-Id: 10f3fb83-326a-4e2a-97f1-7c6a5c9537f6
X-Gluster-Peer-Id: 5934c470-a583-42e4-a285-58ca93db53d4

Response body:
failed to send join cluster request
  3. Now peer add n1 using its other IP (i.e. 10.70.35.121) from node n2. Peer add succeeds.
[root@dhcp35-122 ~]# glustercli peer add 10.70.35.121
Peer add successful
+--------------------------------------+-----------------------------------+--------------------+--------------------+
|                  ID                  |               NAME                |  CLIENT ADDRESSES  |   PEER ADDRESSES   |
+--------------------------------------+-----------------------------------+--------------------+--------------------+
| 7446bf45-00ae-4407-a42f-230330d956ae | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.121:24008 |
|                                      |                                   | 10.70.35.121:24007 |                    |
|                                      |                                   | 10.70.35.80:24007  |                    |
+--------------------------------------+-----------------------------------+--------------------+--------------------+

Information about the environment:

- Glusterd2 version used (e.g. v4.1.0 or master): 

[root@dhcp35-122 ~]# glusterd2 --version
glusterd version: v6.0-dev.28.git1b19aeb
git SHA: 1b19aeb
go version: go1.9.4
go OS/arch: linux/amd64

- Operating system used:

[root@dhcp35-229 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)

- Glusterd2 compiled from sources, as a package (rpm/deb), or container: 
- Using External ETCD: (yes/no, if yes ETCD version):
yes, etcdmain: etcd Version: 3.3.8
- If container, which container image: 
- Using kubernetes, openshift, or direct install: 
- If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: 

Other useful information

[root@dhcp35-122 ~]# cat /etc/glusterd2/glusterd2.toml 
localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
#restauth should be set to false to disable REST authentication in glusterd2
restauth = false
etcdendpoints = "http://10.70.35.10:2379"
noembed = true

Log:

time="2018-11-16 13:30:55.198246" level=info msg="peer disconnected from store" id=3ca95c6f-80fc-4964-832d-5439ee6765dd source="[liveness.go:51:events.(*livenessWatcher).Watch]"
time="2018-11-16 13:34:02.753821" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" remote="10.70.35.80:24008" rpc=PeerService.Join source="[peer-rpc-clnt.go:47:peers.(*peerSvcClnt).JoinCluster]"
time="2018-11-16 13:34:02.753962" level=error msg="sending Join request failed" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" peer="10.70.35.80:24008" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a source="[addpeer.go:82:peers.addPeerHandler]"
time="2018-11-16 13:34:02.754089" level=info msg="127.0.0.1 - - [16/Nov/2018:13:34:02 +0530] \"POST /v1/peers HTTP/1.1\" 500 72" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a

Akarsha-rai avatar Nov 16 '18 09:11 Akarsha-rai

@atinmu Can you decide the priority for this issue? Does this need to be taken up now?

vpandey-RH avatar Nov 19 '18 07:11 vpandey-RH

Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused"" peer="10.70.35.80:24008"

says that the connection was refused on port 24008.

defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"

And from the config, I see that glusterd2 is listening on all interfaces.

If you want to run glusterd2 on only one of the NICs, you need to change the configuration file:

peeraddress = "<IP>:24008"
clientaddress = "<IP>:24007"

@Akarsha-rai Can you paste the glustercli peer list output, along with some more information on the scenario you are trying out?
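
For reference, Go's `net.Listen` treats an empty host or a literal unspecified IP as "all local addresses", so `":24008"` and `"0.0.0.0:24008"` would be equivalent if passed to it unchanged. Whether glusterd2 passes `peeraddress` through unchanged is not established in this thread. The sketch below only classifies a configured value; `isWildcard` is a hypothetical helper, not a glusterd2 function.

```go
package main

import (
	"fmt"
	"net"
)

// isWildcard is a hypothetical helper: it reports whether a "host:port"
// listen address has an empty or unspecified host, for example ":24008",
// "0.0.0.0:24008" or "[::]:24008".
func isWildcard(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	if host == "" {
		return true, nil
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsUnspecified(), nil
}

func main() {
	for _, addr := range []string{":24008", "0.0.0.0:24008", "10.70.35.80:24008"} {
		wild, err := isWildcard(addr)
		if err != nil {
			fmt.Printf("%q: invalid address: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q wildcard=%v\n", addr, wild)
	}
}
```

A wildcard value is fine as a listen address but is not something another node can dial, so it is worth treating differently from a concrete IP when it is advertised to peers.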

Madhu-1 avatar Nov 22 '18 08:11 Madhu-1

@Akarsha-rai If you want the gRPC server to listen on all IPs, you can set peeraddress = "0.0.0.0:24008" in the config file.

oshankkumar avatar Nov 22 '18 09:11 oshankkumar

@Akarsha-rai If you want the gRPC server to listen on all IPs, you can set peeraddress = "0.0.0.0:24008" in the config file.

I think we should mention this in the docs so that it does not confuse users. @Akarsha-rai Can you verify this?

rishubhjain avatar Nov 22 '18 10:11 rishubhjain

I tried setting peeraddress = "0.0.0.0:24008" in the config file and was able to peer add with the backend network.

But I faced a few issues:

  1. Suppose node n1 has 2 IPs (a & b). When I added 'b' from node n2, peer add was successful. Later, when I tried to add 'a', peer add failed with the error "peer is part of another cluster". Shouldn't it fail with the error "Peer exists with given addresses"?

  2. If node n1 has 2 IPs (a & b), when I tried to add 'b' from node n1 itself, peer add was successful.

[root@dhcp35-121 ~]# glustercli peer add 10.70.35.80
Peer add successful
+--------------------------------------+-----------------------------------+--------------------+-------------------+
|                  ID                  |               NAME                |  CLIENT ADDRESSES  |  PEER ADDRESSES   |
+--------------------------------------+-----------------------------------+--------------------+-------------------+
| 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.80:24008 |
|                                      |                                   | 10.70.35.121:24007 | 0.0.0.0:24008     |
|                                      |                                   | 10.70.35.80:24007  |                   |
+--------------------------------------+-----------------------------------+--------------------+-------------------+

[root@dhcp35-121 ~]# glustercli peer status
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
|                  ID                  |               NAME                |  CLIENT ADDRESSES  |  PEER ADDRESSES   | ONLINE |  PID  |
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
| 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.80:24008 | yes    | 10274 |
|                                      |                                   | 10.70.35.121:24007 | 0.0.0.0:24008     |        |       |
|                                      |                                   | 10.70.35.80:24007  |                   |        |       |
| f0eb23bb-5447-48da-bfd8-0b255ecf6f84 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 0.0.0.0:24008     | yes    | 10274 |
|                                      |                                   | 10.70.35.121:24007 |                   |        |       |
|                                      |                                   | 10.70.35.80:24007  |                   |        |       |
+--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+

Akarsha-rai avatar Nov 26 '18 06:11 Akarsha-rai

I think checking client addresses as well as peer addresses before adding the peer should solve this problem. @aravindavk any suggestions?
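
One way to picture the suggested check is a lookup that compares the host part of the requested address against both address lists of every known peer. This is only a sketch under stated assumptions: `Peer`, `hostKey` and `findPeerByHost` are hypothetical names and do not reflect glusterd2's actual types or store API. Loopback and wildcard hosts are skipped because every peer in the tables above advertises 127.0.0.1.

```go
package main

import (
	"fmt"
	"net"
)

// Peer is a hypothetical, simplified stand-in for a stored peer record.
type Peer struct {
	ID              string
	ClientAddresses []string
	PeerAddresses   []string
}

// hostKey returns the host part of addr and whether it is meaningful to
// compare. Empty, loopback and unspecified hosts are not, since they do not
// identify a particular node.
func hostKey(addr string) (string, bool) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		host = addr // assume a bare host without a port
	}
	if host == "" {
		return "", false
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsUnspecified()) {
		return "", false
	}
	return host, true
}

// findPeerByHost returns the first peer that already advertises the host of
// addr in either its peer addresses or its client addresses.
func findPeerByHost(peers []Peer, addr string) (Peer, bool) {
	want, ok := hostKey(addr)
	if !ok {
		return Peer{}, false
	}
	for _, p := range peers {
		for _, list := range [][]string{p.PeerAddresses, p.ClientAddresses} {
			for _, a := range list {
				if got, ok := hostKey(a); ok && got == want {
					return p, true
				}
			}
		}
	}
	return Peer{}, false
}

func main() {
	// Record modelled on the successful peer add output earlier in this issue.
	peers := []Peer{{
		ID:              "7446bf45-00ae-4407-a42f-230330d956ae",
		ClientAddresses: []string{"127.0.0.1:24007", "10.70.35.121:24007", "10.70.35.80:24007"},
		PeerAddresses:   []string{"10.70.35.121:24008"},
	}}
	if p, found := findPeerByHost(peers, "10.70.35.80:24008"); found {
		fmt.Println("peer exists with given address:", p.ID)
	}
}
```

Comparing hosts rather than full host:port strings matters here, because the second IP appears only in the client address list with port 24007 while the add request targets port 24008.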

rishubhjain avatar Nov 26 '18 06:11 rishubhjain

@aravindavk Is this a valid scenario in an opinionated GCS cluster?

atinmu avatar Nov 30 '18 11:11 atinmu

Not applicable in a GCS setup: client and peer addresses are the same there.

aravindavk avatar Dec 02 '18 06:12 aravindavk