Ernie Dunbar
2016-Apr-06 18:42 UTC
[Gluster-users] Error "Failed to find host nfs1.lightspeed.ca" when adding a new node to the cluster.
I've already successfully created a Gluster cluster, but when I try to add a new node, gluster on the new node claims it can't find the hostname of the first node in the cluster.

I've added the hostname nfs1.lightspeed.ca to /etc/hosts like this:

root@nfs3:/home/ernied# cat /etc/hosts
127.0.0.1 localhost
192.168.1.31 nfs1.lightspeed.ca nfs1
192.168.1.32 nfs2.lightspeed.ca nfs2
127.0.1.1 nfs3.lightspeed.ca nfs3

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

I can ping the hostname:

root@nfs3:/home/ernied# ping -c 3 nfs1
PING nfs1.lightspeed.ca (192.168.1.31) 56(84) bytes of data.
64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=1 ttl=64 time=0.148 ms
64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=2 ttl=64 time=0.126 ms
64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=3 ttl=64 time=0.133 ms

--- nfs1.lightspeed.ca ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.126/0.135/0.148/0.016 ms

I can get gluster to probe the hostname:

root@nfs3:/home/ernied# gluster peer probe nfs1
peer probe: success. Host nfs1 port 24007 already in peer list

But if I try to create the volume from the new node, it says that the host can't be found. Um...
root@nfs3:/home/ernied# gluster volume create gv2 replica 3 nfs1.lightspeed.ca:/brick1/gv2/ nfs2.lightspeed.ca:/brick1/gv2/ nfs3.lightspeed.ca:/brick1/gv2
volume create: gv2: failed: Failed to find host nfs1.lightspeed.ca

Our logs from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2016-04-06 18:19:18.107459] E [MSGID: 106452] [glusterd-utils.c:5825:glusterd_new_brick_validate] 0-management: Failed to find host nfs1.lightspeed.ca
[2016-04-06 18:19:18.107496] E [MSGID: 106536] [glusterd-volume-ops.c:1364:glusterd_op_stage_create_volume] 0-management: Failed to find host nfs1.lightspeed.ca
[2016-04-06 18:19:18.107516] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Create' failed on localhost : Failed to find host nfs1.lightspeed.ca
[2016-04-06 18:19:18.231864] E [MSGID: 106170] [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management: Request from peer 192.168.1.31:65530 has an entry in peerinfo, but uuid does not match
[2016-04-06 18:19:18.231919] E [MSGID: 106170] [glusterd-handshake.c:1060:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer 192.168.1.31:65530

Searching Google for the error about the peerinfo entry turns up nothing besides the Gluster source code. My guess is that my earlier unsuccessful attempts to add this node before v3.7.10 have created a conflict that needs to be cleared.
Ernie Dunbar
2016-Apr-06 22:34 UTC
[Gluster-users] Error "Failed to find host nfs1.lightspeed.ca" when adding a new node to the cluster.
On 2016-04-06 11:42, Ernie Dunbar wrote:
> I've already successfully created a Gluster cluster, but when I try to
> add a new node, gluster on the new node claims it can't find the
> hostname of the first node in the cluster.
> [...]
More interesting is what happens when I try to add the third server's brick to the volume from the first Gluster server:

root@nfs1:/home/ernied# gluster volume add-brick gv2 replica 3 nfs3:/brick1/gv2
volume add-brick: failed: One or more nodes do not support the required op-version. Cluster op-version must atleast be 30600.
Yet, when I view the operating version in /var/lib/glusterd/glusterd.info:

root@nfs1:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=1207917a-23bc-4bae-8238-cd691b7082c7
operating-version=30501

root@nfs2:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=e394fcec-41da-482a-9b30-089f717c5c06
operating-version=30501

root@nfs3:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=ae191e96-9cd6-4e2b-acae-18f2cc45e6ed
operating-version=30501

I see that the operating version is the same on all nodes!
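For comparison: the add-brick error quotes a minimum cluster op-version of 30600, while every glusterd.info shown records operating-version=30501, which is numerically lower. Below is a minimal sketch of that comparison, assuming glusterd.info keeps the plain key=value layout shown in the output. The function name check_opversion is hypothetical and not part of Gluster, and its 30600 default is simply the number taken from the error message. It only reads and compares the value; it does not change anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluster): compare the operating-version
# recorded in a glusterd.info-style file against a required minimum.
# Usage: check_opversion FILE [REQUIRED]
# Exit status: 0 = meets minimum, 1 = below minimum, 2 = unreadable/missing key.
check_opversion() {
    file=$1
    # Default minimum is the value quoted in the add-brick error message.
    required=${2:-30600}

    if [ ! -r "$file" ]; then
        echo "$file: cannot read file"
        return 2
    fi

    # Print only the text after "operating-version=" (assumes key=value lines).
    opver=$(sed -n 's/^operating-version=//p' "$file")
    if [ -z "$opver" ]; then
        echo "$file: no operating-version line found"
        return 2
    fi

    # Integer comparison: 30501 is less than 30600, so that case reports "below".
    if [ "$opver" -lt "$required" ]; then
        echo "$file: operating-version $opver is below required $required"
        return 1
    fi
    echo "$file: operating-version $opver meets required $required"
    return 0
}
```

Sourced on each node and called as check_opversion /var/lib/glusterd/glusterd.info, it would report 30501 as below 30600 for all three files shown.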