Davy Croonen
2015-Sep-11 12:11 UTC
[Gluster-users] Gluster 3.6.4 peer rejected while doing probe
Atin

Please see the requested attachments.

KR
Davy

> On 11 Sep 2015, at 14:03, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> Could you attach the contents of /var/lib/glusterd/vols/<volname>/info
> file from both the nodes?
>
> ~Atin
>
> On 09/11/2015 04:50 PM, Davy Croonen wrote:
>> Thanks for your quick response.
>>
>> As reported in the log, the checksums are indeed not the same. On
>> gfs01a-dcg it is 'info=1266454712' and on gfs02a-dcg it is
>> 'info=2613085848'. Of course my next question is: how can I fix this?
>>
>> I already tried stopping the gluster daemon on gfs02a-dcg, deleting
>> the entire vols directory and starting the gluster daemon again. On the
>> gfs01a-dcg host I then ran a gluster peer status, which shows:
>>
>> Hostname: gfs02a-dcg.intnet.be
>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>> State: Peer in Cluster (Connected)
>>
>> But the checksum of the public volume is still not the same on
>> gfs01a-dcg and gfs02a-dcg, and running a gluster peer status on
>> gfs01b-dcg (the replica of gfs01a-dcg) gives me:
>>
>> Hostname: gfs02a-dcg.intnet.be
>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>> State: Peer Rejected (Connected)
>>
>> So my question remains: is there any way to fix this?
>>
>> Kind regards
>>
>> Davy
>>
>>> On 11 Sep 2015, at 12:39, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>>
>>> Can you check the checksum of the volume "public" on both of the
>>> current nodes? Checksums are located in
>>> /var/lib/glusterd/vols/public/cksum.
>>>
>>> Regards
>>> Rafi KC
>>>
>>> On 09/11/2015 03:24 PM, Davy Croonen wrote:
>>>> Hi all
>>>>
>>>> We have a production cluster with 2 nodes (gfs01a and gfs01b) in a
>>>> distributed replicate setup with glusterfs 3.6.4. We want to expand
>>>> the volume with 2 extra nodes (gfs02a and gfs02b) because we are
>>>> running out of disk space. Therefore we deployed 2 extra nodes with
>>>> glusterfs 3.6.4.
>>>>
>>>> Now, while probing the 2 new nodes from a node in the existing
>>>> cluster, we got the following error:
>>>>
>>>> root@gfs01a-dcg:~# gluster peer probe gfs02a-dcg.intnet.be
>>>> peer probe: success.
>>>> root@gfs01a-dcg:~# gluster peer status
>>>> Number of Peers: 2
>>>>
>>>> Hostname: gfs01b-dcg.intnet.be
>>>> Uuid: cfc83cf2-b719-40c7-afea-b23accc714c3
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> Hostname: gfs02a-dcg.intnet.be
>>>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>> *State: Peer Rejected (Connected)*
>>>>
>>>> In the log file /var/log/glusterfs/etc-glusterfs-glusterd.vol.log the
>>>> following entries are written:
>>>>
>>>> [2015-09-11 09:37:49.405906] I [glusterd-handler.c:1031:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gfs02a-dcg.intnet.be 24007
>>>> [2015-09-11 09:37:49.428630] I [glusterd-handler.c:3198:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gfs02a-dcg.intnet.be (24007)
>>>> [2015-09-11 09:37:49.438636] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>>>> [2015-09-11 09:37:49.440513] I [glusterd-handler.c:3131:glusterd_friend_add] 0-management: connect returned 0
>>>> [2015-09-11 09:37:49.474316] I [glusterd-rpc-ops.c:245:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host: gfs02a-dcg.intnet.be
>>>> [2015-09-11 09:37:49.481801] I [glusterd-rpc-ops.c:387:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
>>>> [2015-09-11 09:37:51.650265] I [glusterd-rpc-ops.c:437:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host: gfs02a-dcg.intnet.be, port: 0
>>>> [2015-09-11 09:37:51.665861] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30603
>>>> [2015-09-11 09:37:51.690170] I [glusterd-handler.c:2543:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>> [2015-09-11 09:37:51.692652] I [glusterd-handler.c:2595:__glusterd_handle_probe_query] 0-glusterd: Responded to gfs02a-dcg.intnet.be, op_ret: 0, op_errno: 0, ret: 0
>>>> [2015-09-11 09:37:51.706203] I [glusterd-handler.c:2232:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>> *[2015-09-11 09:37:51.708909] E [MSGID: 106010] [glusterd-utils.c:3297:glusterd_compare_friend_volume] 0-management: Version of Cksums public differ. local cksum = 1932535021, remote cksum = 2474653383 on peer gfs02a-dcg.intnet.be*
>>>> [2015-09-11 09:37:51.709026] I [glusterd-handler.c:3367:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gfs02a-dcg.intnet.be (0), ret: 0
>>>> [2015-09-11 09:37:55.537231] I [glusterd-handler.c:1241:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
>>>>
>>>> The exact same error appears while probing the second node (gfs02b).
>>>>
>>>> Does anyone have an idea how to solve this?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Kind regards
>>>> Davy
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfs01a-dcg.info
Type: application/octet-stream
Size: 457 bytes
Desc: gfs01a-dcg.info
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150911/c39c83d4/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfs02a-dcg.info
Type: application/octet-stream
Size: 559 bytes
Desc: gfs02a-dcg.info
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150911/c39c83d4/attachment-0001.obj>
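
For anyone repeating the checksum comparison Rafi asked for, it could be scripted roughly as below. This is only a sketch: it assumes each cksum file holds a line of the form info=<number>, as in the values Davy quoted, and that a copy of the file from each node has been placed on one machine. The function names read_info_cksum and compare_cksum are made up for illustration and are not part of gluster.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the gluster CLI.

# Print the value of the "info=" line from a glusterd cksum file.
# Assumption: the file contains a line such as "info=1266454712".
read_info_cksum() {
    sed -n 's/^info=//p' "$1"
}

# Compare two cksum files, e.g. one copied from each node.
# Prints "match" or "differ: <a> vs <b>"; returns 0 on match, 1 otherwise.
compare_cksum() {
    a=$(read_info_cksum "$1")
    b=$(read_info_cksum "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "match"
    else
        echo "differ: $a vs $b"
        return 1
    fi
}
```

With /var/lib/glusterd/vols/public/cksum copied from both nodes to local files (the file names are up to you), calling compare_cksum on the two copies reports whether the info checksums agree. It only detects the mismatch; it does not repair anything.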
Atin Mukherjee
2015-Sep-14 05:43 UTC
[Gluster-users] Gluster 3.6.4 peer rejected while doing probe
Davy,

This seems to be an issue that we also faced a couple of months back
during upgrade testing, and a bugzilla [1] was raised for it. At the
time we didn't have a workaround to make peer probe work, but I managed
to find one today. Could you do an explicit volume set on the existing
cluster and then do a peer probe? Let me know if that works.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1248895

Thanks,
Atin

On 09/11/2015 05:41 PM, Davy Croonen wrote:
> Atin
>
> Please see the requested attachments.
>
> KR
> Davy
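
The order of operations in the suggested workaround (volume set on the existing cluster first, then peer probe) might be laid out as in the sketch below. The thread does not say which volume option to set, so the option key and value are left as parameters and must be treated as placeholders. The function workaround_cmds is hypothetical and only prints the commands for review; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: prints the command sequence, does not execute it.
# $1 = volume name, $2 = option key, $3 = option value, $4 = host to probe.
# Assumption: which option to set is NOT given in the thread; the caller
# has to choose one that is valid for their volume.
workaround_cmds() {
    # Step 1: explicit volume set on a node of the existing cluster.
    echo "gluster volume set $1 $2 $3"
    # Step 2: probe the new node again.
    echo "gluster peer probe $4"
    # Step 3: check the resulting peer state.
    echo "gluster peer status"
}
```

Run on a node of the existing cluster (gfs01a-dcg in this thread), the printed lines can be checked and then executed by hand. Whether this clears the Peer Rejected state is exactly what the thread is still waiting to confirm.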