Mohammed Rafi K C
2016-Feb-25 20:23 UTC
[Gluster-users] Gluster 3.7.6 add new node state Peer Rejected (Connected)
On 02/26/2016 01:32 AM, Steve Dainard wrote:
> I haven't done anything more than peer thus far, so I'm a bit confused
> as to how the volume info fits in. Can you expand on this a bit?
>
> Failed commits? Is this split brain on the replica volumes? I don't
> get any return from 'gluster volume heal <volname> info' on any of the
> replica volumes, but if I try 'gluster volume heal <volname> full' I
> get: 'Launching heal operation to perform full self heal on volume
> <volname> has been unsuccessful'.

Forget about this; it is not for metadata self-heal.

> I have 5 volumes total.
>
> 'Replica 3' volumes running on gluster01/02/03:
> vm-storage
> iso-storage
> export-domain-storage
> env-modules
>
> And one distributed-only volume, 'storage'; its info is shown below:
>
> *From existing hosts gluster01/02:*
> type=0
> count=4
> status=1
> sub_count=0
> stripe_count=1
> replica_count=1
> disperse_count=0
> redundancy_count=0
> version=25
> transport-type=0
> volume-id=26d355cb-c486-481f-ac16-e25390e73775
> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
> password
> op-version=3
> client-op-version=3
> quota-version=1
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> features.quota-deem-statfs=on
> features.inode-quota=on
> diagnostics.brick-log-level=WARNING
> features.quota=on
> performance.readdir-ahead=on
> performance.cache-size=1GB
> performance.stat-prefetch=on
> brick-0=10.0.231.50:-mnt-raid6-storage-storage
> brick-1=10.0.231.51:-mnt-raid6-storage-storage
> brick-2=10.0.231.52:-mnt-raid6-storage-storage
> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>
> *From existing hosts gluster03/04:*
> type=0
> count=4
> status=1
> sub_count=0
> stripe_count=1
> replica_count=1
> disperse_count=0
> redundancy_count=0
> version=25
> transport-type=0
> volume-id=26d355cb-c486-481f-ac16-e25390e73775
> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
> password
> op-version=3
> client-op-version=3
> quota-version=1
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> features.quota-deem-statfs=on
> features.inode-quota=on
> performance.stat-prefetch=on
> performance.cache-size=1GB
> performance.readdir-ahead=on
> features.quota=on
> diagnostics.brick-log-level=WARNING
> brick-0=10.0.231.50:-mnt-raid6-storage-storage
> brick-1=10.0.231.51:-mnt-raid6-storage-storage
> brick-2=10.0.231.52:-mnt-raid6-storage-storage
> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>
> So far between gluster01/02 and gluster03/04 the configs are the same,
> although the ordering of some of the features differs.
>
> On gluster05/06 the ordering is different again, and
> quota-version=0 instead of 1.

This is why the peer shows as rejected. Can you check the op-version of
all the glusterd instances, including the one that is in the rejected
state? You can find the op-version in /var/lib/glusterd/glusterd.info.

Rafi KC

> *From new hosts gluster05/gluster06:*
> type=0
> count=4
> status=1
> sub_count=0
> stripe_count=1
> replica_count=1
> disperse_count=0
> redundancy_count=0
> version=25
> transport-type=0
> volume-id=26d355cb-c486-481f-ac16-e25390e73775
> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
> password
> op-version=3
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.stat-prefetch=on
> performance.cache-size=1GB
> performance.readdir-ahead=on
> features.quota=on
> diagnostics.brick-log-level=WARNING
> features.inode-quota=on
> features.quota-deem-statfs=on
> brick-0=10.0.231.50:-mnt-raid6-storage-storage
> brick-1=10.0.231.51:-mnt-raid6-storage-storage
> brick-2=10.0.231.52:-mnt-raid6-storage-storage
> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>
> Also, I forgot to mention that when I initially peered the two new
> hosts, glusterd crashed on gluster03 and had to be restarted (log
> attached), but it has been fine since.
>
> Thanks,
> Steve
>
> On Thu, Feb 25, 2016 at 11:27 AM, Mohammed Rafi K C
> <rkavunga at redhat.com> wrote:
>
>
> On 02/25/2016 11:45 PM, Steve Dainard wrote:
>> Hello,
>>
>> I upgraded from 3.6.6 to 3.7.6 a couple of weeks ago. I just peered
>> 2 new nodes to a 4-node cluster, and gluster peer status is:
>>
>> # gluster peer status *<-- from node gluster01*
>> Number of Peers: 5
>>
>> Hostname: 10.0.231.51
>> Uuid: b01de59a-4428-486b-af49-cb486ab44a07
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.52
>> Uuid: 75143760-52a3-4583-82bb-a9920b283dac
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.53
>> Uuid: 2c0b8bb6-825a-4ddd-9958-d8b46e9a2411
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.54 *<-- new node gluster05*
>> Uuid: 408d88d6-0448-41e8-94a3-bf9f98255d9c
>> *State: Peer Rejected (Connected)*
>>
>> Hostname: 10.0.231.55 *<-- new node gluster06*
>> Uuid: 9c155c8e-2cd1-4cfc-83af-47129b582fd3
>> *State: Peer Rejected (Connected)*
>
> It looks like your configuration files are mismatched, i.e. the
> checksum calculation differs on these two nodes compared with the others.
>
> Did you have any failed commits?
>
> Compare /var/lib/glusterd/vols/<volname>/info on the failed node
> against a good one; most likely you will see some difference.
>
> Can you paste the /var/lib/glusterd/vols/<volname>/info?
>
> Regards
> Rafi KC
>
>> I followed the write-up here:
>> http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
>> and the two new nodes peered properly, but after a reboot of the
>> two new nodes I'm seeing the same Peer Rejected (Connected) state.
>>
>> I've attached logs from an existing node and the two new nodes.
>>
>> Thanks for any suggestions,
>> Steve
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160226/112aad19/attachment.html>
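[Editor's note] Rafi's diagnostic advice above (compare the volume's info file between a good node and a rejected one) can be sketched as follows. This is a minimal illustration against mock files, not the thread's actual data: on a real cluster the file lives at /var/lib/glusterd/vols/<volname>/info on each peer (fetched e.g. via ssh), and sorting before diffing hides the harmless key-ordering differences Steve observed so that only real value mismatches remain.

```shell
# Mock copies of /var/lib/glusterd/vols/storage/info stand in for the
# real files; only the quota-version line differs, as in Steve's report.
mkdir -p /tmp/gluster-demo
printf 'version=25\nquota-version=1\nfeatures.quota=on\n' > /tmp/gluster-demo/info.good
printf 'version=25\nfeatures.quota=on\nquota-version=0\n' > /tmp/gluster-demo/info.rejected

# Sort both files so that key-ordering differences (harmless) disappear
# and only genuine value mismatches show up in the diff.
sort /tmp/gluster-demo/info.good > /tmp/gluster-demo/good.sorted
sort /tmp/gluster-demo/info.rejected > /tmp/gluster-demo/rejected.sorted
diff /tmp/gluster-demo/good.sorted /tmp/gluster-demo/rejected.sorted || true

# The op-version Rafi asks about is the 'operating-version' line of
# /var/lib/glusterd/glusterd.info on each node, e.g.:
#   grep operating-version /var/lib/glusterd/glusterd.info
```

With these mock files the diff reports only the quota-version line, which is exactly the kind of mismatch that makes the peers' config checksums disagree.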
Mohammed Rafi K C
2016-Feb-25 20:49 UTC
[Gluster-users] Gluster 3.7.6 add new node state Peer Rejected (Connected)
On 02/26/2016 01:53 AM, Mohammed Rafi K C wrote:
>
> On 02/26/2016 01:32 AM, Steve Dainard wrote:
>> I haven't done anything more than peer thus far, so I'm a bit
>> confused as to how the volume info fits in. Can you expand on this a bit?
>>
>> Failed commits? Is this split brain on the replica volumes? I don't
>> get any return from 'gluster volume heal <volname> info' on any of the
>> replica volumes, but if I try 'gluster volume heal <volname> full' I
>> get: 'Launching heal operation to perform full self heal on volume
>> <volname> has been unsuccessful'.
>
> Forget about this; it is not for metadata self-heal.
>
>> I have 5 volumes total.
>>
>> 'Replica 3' volumes running on gluster01/02/03:
>> vm-storage
>> iso-storage
>> export-domain-storage
>> env-modules
>>
>> And one distributed-only volume, 'storage'; its info is shown below:
>>
>> *From existing hosts gluster01/02:*
>> type=0
>> count=4
>> status=1
>> sub_count=0
>> stripe_count=1
>> replica_count=1
>> disperse_count=0
>> redundancy_count=0
>> version=25
>> transport-type=0
>> volume-id=26d355cb-c486-481f-ac16-e25390e73775
>> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
>> password
>> op-version=3
>> client-op-version=3
>> quota-version=1
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> features.quota-deem-statfs=on
>> features.inode-quota=on
>> diagnostics.brick-log-level=WARNING
>> features.quota=on
>> performance.readdir-ahead=on
>> performance.cache-size=1GB
>> performance.stat-prefetch=on
>> brick-0=10.0.231.50:-mnt-raid6-storage-storage
>> brick-1=10.0.231.51:-mnt-raid6-storage-storage
>> brick-2=10.0.231.52:-mnt-raid6-storage-storage
>> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>>
>> *From existing hosts gluster03/04:*
>> type=0
>> count=4
>> status=1
>> sub_count=0
>> stripe_count=1
>> replica_count=1
>> disperse_count=0
>> redundancy_count=0
>> version=25
>> transport-type=0
>> volume-id=26d355cb-c486-481f-ac16-e25390e73775
>> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
>> password
>> op-version=3
>> client-op-version=3
>> quota-version=1
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> features.quota-deem-statfs=on
>> features.inode-quota=on
>> performance.stat-prefetch=on
>> performance.cache-size=1GB
>> performance.readdir-ahead=on
>> features.quota=on
>> diagnostics.brick-log-level=WARNING
>> brick-0=10.0.231.50:-mnt-raid6-storage-storage
>> brick-1=10.0.231.51:-mnt-raid6-storage-storage
>> brick-2=10.0.231.52:-mnt-raid6-storage-storage
>> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>>
>> So far between gluster01/02 and gluster03/04 the configs are the
>> same, although the ordering of some of the features differs.
>>
>> On gluster05/06 the ordering is different again, and
>> quota-version=0 instead of 1.
>
> This is why the peer shows as rejected. Can you check the op-version
> of all the glusterd instances, including the one that is in the
> rejected state? You can find the op-version in
> /var/lib/glusterd/glusterd.info.

If all the op-versions are the same, 3.7.6, then as a work-around you
can manually set quota-version=1; restarting glusterd will then solve
the problem. But I would strongly recommend that you figure out the
root cause (RCA). Maybe you can file a bug for this.
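[Editor's note] Rafi's work-around can be sketched as below. This is an illustration against a mock copy of the info file, not commands from the thread: on a real rejected node (gluster05/06) the file would be /var/lib/glusterd/vols/<volname>/info, glusterd should be stopped before editing and restarted afterwards, and the file should be backed up first. The `vols/storage` path and the systemctl commands are assumptions about the poster's setup.

```shell
# Work-around sketch for the quota-version mismatch. On a real rejected
# node the sequence would be (paths assumed, back up first):
#   systemctl stop glusterd
#   cp /var/lib/glusterd/vols/storage/info /root/info.bak
#   sed -i 's/^quota-version=0$/quota-version=1/' /var/lib/glusterd/vols/storage/info
#   systemctl start glusterd
# Demonstrated here on a mock copy of the file:
mkdir -p /tmp/gluster-fix
printf 'version=25\nquota-version=0\nfeatures.quota=on\n' > /tmp/gluster-fix/info
sed -i 's/^quota-version=0$/quota-version=1/' /tmp/gluster-fix/info
grep '^quota-version=' /tmp/gluster-fix/info   # prints: quota-version=1
```

If the checksum mismatch was the only problem, `gluster peer status` on an existing node should show the fixed peers back in "Peer in Cluster" state after the restart.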
Rafi

> Rafi KC
>
>> *From new hosts gluster05/gluster06:*
>> type=0
>> count=4
>> status=1
>> sub_count=0
>> stripe_count=1
>> replica_count=1
>> disperse_count=0
>> redundancy_count=0
>> version=25
>> transport-type=0
>> volume-id=26d355cb-c486-481f-ac16-e25390e73775
>> username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
>> password
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.stat-prefetch=on
>> performance.cache-size=1GB
>> performance.readdir-ahead=on
>> features.quota=on
>> diagnostics.brick-log-level=WARNING
>> features.inode-quota=on
>> features.quota-deem-statfs=on
>> brick-0=10.0.231.50:-mnt-raid6-storage-storage
>> brick-1=10.0.231.51:-mnt-raid6-storage-storage
>> brick-2=10.0.231.52:-mnt-raid6-storage-storage
>> brick-3=10.0.231.53:-mnt-raid6-storage-storage
>>
>> Also, I forgot to mention that when I initially peered the two new
>> hosts, glusterd crashed on gluster03 and had to be restarted (log
>> attached), but it has been fine since.
>>
>> Thanks,
>> Steve
>>
>> On Thu, Feb 25, 2016 at 11:27 AM, Mohammed Rafi K C
>> <rkavunga at redhat.com> wrote:
>>
>>
>> On 02/25/2016 11:45 PM, Steve Dainard wrote:
>>> Hello,
>>>
>>> I upgraded from 3.6.6 to 3.7.6 a couple of weeks ago.
>>> I just peered 2 new nodes to a 4-node cluster, and gluster peer status is:
>>>
>>> # gluster peer status *<-- from node gluster01*
>>> Number of Peers: 5
>>>
>>> Hostname: 10.0.231.51
>>> Uuid: b01de59a-4428-486b-af49-cb486ab44a07
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: 10.0.231.52
>>> Uuid: 75143760-52a3-4583-82bb-a9920b283dac
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: 10.0.231.53
>>> Uuid: 2c0b8bb6-825a-4ddd-9958-d8b46e9a2411
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: 10.0.231.54 *<-- new node gluster05*
>>> Uuid: 408d88d6-0448-41e8-94a3-bf9f98255d9c
>>> *State: Peer Rejected (Connected)*
>>>
>>> Hostname: 10.0.231.55 *<-- new node gluster06*
>>> Uuid: 9c155c8e-2cd1-4cfc-83af-47129b582fd3
>>> *State: Peer Rejected (Connected)*
>>
>> It looks like your configuration files are mismatched, i.e. the
>> checksum calculation differs on these two nodes compared with the others.
>>
>> Did you have any failed commits?
>>
>> Compare /var/lib/glusterd/vols/<volname>/info on the failed node
>> against a good one; most likely you will see some difference.
>>
>> Can you paste the /var/lib/glusterd/vols/<volname>/info?
>>
>> Regards
>> Rafi KC
>>
>>> I followed the write-up here:
>>> http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
>>> and the two new nodes peered properly, but after a reboot of the
>>> two new nodes I'm seeing the same Peer Rejected (Connected) state.
>>>
>>> I've attached logs from an existing node and the two new nodes.
>>>
>>> Thanks for any suggestions,
>>> Steve
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160226/3c764e75/attachment.html>