Steve Dainard
2016-Feb-25 20:02 UTC
[Gluster-users] Gluster 3.7.6 add new node state Peer Rejected (Connected)
I haven't done anything more than peer thus far, so I'm a bit confused as to how the volume info fits in. Can you expand on this a bit?

Failed commits? Is this split-brain on the replica volumes? I get no entries back from 'gluster volume heal <volname> info' on any of the replica volumes, but if I try 'gluster volume heal <volname> full' I get: 'Launching heal operation to perform full self heal on volume <volname> has been unsuccessful'.

I have 5 volumes total.

'Replica 3' volumes running on gluster01/02/03:
vm-storage
iso-storage
export-domain-storage
env-modules

And one distributed-only volume 'storage', whose info file is shown below:

From existing hosts gluster01/02:
type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=25
transport-type=0
volume-id=26d355cb-c486-481f-ac16-e25390e73775
username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
password
op-version=3
client-op-version=3
quota-version=1
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
features.quota-deem-statfs=on
features.inode-quota=on
diagnostics.brick-log-level=WARNING
features.quota=on
performance.readdir-ahead=on
performance.cache-size=1GB
performance.stat-prefetch=on
brick-0=10.0.231.50:-mnt-raid6-storage-storage
brick-1=10.0.231.51:-mnt-raid6-storage-storage
brick-2=10.0.231.52:-mnt-raid6-storage-storage
brick-3=10.0.231.53:-mnt-raid6-storage-storage

From existing hosts gluster03/04:
type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=25
transport-type=0
volume-id=26d355cb-c486-481f-ac16-e25390e73775
username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
password
op-version=3
client-op-version=3
quota-version=1
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
features.quota-deem-statfs=on
features.inode-quota=on
performance.stat-prefetch=on
performance.cache-size=1GB
performance.readdir-ahead=on
features.quota=on
diagnostics.brick-log-level=WARNING
brick-0=10.0.231.50:-mnt-raid6-storage-storage
brick-1=10.0.231.51:-mnt-raid6-storage-storage
brick-2=10.0.231.52:-mnt-raid6-storage-storage
brick-3=10.0.231.53:-mnt-raid6-storage-storage

So far between gluster01/02 and gluster03/04 the configs are the same, although the ordering of some of the features differs.

On gluster05/06 the ordering is different again, and quota-version=0 instead of 1.

From new hosts gluster05/gluster06:
type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=25
transport-type=0
volume-id=26d355cb-c486-481f-ac16-e25390e73775
username=eb9e2063-6ba8-4d16-a54f-2c7cf7740c4c
password
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.stat-prefetch=on
performance.cache-size=1GB
performance.readdir-ahead=on
features.quota=on
diagnostics.brick-log-level=WARNING
features.inode-quota=on
features.quota-deem-statfs=on
brick-0=10.0.231.50:-mnt-raid6-storage-storage
brick-1=10.0.231.51:-mnt-raid6-storage-storage
brick-2=10.0.231.52:-mnt-raid6-storage-storage
brick-3=10.0.231.53:-mnt-raid6-storage-storage

Also, I forgot to mention that when I initially peered the two new hosts, glusterd crashed on gluster03 and had to be restarted (log attached), but it has been fine since.
Thanks,
Steve

On Thu, Feb 25, 2016 at 11:27 AM, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>
> On 02/25/2016 11:45 PM, Steve Dainard wrote:
>> Hello,
>>
>> I upgraded from 3.6.6 to 3.7.6 a couple of weeks ago. I just peered 2 new
>> nodes to a 4-node cluster and gluster peer status is:
>>
>> # gluster peer status   <-- from node gluster01
>> Number of Peers: 5
>>
>> Hostname: 10.0.231.51
>> Uuid: b01de59a-4428-486b-af49-cb486ab44a07
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.52
>> Uuid: 75143760-52a3-4583-82bb-a9920b283dac
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.53
>> Uuid: 2c0b8bb6-825a-4ddd-9958-d8b46e9a2411
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.0.231.54   <-- new node gluster05
>> Uuid: 408d88d6-0448-41e8-94a3-bf9f98255d9c
>> State: Peer Rejected (Connected)
>>
>> Hostname: 10.0.231.55   <-- new node gluster06
>> Uuid: 9c155c8e-2cd1-4cfc-83af-47129b582fd3
>> State: Peer Rejected (Connected)
>
> Looks like your configuration files are mismatched, i.e. the checksum
> calculation on these two nodes differs from the others.
>
> Did you have any failed commits?
>
> Compare /var/lib/glusterd/<volname>/info on the failed node against a
> good one; most likely you will see some difference.
>
> Can you paste /var/lib/glusterd/<volname>/info?
>
> Regards
> Rafi KC
>
>> I followed the write-up here:
>> http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
>> and the two new nodes peered properly, but after a reboot of the two new
>> nodes I'm seeing the same Peer Rejected (Connected) state.
>>
>> I've attached logs from an existing node, and the two new nodes.
>>
>> Thanks for any suggestions,
>> Steve

(Attachment: etc-glusterfs-glusterd.vol.log.gluster03, 262982 bytes)
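A quick way to see which peer's copy of the volume configuration disagrees is to compare the per-volume info file, and the checksum glusterd keeps alongside it, on every node. The following is only a sketch: it assumes root SSH access to each host, the IPs used above, and the common /var/lib/glusterd/vols/<volname>/ layout (the cksum file may not exist on all versions).

VOLUME=storage
HOSTS="10.0.231.50 10.0.231.51 10.0.231.52 10.0.231.53 10.0.231.54 10.0.231.55"
for h in $HOSTS; do
  echo "== $h =="
  # an md5sum of the info file shows at a glance which peers differ
  ssh root@"$h" "md5sum /var/lib/glusterd/vols/$VOLUME/info 2>/dev/null; cat /var/lib/glusterd/vols/$VOLUME/cksum 2>/dev/null"
done
# to see the exact difference between a good node and a rejected one:
diff <(ssh root@10.0.231.50 cat /var/lib/glusterd/vols/$VOLUME/info) \
     <(ssh root@10.0.231.54 cat /var/lib/glusterd/vols/$VOLUME/info)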
Steve Dainard
2016-Feb-25 20:04 UTC
[Gluster-users] Gluster 3.7.6 add new node state Peer Rejected (Connected)
For clarity, "no return" from 'gluster volume heal <volname> info' means zero entries:

# gluster volume heal vm-storage info
Brick 10.0.231.50:/mnt/lv-vm-storage/vm-storage
Number of entries: 0

Brick 10.0.231.51:/mnt/lv-vm-storage/vm-storage
Number of entries: 0

Brick 10.0.231.52:/mnt/lv-vm-storage/vm-storage
Number of entries: 0

On Thu, Feb 25, 2016 at 12:02 PM, Steve Dainard <sdainard at spd1.com> wrote:
> [...]
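The same check can be run across all of the replica volumes in one pass; a small loop like the one below (volume names taken from earlier in this thread) prints the heal info for each:

for v in vm-storage iso-storage export-domain-storage env-modules; do
  echo "== $v =="
  gluster volume heal "$v" info
done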
Mohammed Rafi K C
2016-Feb-25 20:23 UTC
[Gluster-users] Gluster 3.7.6 add new node state Peer Rejected (Connected)
On 02/26/2016 01:32 AM, Steve Dainard wrote:
> I haven't done anything more than peer thus far, so I'm a bit confused
> as to how the volume info fits in. Can you expand on this a bit?
>
> Failed commits? Is this split-brain on the replica volumes? I get no
> entries back from 'gluster volume heal <volname> info' on any of the
> replica volumes, but if I try 'gluster volume heal <volname> full' I
> get: 'Launching heal operation to perform full self heal on volume
> <volname> has been unsuccessful'.

Forget about this; it is not for metadata self-heal.

> [...]
>
> So far between gluster01/02 and gluster03/04 the configs are the same,
> although the ordering of some of the features differs.
>
> On gluster05/06 the ordering is different again, and
> quota-version=0 instead of 1.

This is why the peer shows as Rejected. Can you check the op-version of all the glusterd instances, including the ones in the rejected state?
You can find the op-version in /var/lib/glusterd/glusterd.info.

Rafi KC

> [...]
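A minimal sketch for gathering that from every peer, assuming root SSH access and that the value is stored under the operating-version key in /var/lib/glusterd/glusterd.info:

for h in 10.0.231.50 10.0.231.51 10.0.231.52 10.0.231.53 10.0.231.54 10.0.231.55; do
  echo -n "$h: "
  ssh root@"$h" "grep operating-version /var/lib/glusterd/glusterd.info"
done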