Tony Schreiner
2016-Sep-22 14:16 UTC
[Gluster-users] Recovering lost node in dispersed volume
I set up a dispersed volume with 1 x (3 + 1) nodes (I do know that 3+1 is not optimal). It was originally created on version 3.7 and recently upgraded to 3.8 without issue.

# gluster vol info
Volume Name: rvol
Type: Disperse
Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: calliope:/brick/p1
Brick2: euterpe:/brick/p1
Brick3: lemans:/brick/p1
Brick4: thalia:/brick/p1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off

I inadvertently allowed one of the nodes (thalia) to be reinstalled, which overwrote the system but not the brick, and I need guidance on getting it back into the volume.

(on lemans)
# gluster peer status
Number of Peers: 3

Hostname: calliope
Uuid: 72373eb1-8047-405a-a094-891e559755da
State: Peer in Cluster (Connected)

Hostname: euterpe
Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
State: Peer in Cluster (Connected)

Hostname: thalia
Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
State: Peer Rejected (Connected)

The thalia peer is rejected. If I try to peer probe thalia, I am told it is already part of the pool. If, from thalia, I try to peer probe one of the others, I am told they are already part of another pool.

I have tried removing the thalia brick with

gluster vol remove-brick rvol thalia:/brick/p1 start

but I get the error

volume remove-brick start: failed: Remove brick incorrect brick count of 1 for disperse 4

I am not finding much guidance for this particular situation. I could use a suggestion on how to recover. It's a lab situation, so no biggie if I lose it.
Cheers

Tony Schreiner
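A quick way to see why thalia shows up as "Peer Rejected" is to compare the UUID the rest of the cluster has on record for thalia against the UUID the reinstalled thalia generated for itself. This is only a minimal check, assuming the default /var/lib/glusterd layout (the hostnames are of course specific to this setup):

# on lemans (or any healthy peer): the filename of the matching peer file is the recorded UUID
grep -l thalia /var/lib/glusterd/peers/*

# on the reinstalled thalia: the UUID the fresh glusterd generated for itself
cat /var/lib/glusterd/glusterd.info

If the two UUIDs differ, the peer stays rejected until thalia is given back its old identity, which is what the reply below walks through.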
Serkan Çoban
2016-Sep-22 15:05 UTC
[Gluster-users] Recovering lost node in dispersed volume
Here are the steps for replacing a failed node:

1- On one of the other servers, run
   grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6
   and note the UUID.
2- Stop glusterd on the failed server, set "UUID=uuid_from_previous_step" in /var/lib/glusterd/glusterd.info (replacing the newly generated UUID line), and start glusterd.
3- Run "gluster peer probe calliope".
4- Restart glusterd.
5- Now "gluster peer status" should show all the peers. If not, probe them manually as above.
6- For all the bricks, run the command
   setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g') brick_name
7- Restart glusterd and everything should be fine.

I think I read the steps from this link:
https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
Look at the "keep the IP address" part.
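Put together, the recovery looks something like the sketch below. It is only a sketch under the assumptions of this thread (failed host thalia, healthy peer calliope, volume rvol, brick /brick/p1); the commands are meant to be run on the hosts indicated in the comments, not as a single script, and they edit glusterd's state files directly, so check each value before running it.

# --- on a healthy peer (e.g. lemans): recover thalia's old UUID ---
OLD_UUID=$(grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6)
echo "old UUID for thalia: $OLD_UUID"

# --- on the reinstalled thalia ---
systemctl stop glusterd          # or "service glusterd stop" depending on the distro
sed -i "s/^UUID=.*/UUID=$OLD_UUID/" /var/lib/glusterd/glusterd.info   # restore the old identity
systemctl start glusterd

gluster peer probe calliope      # re-join the existing pool
systemctl restart glusterd
gluster peer status              # all peers should now show "Peer in Cluster"

# restore the volume-id xattr on the surviving brick so glusterd will start it;
# /var/lib/glusterd/vols/rvol/info should have been synced from the other peers by this point
VOL_ID=$(grep volume-id /var/lib/glusterd/vols/rvol/info | cut -d= -f2 | sed 's/-//g')
setfattr -n trusted.glusterfs.volume-id -v 0x$VOL_ID /brick/p1
systemctl restart glusterd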