Martin Toth
2019-Apr-10 09:42 UTC
[Gluster-users] Replica 3 - how to replace failed node (peer)
Hi all,

I am running a replica 3 gluster volume with 3 bricks. One of my servers failed - all disks are showing errors and the raid is in a fault state.

Type: Replicate
Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <-- this brick is down
Brick3: node3.san:/tank/gluster/gv0imagestore/brick1

So one of my bricks has totally failed (node2). It went down and all its data is lost (failed raid on node2). Now I am running only two bricks on 2 servers out of 3.
This is a really critical problem for us; we could lose all data. I want to add new disks to node2, create a new raid array on them and try to replace the failed brick on this node.

What is the procedure for replacing Brick2 on node2, can someone advise? I can't find anything relevant in the documentation.

Thanks in advance,
Martin
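A minimal sketch of how the degraded state can be confirmed from one of the surviving nodes before replacing anything. The volume name gv0imagestore is only assumed from the brick path; substitute the real volume name:

    # Peer and volume status - node2 should show as disconnected and its brick offline
    gluster peer status
    gluster volume status gv0imagestore

    # Pending self-heal entries that will be replicated once the new brick is in place
    gluster volume heal gv0imagestore info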
David Spisla
2019-Apr-10 10:09 UTC
[Gluster-users] Replica 3 - how to replace failed node (peer)
Hello Martin,

look here: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/pdf/administration_guide/Red_Hat_Gluster_Storage-3.4-Administration_Guide-en-US.pdf on page 324. There you will find instructions for replacing a brick in case of a hardware failure.

Regards
David Spisla
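As a rough sketch of the preparation that usually precedes the replace step on the failed node, assuming the rebuilt RAID device shows up as /dev/md0 (a hypothetical device name) and the brick should live under the same path as before:

    # On node2: create a filesystem on the new array and mount it
    # (XFS with 512-byte inodes is the commonly recommended brick filesystem)
    mkfs.xfs -i size=512 /dev/md0
    mkdir -p /tank/gluster/gv0imagestore
    mount /dev/md0 /tank/gluster/gv0imagestore

    # Create the brick directory itself, but leave it empty;
    # self-heal will repopulate it from the other two replicas
    mkdir /tank/gluster/gv0imagestore/brick1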
Karthik Subrahmanya
2019-Apr-10 10:20 UTC
[Gluster-users] Replica 3 - how to replace failed node (peer)
Hi Martin,

After you add the new disks and create the raid array, you can run one of the following commands to replace the old brick with the new one:

- If you are going to use a different name for the new brick, run: gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
- If you are planning to use the same name for the new brick, use: gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
  Here the old-brick and new-brick hostname and path should be the same.

After replacing the brick, make sure the brick comes online using volume status. Heal should start automatically; you can check the heal status to see whether all the files get replicated to the newly added brick. If it does not start automatically, you can start it manually by running gluster volume heal <volname>.

HTH,
Karthik
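Putting these instructions together with the brick names from this thread, a sketch of the full sequence. The volume name gv0imagestore is assumed from the brick path, and /tank/gluster/gv0imagestore/brick2 is only a hypothetical new path for the different-name case:

    # Option 1: the new brick uses a different path on node2
    gluster volume replace-brick gv0imagestore \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        node2.san:/tank/gluster/gv0imagestore/brick2 \
        commit force

    # Option 2: re-use the same hostname and path for the new brick
    gluster volume reset-brick gv0imagestore \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        commit force

    # Verify the new brick comes online and watch the heal progress
    gluster volume status gv0imagestore
    gluster volume heal gv0imagestore info

    # Trigger heal manually if it does not start on its own
    gluster volume heal gv0imagestore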