thr3ads.net - Gluster users - [Gluster-users] How to recover after one node breakdown [Mar 2016]


Gaurav Garg

2016-Mar-17 06:51 UTC

[Gluster-users] How to recover after one node breakdown

>> Could I run some glusterfs commands on the good node to recover the
replicate volume, without copying files such as glusterd.info from the
good node to the new node?

Running gluster commands alone is not enough to recover the replicate volume.
To recover, follow these steps:

1) On the new node, remove everything under /var/lib/glusterd/ (if anything is
present), then start glusterd.
2) Kill glusterd on the new node.
3) On the 1st node (the one in good condition), run #gluster peer status and
copy the UUID of the failed node (you will see one entry for the failed node
with its hostname and UUID). On the new node, replace the UUID in
/var/lib/glusterd/glusterd.info with this UUID.
You can also get the UUID of the failed node on the 1st node with #cat
/var/lib/glusterd/peers/*, which shows the UUID of the failed node along with
its hostname/IP address.
4) Copy /var/lib/glusterd/peers/* from the 1st node to the new node.
5) On the new node, one of the copied /var/lib/glusterd/peers/* files is named
after the new node's own UUID (the one now in its
/var/lib/glusterd/glusterd.info). Rename that file to the UUID of the 1st node
(from the 1st node's /var/lib/glusterd/glusterd.info) and edit its contents so
that it holds the UUID and hostname of the 1st node.
6) Start glusterd on the new node.
7) Your volume will recover.
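
The glusterd.info and peers edits described above can be sketched as a small shell function. This is only a sketch under assumptions, not an official procedure: the function name rebuild_glusterd_state and its arguments are made up for illustration, and it assumes the usual on-disk layout (a UUID= line in glusterd.info; uuid=, state= and hostname1= lines in each peers file). Check your own files before relying on it, and run it only with glusterd stopped on the new node.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and arguments are hypothetical.
#
#   $1  glusterd working directory on the new node (normally /var/lib/glusterd)
#   $2  UUID of the failed node, as reported on the good node
#   $3  directory holding a copy of the good node's peers/ files
#   $4  UUID of the good node (from its glusterd.info)
#   $5  hostname or IP address of the good node
rebuild_glusterd_state() {
    local workdir=$1 failed_uuid=$2 good_peers=$3 good_uuid=$4 good_host=$5
    local info="$workdir/glusterd.info"

    mkdir -p "$workdir/peers" || return 1

    # Give the new node the identity (UUID) of the failed node,
    # keeping any other lines in glusterd.info untouched.
    if [ -f "$info" ] && grep -q '^UUID=' "$info"; then
        sed "s/^UUID=.*/UUID=$failed_uuid/" "$info" > "$info.tmp" &&
            mv "$info.tmp" "$info" || return 1
    else
        printf 'UUID=%s\n' "$failed_uuid" >> "$info"
    fi

    # Copy the good node's peer entries to the new node.
    cp "$good_peers"/* "$workdir/peers/" || return 1

    # The copied entry named after the failed node's UUID describes this
    # node itself; turn it into an entry for the good node instead.
    local own="$workdir/peers/$failed_uuid"
    [ -f "$own" ] || { echo "no peers entry for $failed_uuid" >&2; return 1; }
    sed -e "s/^uuid=.*/uuid=$good_uuid/" \
        -e "s/^hostname1=.*/hostname1=$good_host/" \
        "$own" > "$workdir/peers/$good_uuid" || return 1
    rm -f "$own"
}
```

A call would look like rebuild_glusterd_state /var/lib/glusterd FAILED-UUID /tmp/good-peers GOOD-UUID GOOD-HOST, where the UUIDs, the staging directory and the host are placeholders for your own values. Trying it first against a scratch copy of the directory is a cheap way to see what it changes.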


The above steps are mandatory to recover the failed node.

Thanks,

Regards,
Gaurav

----- Original Message -----
From: "songxin" <songxin_1980 at 126.com>
To: "Alastair Neil" <ajneil.tech at gmail.com>
Cc: gluster-users at gluster.org
Sent: Thursday, March 17, 2016 8:56:58 AM
Subject: Re: [Gluster-users] How to recover after one node breakdown

Thank you very much for your reply.

In fact, I am using a new node, with a new rootfs, to replace the failed node.
The new node has the same IP address as the failed one.

The brick is on an external hard disk. Because the hard disk is only mounted on
the node, the data on the failed node's brick is not lost, but it may be out of
sync with the brick on the good node. The failed node's brick will be mounted on
the new node.

My current recovery procedure is to run the following gluster commands on the
good node, after starting glusterd on the new node:
1. Remove the new node's brick from the volume (the volume type changes from
replicate to distribute).
2. Peer detach the new node's IP (the same IP as the failed node).
3. Peer probe the new node's IP.
4. Add the new node's brick to the volume (the volume type changes back to
replicate).

But many problems occur, such as data being out of sync or the peer state being
wrong.
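
For reference, the four operations listed above correspond roughly to the gluster CLI calls below. This is a hedged sketch, not a recommended procedure: the helper name print_rejoin_commands is hypothetical, the volume name, host and brick path are placeholders supplied by the caller, and the function only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands, does not execute them.
#   $1  volume name   $2  host/IP of the replaced node   $3  brick path
print_rejoin_commands() {
    local vol=$1 host=$2 brick=$3
    # Drop the replica on the replaced node (replica count 2 -> 1).
    echo "gluster volume remove-brick $vol replica 1 $host:$brick force"
    # Forget the old peer entry for that address.
    echo "gluster peer detach $host"
    # Add the rebuilt node back into the trusted pool.
    echo "gluster peer probe $host"
    # Re-add the brick (replica count 1 -> 2).
    echo "gluster volume add-brick $vol replica 2 $host:$brick"
}
```

For example, print_rejoin_commands gv0 192.168.1.2 /mnt/brick prints the four commands in order so they can be reviewed before anything is run by hand.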

My question is:

Could I run some gluster commands on the good node to recover the replicate
volume, without copying files such as glusterd.info from the good node to the
new node?

Thanks,
Xin




Sent from my iPhone

On 17 March 2016, at 04:54, Alastair Neil < ajneil.tech at gmail.com > wrote:




Hopefully you have a backup of /var/lib/glusterd/glusterd.info and
/var/lib/glusterd/peers. If so, I think you can copy them back and restart
glusterd, and the volume info should get populated from the other node. If not,
you can probably reconstruct them from the corresponding files on the other
node.

That is:
On the unaffected node, the peers directory should have an entry for the failed
node containing the UUID of the failed node. The unaffected node's
glusterd.info file should enable you to recreate the peer file on the failed
node.
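
As an illustration of the lookup described above, the failed node's UUID can be read out of the unaffected node's peers directory by hostname. This is a minimal sketch assuming each peers file holds a uuid= line and one or more hostnameN= lines; the helper name peer_uuid_for_host is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name is hypothetical.
#   $1  peers directory on the unaffected node (normally /var/lib/glusterd/peers)
#   $2  hostname or IP address of the failed node
peer_uuid_for_host() {
    local peers_dir=$1 host=$2 f
    for f in "$peers_dir"/*; do
        [ -f "$f" ] || continue
        # Extract every hostnameN= value and compare it literally to $host.
        if sed -n 's/^hostname[0-9]*=//p' "$f" | grep -Fqx -- "$host"; then
            # Print the UUID recorded in the matching peers file.
            sed -n 's/^uuid=//p' "$f"
            return 0
        fi
    done
    # No peers file mentions this host.
    return 1
}
```

Called as peer_uuid_for_host /var/lib/glusterd/peers FAILED-HOST on the unaffected node, it prints the UUID to restore into the failed node's glusterd.info, or returns non-zero if no entry matches.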


On 16 March 2016 at 09:25, songxin < songxin_1980 at 126.com > wrote: 



Hi,
I am facing a problem.
The steps to reproduce are as follows:
1. I create a replicate volume using two bricks on two boards.
2. I start the volume.
3. One board breaks down and all files in its rootfs, including
/var/lib/glusterd/*, are lost.
4. I reboot the board; its IP address does not change.

My question:
How do I recover the replicate volume?

Thanks,
Xin




_______________________________________________ 
Gluster-users mailing list 
Gluster-users at gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 



songxin

2016-Mar-18 01:56 UTC


[Gluster-users] How to recover after one node breakdown

Hi,

Thank you for your reply.
I found a link, below, about how to replace an old server with a new one.


http://www.gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server


My questions are as follows:
1. Your recovery steps differ a little from the ones in the link with regard to
syncing the volume. Are both methods valid for recovering the replicate volume?
2. Does the link apply only to glusterfs 3.4? My glusterfs version is 3.7.6.
Does the link also apply to glusterfs 3.7.6?


Thanks,
Xin






