Rookie question. I've been tinkering with a 10-node distributed-replicated setup and I wanted to test what would happen if one machine died and I had to rebuild it.

gluster> volume info all

Volume Name: data
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: dl180-101:/data
Brick2: dl180-102:/data
Brick3: dl180-103:/data
Brick4: dl180-104:/data
Brick5: dl180-105:/data
Brick6: dl180-106:/data
Brick7: dl180-107:/data
Brick8: dl180-108:/data
Brick9: dl180-109:/data
Brick10: dl180-110:/data

I took down dl180-102 (dl180-101 is its replicate buddy) and reinstalled the machine, as if we had some horrible failure and just had to start over again. What would be the best method to get the new 102 back into the cluster without data loss?

I tried removing the 101 and 102 bricks, thinking it would migrate the data (on 101) to other nodes, but it didn't do that. Do I have to manually copy the data from 101:/data onto the GlusterFS mount and then add the 101/102 bricks back and rebalance? Could I have used replace-brick to move the data to other existing bricks?

Thanks,
Graeme
Hi,

I use only a replicated setup (two servers), so it might be slightly different, but I've done something similar recently. I rebuilt the server with the same hostname / IP as the previous system and then:

1. peer probe <server that is up>    # (get trusted by the others)
2. volume sync <server that is up>   # (get the volume configuration)
3. ls -laR /gluster-mount-point      # on a client with the volume mounted;
   watch /var/log/glusterfs/<mount-path>.log

You could try this instead (if you have not reused the same hostname / IP):

1. peer probe <server that is up>
2. volume sync <server that is up>
3. volume remove-brick <volume> <brick which is missing>
4. volume add-brick <volume> <brick which is replacing it>
5. volume rebalance <volume> start

OR/AND maybe the ls -laR, just in case...

I don't think you can use 'replace-brick' if the brick you are removing is already down.

As stated at the beginning, I only run a two-node replicated setup, but I hope this helps!

Regards,
Andrew
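A concrete sketch of the first procedure above with Graeme's hostnames filled in; the client mount point /mnt/data and the log filename are assumptions, and these commands simply spell out Andrew's steps rather than a verified recipe:

    # On the rebuilt dl180-102, with glusterd installed and an empty /data
    # brick directory recreated:
    gluster peer probe dl180-101          # rejoin the trusted pool
    gluster volume sync dl180-101 all     # pull the volume configuration back

    # On a client that has the volume mounted (mount point assumed):
    ls -laR /mnt/data                           # walking the whole tree triggers self-heal
    tail -f /var/log/glusterfs/mnt-data.log     # watch the heal activity

In releases without a self-heal daemon, that recursive directory walk from a client is what forces each file to be re-checked and copied onto the fresh dl180-102:/data brick from its replica partner on dl180-101.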