Greg Scott
2012-Jan-22 11:00 UTC
[Gluster-users] Need to replace a brick on a failed first Gluster node
Hello - I am using Glusterfs 3.2.5-2. I have one very small replicated volume with 2 bricks, as follows:

[root at lme-fw2 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2

The application is a small active/standby HA appliance and I use the Gluster volume for config info. The Gluster nodes are also clients and there are no other clients. Fortunately for me, nothing is in production yet.

My challenge is, the hard drive at 192.168.253.1 failed. This was the first Gluster node when I set everything up. I replaced its hard drive and am rebuilding it. I have a good copy of everything I care about in the 192.168.253.2 brick. My thought was, I could just remove the old 192.168.253.1 brick and replica, then gluster peer and add it all back again. But apparently it is not so simple:

[root at lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/gluster-fw1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Incorrect brick 192.168.253.1:/gluster-fw1 for volume firewall-scripts

Not particularly helpful diagnostic info. I also played around with gluster peer detach/attach, but now I think I may have created a mess:

[root at lme-fw2 ~]# gluster peer probe 192.168.253.1
^C
[root at lme-fw2 ~]# gluster peer status
Number of Peers: 1

Hostname: 192.168.253.1
Uuid: 00000000-0000-0000-0000-000000000000
State: Establishing Connection (Disconnected)
[root at lme-fw2 ~]#

Trying again:

[root at lme-fw2 ~]# gluster peer detach 192.168.253.1
Detach successful
[root at lme-fw2 ~]# gluster peer status
No peers present
[root at lme-fw2 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2
[root at lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/gluster-fw1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Incorrect brick 192.168.253.1:/gluster-fw1 for volume firewall-scripts
[root at lme-fw2 ~]#

This should be simple and maybe I am missing something. On the fw2 Gluster node, I want to remove all trace of the old fw1 and then set up a new fw1 as a new replica. How do I get there from here?

Also, once this goes into production, I will not have the luxury of taking everything offline and rebuilding it. What is the best way to recover from a hard drive failure on either node?

Thanks

- Greg Scott
Giovanni Toraldo
2012-Jan-22 11:34 UTC
[Gluster-users] Need to replace a brick on a failed first Gluster node
Hi Greg,

2012/1/22 Greg Scott <GregScott at infrasupport.com>:
> My challenge is, the hard drive at 192.168.253.1 failed. This was the first
> Gluster node when I set everything up. I replaced its hard drive and am
> rebuilding it. I have a good copy of everything I care about in the
> 192.168.253.2 brick. My thought was, I could just remove the old
> 192.168.253.1 brick and replica, then gluster peer and add it all back
> again.

It's far simpler than that: if you keep the same hostname/IP address on the new machine, you only need to make sure the new glusterd has the same UUID as the old dead one (it is stored in a file under /etc/glusterd). The configuration is automatically synced back at the first contact with the other active node.

If instead you replace the node with a different machine with a different hostname / IP:

http://community.gluster.org/q/a-replica-node-has-failed-completely-and-must-be-replaced-with-new-empty-hardware-how-do-i-add-the-new-hardware-and-bricks-back-into-the-replica-pair-and-begin-the-healing-process/

--
Giovanni Toraldo - LiberSoft
http://www.libersoft.it
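For the same-hostname/IP case Giovanni describes, a minimal sketch of the recovery on GlusterFS 3.2 might look like the following. It assumes the surviving node still has (or you otherwise know) the old peer's UUID, that glusterd's working directory is /etc/glusterd as in 3.2, and that the client mount point and the UUID placeholder are only examples, not values from this thread.

# On the surviving node (fw2), note the UUID it recorded for the dead peer:
gluster peer status
cat /etc/glusterd/peers/*

# On the rebuilt node (fw1), stop glusterd, write that UUID into glusterd.info,
# then start glusterd again:
service glusterd stop
echo "UUID=<uuid-noted-on-fw2>" > /etc/glusterd/glusterd.info
service glusterd start

# From fw2, re-probe the rebuilt node so the volume configuration syncs over:
gluster peer probe 192.168.253.1

# 3.2 has no "gluster volume heal" command, so trigger self-heal by walking the
# volume from a client mount (mount point shown is only an example):
find /mnt/firewall-scripts -noleaf -print0 | xargs --null stat >/dev/null

Once the probe succeeds, gluster volume info on the rebuilt node should show the same two bricks, and the stat walk should make the replicate translator copy the files back onto the empty fw1 brick.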