Lindsay Mathieson
2015-Oct-07 07:06 UTC
[Gluster-users] How to replace a dead brick? (3.6.5)
First up - one of the things that concerns me about Gluster is the incoherent
state of the documentation. The only docs linked from the main web page are for
3.2, and there is almost nothing on how to handle failure modes such as dead
disks/bricks, which is one of Gluster's primary functions.

My problem: I have a replica 2 volume - 2 nodes, 2 bricks (ZFS datasets).

As a test, I destroyed one brick (zfs destroy on the dataset).

I can't start datastore1:

    volume start: datastore1: failed: Failed to find brick directory
    /glusterdata/datastore1 for volume datastore1. Reason : No such file or
    directory

A bit disturbing - I was hoping it would keep working off the remaining brick.

I can't replace the brick:

    gluster volume replace-brick datastore1
        vnb.proxmox.softlog:/glusterdata/datastore1
        vnb.proxmox.softlog:/glusterdata/datastore1-2 commit force

because the volume is not running.

After a lot of googling I found list messages referencing the remove-brick
command:

    gluster volume remove-brick datastore1 replica 2
        vnb.proxmox.softlog:/glusterdata/datastore1c commit force

It fails with the unhelpful error:

    wrong brick type: commit, use <HOSTNAME>:<export-dir-abs-path>
    Usage: volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
    <start|stop|status|commit|force>

In the end I destroyed and recreated the volume so I could resume testing, but
I have no idea how I would handle a real failed brick in the future.

--
Lindsay
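(For context, a minimal sketch of how one might check whether the surviving
brick is still serving the volume in a situation like this - assuming the
volume name from the post and the standard gluster CLI, nothing here is taken
from the thread beyond that:)

    # Show which brick processes are actually online for the volume
    gluster volume status datastore1

    # Show the configured bricks and volume options
    gluster volume info datastore1

    # Once the failed brick is back, see what self-heal still has pending
    gluster volume heal datastore1 info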
Hi,

When removing a failed brick from an existing cluster volume, make sure you
pass the correct replica count: use 'n-1' when removing one brick from a
volume that currently has 'n' bricks. Here you are trying to remove one brick
from a volume with 2 bricks in total, so run:

    gluster volume remove-brick datastore1 replica 1
        vnb.proxmox.softlog:/glusterdata/datastore1c force

Follow the same strategy when adding a brick to an existing cluster volume:
pass the replica count as 'n+1'.

If you are using a cloned VM that already has the Gluster packages installed
and carries old volume/peer/brick information, reset those values (including
the extended attributes on the brick directory) before adding that new
node/brick to your existing cluster.

If you are replacing a failed node with a new one that has the same IP, then
after probing the peer you have to set the volume attributes on it and restart
the gluster server service; after that everything should be fine.

If you have any more doubts, feel free to contact me.

regards,
sreejith K B
sree15081947 at gmail.com
mob: 09895315396
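(A minimal sketch of the shrink-then-grow cycle described above, using the
hostname and brick paths already mentioned in this thread; the new brick path
datastore1-2 is the one Lindsay proposed, and the sequence is an illustration
of the standard CLI rather than a tested recipe:)

    # Drop the dead brick, reducing the replica count from 2 to 1
    gluster volume remove-brick datastore1 replica 1 \
        vnb.proxmox.softlog:/glusterdata/datastore1 force

    # Add a fresh, empty brick back, raising the replica count to 2 again
    gluster volume add-brick datastore1 replica 2 \
        vnb.proxmox.softlog:/glusterdata/datastore1-2

    # Trigger a full self-heal so the new brick gets populated
    gluster volume heal datastore1 full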
On 10/07/2015 12:06 AM, Lindsay Mathieson wrote:
> First up - one of the things that concerns me re gluster is the
> incoherent state of documentation. The only docs linked on the main
> webpage are for 3.2 and there is almost nothing on how to handle
> failure modes such as dead disks/bricks etc, which is one of glusters
> primary functions.

Every link under Documentation at http://gluster.org points to the
gluster.readthedocs.org pages, which are all current. Where is this "main
webpage" in which you found links to the old wiki pages?

> My problem - I have a replica 2 volume, 2 nodes, 2 bricks (zfs datasets).
>
> As a test, I destroyed one brick (zfs destroy the dataset).
>
> Can't start the datastore1:
>
> volume start: datastore1: failed: Failed to find brick directory
> /glusterdata/datastore1 for volume datastore1. Reason : No such file
> or directory
>
> A bit disturbing, I was hoping it would work off the remaining brick.

It *is* still working off the remaining brick. It won't start the missing
brick because the missing brick is missing. This is by design: if, for
whatever reason, your brick did not mount, you don't want gluster to start
filling your root device with replication from the other brick.

I documented this on my blog at
https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/ which is
still accurate for the latest version.

The bug report I filed for this was closed without resolution, so I assume
there are no plans for ever making this easy for administrators:
https://bugzilla.redhat.com/show_bug.cgi?id=991084
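(A rough sketch of the recover-in-place approach that blog post describes:
recreate the brick directory, restore the volume-id extended attribute, then
force-start the volume and let self-heal resync. The pool name "tank" and the
hex value are placeholders - take the real value from the surviving brick, or
from /var/lib/glusterd/vols/datastore1/info - and refer to the blog post for
the authoritative steps:)

    # On the good node: read the volume-id xattr from the surviving brick
    getfattr -n trusted.glusterfs.volume-id -e hex /glusterdata/datastore1

    # On the failed node: recreate the dataset/brick directory
    # ("tank" is a hypothetical pool name)
    zfs create -o mountpoint=/glusterdata/datastore1 tank/datastore1

    # Stamp the recreated brick with the same volume-id
    setfattr -n trusted.glusterfs.volume-id -v 0x<value-from-good-brick> \
        /glusterdata/datastore1

    # Bring the brick process up and resync from the surviving replica
    gluster volume start datastore1 force
    gluster volume heal datastore1 full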
Humble Devassy Chirammal
2015-Oct-08 07:10 UTC
[Gluster-users] How to replace a dead brick? (3.6.5)
The steps for replacing a brick are documented and available at
http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Managing%20Volumes/.

Hope it helps.
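(Following that guide, the replace-brick route for this particular failure
would look roughly like the sketch below: force-start the volume on the
surviving brick, swap the dead brick for a new, empty path, then let self-heal
repopulate it. Hostname and paths are the ones from the thread; treat this as
an illustration rather than a verified 3.6.5 recipe:)

    # Start the volume even though one brick directory is gone
    gluster volume start datastore1 force

    # Swap the dead brick for a fresh one on the same node
    gluster volume replace-brick datastore1 \
        vnb.proxmox.softlog:/glusterdata/datastore1 \
        vnb.proxmox.softlog:/glusterdata/datastore1-2 commit force

    # Resync the new brick from the surviving replica
    gluster volume heal datastore1 full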