On 10/07/2015 12:06 AM, Lindsay Mathieson wrote:

> First up - one of the things that concerns me re gluster is the
> incoherent state of documentation. The only docs linked on the main
> webpage are for 3.2 and there is almost nothing on how to handle
> failure modes such as dead disks/bricks etc, which is one of gluster's
> primary functions.

Every link under Documentation at http://gluster.org points to the
gluster.readthedocs.org pages, which are all current. Where is this "main
webpage" in which you found links to the old wiki pages?

> My problem - I have a replica 2 volume, 2 nodes, 2 bricks (zfs datasets).
>
> As a test, I destroyed one brick (zfs destroy the dataset).
>
> Can't start the datastore1:
>
> volume start: datastore1: failed: Failed to find brick directory
> /glusterdata/datastore1 for volume datastore1. Reason : No such file
> or directory
>
> A bit disturbing, I was hoping it would work off the remaining brick.

It *is* still working off the remaining brick. It won't start the missing
brick because the missing brick is missing. This is by design: if, for
whatever reason, your brick did not mount, you don't want gluster to start
filling your root device with replication from the other brick.

I documented this on my blog at
https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/ which is
still accurate for the latest version.

The bug report I filed for this was closed without resolution, so I assume
there are no plans for ever making this easy for administrators:
https://bugzilla.redhat.com/show_bug.cgi?id=991084

> Can't replace the brick:
>
> gluster volume replace-brick datastore1
> vnb.proxmox.softlog:/glusterdata/datastore1
> vnb.proxmox.softlog:/glusterdata/datastore1-2 commit force
>
> because the store is not running.
>
> After a lot of googling I found list messages referencing the
> remove-brick command:
>
> gluster volume remove-brick datastore1 replica 2
> vnb.proxmox.softlog:/glusterdata/datastore1c commit force
>
> Fails with the unhelpful error:
>
> wrong brick type: commit, use <HOSTNAME>:<export-dir-abs-path>
> Usage: volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> In the end I destroyed and recreated the volume so I could resume
> testing, but I have no idea how I would handle a real failed brick in
> the future.
>
> --
> Lindsay
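The gist of that workaround: recreate the missing brick directory, restore
the volume-id extended attribute so glusterd will start the brick again,
then let self-heal repopulate it from the surviving replica. A rough sketch
from memory (not a verbatim copy of the blog post), using the volume name
and brick path from this thread; the xattr value must match the volume-id
recorded in /var/lib/glusterd/vols/datastore1/info:

# recreate the destroyed brick directory on the affected node
mkdir -p /glusterdata/datastore1

# restore the volume-id xattr; glusterd refuses to start a brick without it
vol_id=$(grep volume-id= /var/lib/glusterd/vols/datastore1/info | cut -d= -f2 | tr -d '-')
setfattr -n trusted.glusterfs.volume-id -v 0x$vol_id /glusterdata/datastore1

# force-start the volume so the rebuilt brick comes up, then heal from the
# surviving replica
gluster volume start datastore1 force
gluster volume heal datastore1 full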
Lindsay Mathieson
2015-Oct-07 23:24 UTC
[Gluster-users] How to replace a dead brick? (3.6.5)
On 8 October 2015 at 07:19, Joe Julian <joe at julianfamily.org> wrote:

> On 10/07/2015 12:06 AM, Lindsay Mathieson wrote:
>
> First up - one of the things that concerns me re gluster is the incoherent
> state of documentation. The only docs linked on the main webpage are for
> 3.2 and there is almost nothing on how to handle failure modes such as dead
> disks/bricks etc, which is one of gluster's primary functions.
>
> Every link under Documentation at http://gluster.org points to the
> gluster.readthedocs.org pages, which are all current. Where is this "main
> webpage" in which you found links to the old wiki pages?

The Community Page:
http://www.gluster.org/community/documentation/index.php

which is what came up at the top when I searched for gluster documentation.
It might be an idea to redirect to the main docs from that page.

> My problem - I have a replica 2 volume, 2 nodes, 2 bricks (zfs datasets).
>
> As a test, I destroyed one brick (zfs destroy the dataset).
>
> Can't start the datastore1:
>
> volume start: datastore1: failed: Failed to find brick directory
> /glusterdata/datastore1 for volume datastore1. Reason : No such file or
> directory
>
> A bit disturbing, I was hoping it would work off the remaining brick.
>
> It *is* still working off the remaining brick. It won't start the missing
> brick because the missing brick is missing. This is by design. If, for
> whatever reason, your brick did not mount, you don't want gluster to start
> filling your root device with replication from the other brick.

It wouldn't start the *datastore*, so all bricks were unavailable. I did
stop the datastore myself in the first place, but I would have expected to
be able to restart it.

thanks,

--
Lindsay
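For what it's worth, a force start is, if I recall correctly, what brings a
volume back up when a brick path has gone missing: it starts the volume on
whatever bricks are still present so clients can reconnect while the dead
brick is dealt with. A minimal example, assuming the volume name from this
thread:

# start the volume even though one brick directory is missing;
# only the surviving brick's glusterfsd will come up
gluster volume start datastore1 force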
Lindsay Mathieson
2015-Oct-08 05:56 UTC
[Gluster-users] How to replace a dead brick? (3.6.5)
On 8 October 2015 at 07:19, Joe Julian <joe at julianfamily.org> wrote:

> I documented this on my blog at
> https://joejulian.name/blog/replacing-a-brick-on-glusterfs-340/ which is
> still accurate for the latest version.
>
> The bug report I filed for this was closed without resolution. I assume
> there are no plans for ever making this easy for administrators.
> https://bugzilla.redhat.com/show_bug.cgi?id=991084

Yes, it's the sort of workaround one can never remember in an emergency;
you'd have to google it ...

In the case I was working with, it would probably be easier and quicker to
do a remove-brick/add-brick.

thanks,

--
Lindsay
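Roughly how that remove-brick/add-brick route could look - a sketch from
memory of the 3.6 CLI, untested here, reusing datastore1-2 from earlier in
the thread as a hypothetical fresh brick path. The replica count has to
drop to 1 when removing one side of a two-brick replica, and the usage
string quoted earlier suggests a single keyword ("force") rather than the
"commit force" pair that triggered the "wrong brick type" error:

# drop the dead brick, shrinking the volume to a single replica
gluster volume remove-brick datastore1 replica 1 \
    vnb.proxmox.softlog:/glusterdata/datastore1 force

# add a fresh, empty brick and go back to replica 2
gluster volume add-brick datastore1 replica 2 \
    vnb.proxmox.softlog:/glusterdata/datastore1-2

# trigger a full self-heal so the new brick gets populated
gluster volume heal datastore1 full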