> Message: 5
> Date: Mon, 30 Apr 2012 08:39:57 +0100
> From: Brian Candler <B.Candler at pobox.com>
> Subject: Re: [Gluster-users] Bricks suggestions
> To: Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com>
> Cc: Gluster-users at gluster.org
> Message-ID: <20120430073957.GA16804 at nsrc.org>
> Content-Type: text/plain; charset=us-ascii
>
> On Sun, Apr 29, 2012 at 11:22:20PM +0200, Gandalf Corvotempesta wrote:
>> So, what will you do? RAID1? No RAID?
>
> RAID10 for write-active filesystems, and RAID6 for archive filesystems.
>
>> How does gluster detect a failed disk with no RAID? What I don't
>> understand is how gluster will detect a failure on a disk and then
>> reply with the data from the other server.
>
> I'm not sure - that's what the risk is. One would hope that gluster would
> detect the failed disk and take it out of service, but I see a lot of posts
> on this list from people who have problems in various failure scenarios
> (failures to heal and the like). I'm not sure that glusterfs has really got
> these situations nailed.
>
> Indeed, in my experience the gluster client won't even reconnect to a
> glusterfsd (brick) if the brick has gone away and come back up. You have
> to manually unmount and remount. That's about the simplest failure
> scenario you can imagine.
>
>> With a RAID controller, if the controller detects a failure, it will
>> reply with KO to the operating system
>
> KO or OK? With a RAID controller (or software RAID), the RAID subsystem
> should quietly mark the failed drive as unusable and redirect all
> operations to the working drive. And you will have a way to detect this
> situation, e.g. /proc/mdstat for Linux software RAID.
>
>> Is it safer to use a 24-disk server with no RAID and with 24 replicated
>> and distributed bricks (24 on one server and 24 on the other)?
>
> In theory they should be the same, and with replicated/distributed you
> also get the benefit that if an entire server dies, the data remains
> available. In practice I am not convinced that glusterfs will work well
> this way.
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 30 Apr 2012 10:53:42 +0200
> From: Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com>
> Subject: Re: [Gluster-users] Bricks suggestions
> To: Brian Candler <B.Candler at pobox.com>
> Cc: Gluster-users at gluster.org
> Message-ID:
> 	<CAJH6TXh-jb=-Gus2yadhSbF94-dNdQ-iP04jH6H4wGSgNB8LHQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> 2012/4/30 Brian Candler <B.Candler at pobox.com>
>
>> KO or OK? With a RAID controller (or software RAID), the RAID subsystem
>> should quietly mark the failed drive as unusable and redirect all
>> operations to the working drive. And you will have a way to detect this
>> situation, e.g. /proc/mdstat for Linux software RAID.
>
> KO.
> As you wrote, in a RAID environment the controller will detect a failed
> disk and redirect I/O to the working drive.
>
> With no RAID, is gluster smart enough to detect a disk failure and
> redirect all I/O to the other server?
>
> A disk can also have a damaged cluster, so that only a portion of it
> becomes unusable. A RAID controller is able to detect this; will gluster
> do the same, or will it still try to reply with broken data?
>
> So, do you suggest using RAID10 on each server?
> - disk1+disk2 RAID1
> - disk3+disk4 RAID1
>
> RAID0 over these RAID1 pairs, and then replicate with gluster?
>
> ------------------------------
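For reference, the two-mirrors-plus-stripe layout Gandalf describes maps
directly onto Linux software RAID. A minimal sketch, with placeholder
device names (/dev/sdb-/dev/sde and the md numbers are illustrative, not
from the thread):

  # two RAID1 pairs
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde

  # RAID0 stripe across the two mirrors (mdadm can also build the
  # equivalent array in one step with --level=10 --raid-devices=4)
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

  # the detection Brian mentions: a degraded mirror shows [2/1] [U_]
  # in /proc/mdstat instead of [2/2] [UU]
  cat /proc/mdstat
  mdadm --detail /dev/md0

A filesystem on /dev/md2 then becomes the single brick that gluster
replicates to the other server.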
I have been running the following configuration for over 16 months with no
issues: Gluster v3.0.0 on two SuperMicro servers, each with 8x2TB hard
drives configured as JBOD. I use Gluster to replicate each drive between
the servers and then distribute across the drives, giving me approx. 16TB
as a single volume. I can pull a single drive, replace it, and then use
self-heal to rebuild. I can shut down or reboot a server and traffic
continues to the other server (good for kernel updates). I use logdog to
alert me via email/text if a drive fails.

I chose this config because it 1) was the simplest, 2) maximized my disk
storage, 3) effectively resulted in a shared-nothing RAID10 SAN-like
storage system, 4) minimized the amount of data movement during a rebuild,
and 5) didn't require any hardware RAID controllers, which would have
increased my cost. This config has worked for me exactly as planned.

I'm currently building a new server with 8x4TB drives and will be
replacing one of the existing servers in a couple of weeks. I will force a
self-heal to populate it with files from the primary server. When done,
I'll repeat the process for the other server.

Larry Bates
vitalEsafe, Inc.
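For anyone wanting to reproduce a layout like Larry's, a sketch of the
volume setup using the gluster CLI. Hostnames (server1/server2), brick
paths (/export/diskN), the volume name, and the mount point are all
placeholders; only two of the eight drives are shown, the rest follow the
same pattern. Note that Larry's 3.0.0 predates this management CLI (3.0
was configured via hand-written volfiles); the commands below are the
3.1-and-later equivalents:

  gluster peer probe server2

  # replica 2 takes bricks in pairs, so each drive on server1 is mirrored
  # by the matching drive on server2, and the volume distributes files
  # across the pairs
  gluster volume create myvol replica 2 \
      server1:/export/disk1 server2:/export/disk1 \
      server1:/export/disk2 server2:/export/disk2
  gluster volume start myvol

  # forcing the full self-heal Larry mentions (the usual method on
  # pre-3.3 releases): stat every file through a client mount so the
  # replicate translator re-copies anything missing
  find /mnt/myvol -noleaf -print0 | xargs --null stat >/dev/null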