thr3ads.net - Gluster users - [Gluster-users] Some questions about theoretical gluster failures. [Oct 2011]


Harry Mangalam

2011-Oct-26 02:01 UTC

[Gluster-users] Some questions about theoretical gluster failures.

We're considering implementing gluster for a genomics cluster, and it
seems to have some theoretical advantages that so far seem to have
been borne out in some limited testing, mod some odd problems with an
inability to delete dir trees. I'm about to test with the latest beta
that was promised to clear up these bugs, but as I'm doing that,
answers to these Qs would be appreciated...

- what happens in a distributed system if a node goes down? Does the
rest of the system keep working with the files on that brick
unavailable until it comes back or is the filesystem corrupted? In my
testing, it seemed that the system indeed kept working and added files
to the remaining systems, but that files that were hashed to the
failed volume were unavailable (of course).

- is there a head node? the system is distributed but you're mounting
a specific node for the glusterfs mount - if that node goes down, is
the whole filesystem hosed or is that node reference really a group
reference and the gluster filesystem continues with the loss of that
node's files? ie can any gluster node replace a mountpoint node and
does that happen transparently? (I haven't tested this).

- can you intermix distributed and mirrored volumes? This is of
particular interest since some of our users want to have replicated
data and some don't care.

Many thanks
hjm
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!

Joe Landman

2011-Oct-26 02:09 UTC

On 10/25/2011 10:01 PM, Harry Mangalam wrote:
> - what happens in a distributed system if a node goes down? Does the
> rest of the system keep working with the files on that brick unavailable
> until it comes back or is the filesystem corrupted? In my testing, it
> seemed that the system indeed kept working and added files to the
> remaining systems, but that files that were hashed to the failed volume
> were unavailable (of course).
This is basically it.

> - is there a head node? the system is distributed but you're mounting a
Only if you mount via nfs, though technically you can mount it from any 
server.  If you mount via gluster client, just point it at any of the 
servers.  In the nfs case, if the mount server goes away, so does access 
unless you remount.  In the glusterfs case, if the mount server goes 
away, the other servers can continue talking with the client.
> specific node for the glusterfs mount - if that node goes down, is the
> whole filesystem hosed or is that node reference really a group
> reference and the gluster filesystem continues with the loss of that
> node's files? ie can any gluster node replace a mountpoint node and
> does that happen transparently? (I haven't tested this).
You can mount from any node, but the mount target has to be specifically 
unmounted/remounted under nfs (umount -l is your friend).  With 
GlusterFS client it's less of an issue.

This said, I don't know many people using the nfs client version.  I 
haven't tested 3.2.4's server, but through 3.2.3, we can crash the NFS 
server with a moderate load.
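A minimal Python sketch of the two mount styles described above. It only builds the command lines (nothing is executed); the server names, the volume name "genomics" and the mount point are hypothetical. The NFS variant pins NFSv3 because Gluster's built-in NFS server implements v3, and the failover helper follows the "umount -l, then remount from another server" advice.

```python
from typing import List


def glusterfs_mount_cmd(server: str, volume: str, mountpoint: str) -> List[str]:
    # Native client: the named server only hands out the volume description;
    # after that the client talks to every brick server directly.
    return ["mount", "-t", "glusterfs", f"{server}:/{volume}", mountpoint]


def nfs_mount_cmd(server: str, volume: str, mountpoint: str) -> List[str]:
    # Gluster's built-in NFS server speaks NFSv3, so pin the protocol version.
    return ["mount", "-t", "nfs", "-o", "vers=3", f"{server}:/{volume}", mountpoint]


def nfs_failover_cmds(servers: List[str], failed: str, volume: str,
                      mountpoint: str) -> List[List[str]]:
    # With NFS, losing the mount server loses access: lazily unmount the
    # dead mount, then remount from the first surviving server in the list.
    remaining = [s for s in servers if s != failed]
    if not remaining:
        raise RuntimeError("no surviving server to remount from")
    return [["umount", "-l", mountpoint],
            nfs_mount_cmd(remaining[0], volume, mountpoint)]


# Hypothetical example: server1 (the NFS mount server) has gone away.
for cmd in nfs_failover_cmds(["server1", "server2", "server3"], "server1",
                             "genomics", "/mnt/genomics"):
    print(" ".join(cmd))
```

This prints `umount -l /mnt/genomics` followed by `mount -t nfs -o vers=3 server2:/genomics /mnt/genomics`. With the native client no such step is needed, which is the point made above.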
> - can you intermix distributed and mirrored volumes? This is of
Not sure what you mean by intermix ... but yes, you can have multiple 
(many) volumes of all different types coming from the same units on 
different volume names.
> particular interest since some of our users want to have replicated data
> and some don't care.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

Jeff Darcy

2011-Oct-26 13:34 UTC

On Tue, 25 Oct 2011 19:01:33 -0700
Harry Mangalam <harry.mangalam at uci.edu> wrote:
> We're considering implementing gluster for a genomics cluster, and it 
> seems to have some theoretical advantages that so far seem to have 
> been borne out in some limited testing, mod some odd problems with an 
> inability to delete dir trees.  I'm about to test with the latest
> beta that was promised to clear up these bugs, but as I'm doing that, 
> answers to these Qs would be appreciated...
> 
> - what happens in a distributed system if a node goes down?  Does the 
> rest of the system keep working with the files on that brick 
> unavailable until it comes back or is the filesystem corrupted?  In
> my testing, it seemed that the system indeed kept working and added
> files to the remaining systems, but that files that were hashed to
> the failed volume were unavailable (of course).
Yes, this is what I would expect (and have always observed) when using
just distribution without replication.  Not only are existing files
on the failed brick unavailable, but IMX attempts to create new
files which would hash to that brick (effectively a random 1/N) also
fail.  That part, at least, is fixable.  With replication, the
single-brick failure would effectively be invisible to the distribution
layer so even this glitch wouldn't occur.
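To see why roughly 1/N of files (existing ones and new creates alike) become unavailable when one brick of a pure distribute volume fails, and why replication hides the failure, here is a toy Python simulation. It is a sketch under a stated simplification: placement is MD5 of the file name modulo the number of subvolumes, whereas GlusterFS's DHT actually assigns hash ranges per directory, but the "one name, one place" behaviour is the same. File names are hypothetical.

```python
import hashlib
from typing import List


def subvolume_for(name: str, n_subvols: int) -> int:
    # Toy placement: hash the file name and reduce it modulo the number of
    # distribute subvolumes. (Real DHT gives each brick a slice of the hash
    # space per directory; this only mimics "each name lands in one place".)
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_subvols


def unavailable_fraction(names: List[str], n_subvols: int, replicas: int,
                         failed_brick: int) -> float:
    # Bricks are numbered so that subvolume i is the replica set made of
    # bricks i*replicas .. i*replicas + replicas - 1 (replicas=1 means plain
    # distribution, one brick per subvolume).
    lost = 0
    for name in names:
        first = subvolume_for(name, n_subvols) * replicas
        replica_set = range(first, first + replicas)
        # A file is unreachable only if every copy sits on the failed brick,
        # which can only happen when there is a single copy.
        if all(brick == failed_brick for brick in replica_set):
            lost += 1
    return lost / len(names)


names = [f"sample_{i}.fastq" for i in range(10000)]
print(unavailable_fraction(names, n_subvols=4, replicas=1, failed_brick=0))
print(unavailable_fraction(names, n_subvols=4, replicas=2, failed_brick=0))
```

The first figure comes out close to 0.25 (one brick in four); the second is 0.0, because every file still has a live copy on the other brick of its replica pair.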
> - is there a head node?  the system is distributed but you're
> mounting a specific node for the glusterfs mount - if that node goes
> down, is the whole filesystem hosed or is that node reference really
> a group reference and the gluster filesystem continues with the loss
> of that node's files?  ie can any gluster node replace a mountpoint
> node and does that happen transparently? (I haven't tested this).
The node that you specify for the mount is really only used to fetch
the volfile, which contains the names of all bricks that are involved in
providing service for that volume.  The mount node might not even be
one of those nodes itself (e.g. mount from gluster1, bricks are
actually on gluster2 and gluster3).  Once the connections have been
made to each brick, they're all equal and the failure of one will have
only partial (if any) effect.
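The point that the mount server only supplies the volfile can be modelled in a few lines of Python. This is a hypothetical toy (the Cluster and Client classes and the brick paths are invented for illustration and are not GlusterFS code): the mount server has to be up once, to hand out the brick list, and after that only the brick hosts matter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Cluster:
    # Hypothetical model: the set of servers currently up, and the volfile
    # any server would hand out (volume name -> list of "host:/path" bricks).
    up: Set[str]
    volfiles: Dict[str, List[str]]


@dataclass
class Client:
    bricks: List[str] = field(default_factory=list)

    def mount(self, cluster: Cluster, mount_server: str, volume: str) -> None:
        # The mount server is contacted once, only to fetch the volfile.
        if mount_server not in cluster.up:
            raise ConnectionError(f"{mount_server} is down; cannot fetch volfile")
        self.bricks = list(cluster.volfiles[volume])

    def reachable_bricks(self, cluster: Cluster) -> List[str]:
        # After mounting, the client talks to each brick host directly, so
        # only the state of the brick hosts matters.
        return [b for b in self.bricks if b.split(":", 1)[0] in cluster.up]


# Mount from gluster1, while the bricks live on gluster2 and gluster3.
cluster = Cluster(up={"gluster1", "gluster2", "gluster3"},
                  volfiles={"genomics": ["gluster2:/export/b1", "gluster3:/export/b1"]})
client = Client()
client.mount(cluster, "gluster1", "genomics")
cluster.up.discard("gluster1")           # the mount server dies...
print(client.reachable_bricks(cluster))  # ...but both bricks are still reachable
```

Losing gluster1 after the mount changes nothing; losing gluster2 or gluster3 removes only that brick, the "partial (if any) effect" described above.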
> - can you intermix distributed and mirrored volumes?  This is of 
> particular interest since some of our users want to have replicated 
> data and some don't care.
Every volume is inherently distributed (even if there's only one
brick), and can optionally be striped and/or replicated as well
independently of what's being done for other volumes.
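As a sketch of mixing volume types on the same hardware, the Python helper below assembles `gluster volume create` command lines for a plain distributed volume and a 2-way replicated one built from the same two servers. The volume names and brick paths are hypothetical, and the helper only builds argument lists; each volume would still need `gluster volume start NAME` before it can be mounted.

```python
from typing import List, Optional


def volume_create_cmd(name: str, bricks: List[str],
                      replica: Optional[int] = None) -> List[str]:
    # Builds (does not run) a `gluster volume create` command line.
    # Without a replica count the bricks are only distributed; with
    # `replica N`, each consecutive group of N bricks mirrors the same files.
    if replica is not None and (replica < 2 or len(bricks) % replica != 0):
        raise ValueError("brick count must be a multiple of replica (>= 2)")
    cmd = ["gluster", "volume", "create", name]
    if replica is not None:
        cmd += ["replica", str(replica)]
    return cmd + list(bricks)


# Two volumes of different types carved from the same two servers.
scratch = volume_create_cmd("scratch", ["server1:/export/scratch",
                                        "server2:/export/scratch"])
shared = volume_create_cmd("shared", ["server1:/export/shared",
                                      "server2:/export/shared"], replica=2)
print(" ".join(scratch))
print(" ".join(shared))
```

Users who "don't care" would get the distributed `scratch` volume, and those who want copies would use `shared`; the two are configured independently even though they sit on the same servers.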
