thr3ads.net - Gluster users - [Gluster-users] Newbie questions [May 2010]


Joshua Baker-LePain

2010-May-03 19:50 UTC

[Gluster-users] Newbie questions

I'm a Gluster newbie trying to get myself up to speed.  I've been through 
the bulk of the website docs and I'm in the midst of some small (although 
increasing) scale test setups.  But I wanted to poll the list's collective 
wisdom on how best to fit Gluster into my setup.

As background, I currently have over 550 nodes with over 3000 cores in my 
(SGE scheduled) cluster, and we expand on a roughly biannual basis.  The 
cluster is all gigabit ethernet -- each rack has a switch, and these 
switches each have 4-port trunks to our central switch.  Despite the 
number of nodes in each rack, these trunks are not currently 
oversubscribed.  The cluster is shared among many research groups and the 
vast majority of the jobs are embarrassingly parallel.  Our current 
storage is an active-active pair of NetApp FAS3070s with a total of 8 
shelves of disks.  Unsurprisingly, it's fairly easy for any one user to 
flatten either head (or both) of the NetApp.

I'm looking at Gluster for 2 purposes:

1) To host our "database" volume.  This volume has copies of several
    protein and gene databases (PDB, UniProt, etc).  The databases
    generally consist of tens of thousands of small (a few hundred KB at
    most) files.  Users often start array jobs with hundreds or thousands
    of tasks, each task of which accesses many of these files.

2) To host a cluster-wide scratch space.  Users waste a lot of time (and
    bandwidth) copying (often temporary) results back and forth between the
    network storage and the nodes' scratch disks.  And scaling the NetApp
    is difficult, not least because it is rather hard to convince PIs to
    spring for storage rather than more cores.

For purpose 1, clearly I'm looking at a replicated volume.  For purpose 2, 
I'm assuming that distributed is the way to go (rather than striped), 
although for reliability reasons I'd likely go replicated then 
distributed.  For storage bricks, I'm looking at something like HP's DL180 
G6, where I would have 25 internal SAS disks (or alternatively, I could 
put the same number in a SAS-attached external chassis).

In addition to any general advice folks could give, I have these specific 
questions:

1) My initial leaning would be to RAID10 the disks at the server level,
    and then use the RAID volumes as gluster exports.  But I could also see
    running the disks in JBOD mode and doing all the redundancy at the
    Gluster level.  The latter would seem to make management (and, e.g.,
    hot swap) more difficult, but is it preferred from a Gluster
    perspective?  How difficult would it make disk and/or brick
    maintenance?

2) Is it frowned upon to create 2 volumes out of the same physical set of
    disks?  I'd like to maximize the spindle count in both volumes
    (especially the scratch volume), but will it overly degrade
    performance?  Would it be better to simply create one replicated and
    distributed volume and use that for both of the above purposes?

3) Is it crazy to think of doing a distributed (or NUFA) volume with the
    scratch disks in the whole cluster?  Especially given that we have
    nodes of many ages and see not infrequent node crashes due to bad
    memory/HDDs/user code?

If you've made it this far, thanks very much for reading.  Any and all 
advice (and/or pointers at more documentation) would be much appreciated.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF

Jon Tegner

2010-May-04 05:30 UTC


[Gluster-users] Newbie questions

Hi, I'm also a newbie, and I'm looking forward to answers to your questions.

Just one question: why would distributed be preferable to striped? (I'm 
probably the bigger newbie here.)
> For purpose 1, clearly I'm looking at a replicated volume.  For 
> purpose 2, I'm assuming that distributed is the way to go (rather than 
> striped), although for 

Regards,

/jon
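One way to see the difference is a toy model of which bricks a single read touches. With distribute, each file lives whole on one brick (or replica set); with striping, a file is cut into fixed-size blocks laid out across bricks. The sketch below assumes a 128 KB block size and simple round-robin layout; it is an illustration, not the stripe translator's actual code.

```python
def striped_bricks(offset, length, n_bricks, block_size=128 * 1024):
    """Brick indices touched by reading [offset, offset + length) of a file
    striped round-robin in fixed-size blocks (toy model; block size assumed)."""
    if length <= 0:
        return []
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    touched = []
    for block in range(first_block, last_block + 1):
        brick = block % n_bricks
        if brick not in touched:
            touched.append(brick)
    return touched


KB = 1024
print(striped_bricks(0, 300 * KB, 4))          # [0, 1, 2]: a few-hundred-KB database file
print(striped_bricks(0, 64 * 1024 * KB, 4))    # [0, 1, 2, 3]: a 64 MB scratch file
```

Under this model, striping only spreads one file's I/O when the file is large compared with the block size. When many independent jobs each read different files, distribute already spreads the load across bricks, and an unreplicated striped file depends on every brick holding one of its blocks.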

Daniel Maher

2010-May-04 07:54 UTC


[Gluster-users] Newbie questions

On 05/03/2010 09:50 PM, Joshua Baker-LePain wrote:
> For purpose 1, clearly I'm looking at a replicated volume. For purpose
> 2, I'm assuming that distributed is the way to go (rather than striped),
> although for reliability reasons I'd likely go replicated then
> distributed. For storage bricks, I'm looking at something like HP's

1. Yes.
2. Your call: both will work, but as you said, it's a question of how 
many places you want the data to be in. :)

> 2) Is it frowned upon to create 2 volumes out of the same physical set of
> disks? I'd like to maximize the spindle count in both volumes
> (especially the scratch volume), but will it overly degrade
> performance? Would it be better to simply create one replicated and
> distributed volume and use that for both of the above purposes?
I don't know about « frowned », but my knee-jerk response would be to 
avoid that scenario.  That said, it really all comes down to usage 
patterns; if you're only serving data out of one volume at a time, then 
there's no problem, but if you're constantly using both...

> 3) Is it crazy to think of doing a distributed (or NUFA) volume with the
> scratch disks in the whole cluster? Especially given that we have
> nodes of many ages and see not infrequent node crashes due to bad
> memory/HDDs/user code?
Again, « crazy » is a little strong, but again, it might not hurt to 
review your usage patterns before diving into the architecture.  Who 
will access what, in what amounts, and at what speed, when?  Once this 
has been established, you can make better-informed decisions about where 
to put the data, and how to let people access it (in fact, I would 
submit that many of your questions will answer themselves :) ).


-- 
Daniel Maher <dma+gluster AT witbe DOT net>

pkoelle

2010-May-04 12:25 UTC


[Gluster-users] Newbie questions

On 03.05.2010 21:50, Joshua Baker-LePain wrote:
[snip]
> I'm looking at Gluster for 2 purposes:
>
> 1) To host our "database" volume. This volume has copies of several
> protein and gene databases (PDB, UniProt, etc). The databases
> generally consist of tens of thousands of small (a few hundred KB at
> most) files. Users often start array jobs with hundreds or thousands
> of tasks, each task of which accesses many of these files.

From our testing, we found Gluster to be rather slow with many small 
files (over GigE). Each open() goes over the network, which effectively 
kills read performance (5-7 MB/sec). We tried to serve webapps with many 
small files, and the startup time was intolerable.

Of course, you need to test yourself ;)

hth
  Paul
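Paul's advice to test with your own data can start with something as simple as timing how long it takes to open and fully read every file in a tree, once against a local copy of the database directory and once against the Gluster mount. A minimal Python sketch follows; the function names and the mount path in the usage line are hypothetical, and repeated runs may be served from a client-side cache rather than over the network.

```python
import os
import time


def read_small_files(root, limit=None):
    """Open and fully read regular files under root (up to `limit` files).

    Returns (files_read, bytes_read, elapsed_seconds). With many small files,
    files per second is often the more telling number, since each open() may
    need network round trips on a network filesystem.
    """
    files = 0
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken symlinks
            with open(path, "rb") as f:
                total += len(f.read())
            files += 1
            if limit is not None and files >= limit:
                return files, total, time.perf_counter() - start
    return files, total, time.perf_counter() - start


def summarize(files, total, secs):
    """Format a one-line throughput report."""
    if secs <= 0:
        return f"{files} files, {total} bytes (too fast to time)"
    return (f"{files} files, {total / 1e6:.1f} MB in {secs:.2f} s: "
            f"{files / secs:.0f} files/s, {total / 1e6 / secs:.1f} MB/s")
```

For example, `print(summarize(*read_small_files("/mnt/glusterfs/databases", limit=5000)))` against a hypothetical mount point, then the same call against a local directory, gives a rough side-by-side comparison.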
