Suppose I start building nodes with (say) 24 drives each in them. Would the standard/recommended approach be to make each drive its own filesystem, and export 24 separate bricks, server1:/data1 .. server1:/data24? Making a distributed replicated volume between this and another server would then have to list all 48 drives individually.

At the other extreme, I could put all 24 drives into some flavour of stripe or RAID and export a single filesystem out of that.

It seems to me that having separate filesystems per disk would be the easiest to understand and to recover data from, and would allow volume 'hot spots' to be measured and controlled, at the expense of having to add each brick separately into a volume.

I was trying to find some current best-practices or system design guidelines on the wiki, but unfortunately a lot of what I find is marked "out of date", e.g.

http://gluster.org/community/documentation/index.php/Guide_to_Optimizing_GlusterFS
http://gluster.org/community/documentation/index.php/Best_Practices_v1.3
[the latter is not marked out of date, but links to pages which are]

Also the glusterfs 3.2 admin guide seems to dodge this issue, assuming you already have your bricks prepared before telling you how to add them into a volume.

But if you can point me at some recommended reading, I'd be more than happy to read it :-)

Thanks,

Brian.
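For illustration only (the volume name, hostnames and mount points below are made up), the two layouts described above would look roughly like this on the gluster CLI, assuming each drive is mounted at /dataN on both servers:

    # Per-disk bricks: build the 48-brick list so that server1:/dataN is
    # mirrored by server2:/dataN (replica pairs are taken from the brick
    # list in order, two at a time).
    BRICKS=""
    for i in $(seq 1 24); do
        BRICKS="$BRICKS server1:/data$i server2:/data$i"
    done

    # Distributed-replicated volume across all 24 per-disk brick pairs.
    gluster volume create bigvol replica 2 $BRICKS

    # The other extreme: one RAID-backed filesystem per server, one brick
    # each, giving a single replica pair.
    # gluster volume create bigvol replica 2 server1:/data server2:/data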
gluster-users-bounces at gluster.org wrote on 01/22/2012 04:17:02 PM:

> Suppose I start building nodes with (say) 24 drives each in them.
>
> Would the standard/recommended approach be to make each drive its own
> filesystem, and export 24 separate bricks, server1:/data1 ..
> server1:/data24? Making a distributed replicated volume between this and
> another server would then have to list all 48 drives individually.
>
> At the other extreme, I could put all 24 drives into some flavour of stripe
> or RAID and export a single filesystem out of that.
>
> It seems to me that having separate filesystems per disk would be the easiest
> to understand and to recover data from, and allow volume 'hot spots' to be
> measured and controlled, at the expense of having to add each brick
> separately into a volume.
>
> I was trying to find some current best-practices or system design guidelines
> on the wiki, but unfortunately a lot of what I find is marked "out of date",
> e.g.
> http://gluster.org/community/documentation/index.php/Guide_to_Optimizing_GlusterFS
> http://gluster.org/community/documentation/index.php/Best_Practices_v1.3
> [the latter is not marked out of date, but links to pages which are]
>
> Also the glusterfs 3.2 admin guide seems to dodge this issue, assuming you
> already have your bricks prepared before telling you how to add them into a
> volume.
>
> But if you can point me at some recommended reading, I'd be more than happy
> to read it :-)

It's been talked about a few times on the list in the abstract, but I can give you one lesson learned from our environment: the volume-to-brick ratio is a sliding scale. You can have more of one, but then you need to have less of the other.

So taking your example above (2 nodes, 24 disks per node), let's lay out some possible configurations:

2 nodes, 24 bricks per node per volume, 1 volume
    = 24 running processes and 24 ports per node

2 nodes, 24 bricks per node per volume, 100 volumes
    = 2400 running processes and 2400 ports per node

2 nodes, 1 brick per node per volume, 24 volumes
    = 24 running processes and 24 ports per node

2 nodes, 1 brick per node per volume, 2400 volumes
    = 2400 running processes and 2400 ports per node

More processes/ports means more potential for ports already in use, connectivity issues, file use limits (ulimits), etc. That's not the only thing to keep in mind, but it's a poorly documented one that burned me, so :)
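A quick way to see this scaling on a storage node (the exact port range and output vary by GlusterFS version) is to count the brick daemons, since each exported brick runs its own glusterfsd process listening on its own port:

    # One glusterfsd per brick on this node; the count should roughly match
    # (bricks per node per volume) x (number of volumes).
    ps -C glusterfsd -o pid=,args= | wc -l

    # The listening ports those brick processes hold open (typically in the
    # 24009+ range for this era of GlusterFS).
    netstat -tlnp 2>/dev/null | grep glusterfsd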
"Larry Bates" <larry.bates at vitalesafe.com> wrote on 01/24/2012 08:34:03 AM:> I'll admit to not understanding your response and would really > appreciate a little more explanation. I only have two servers > with 8 x 2TB each in AFR-DHT so far, but we are growing and will > continue to do so basically forever.I added a bit more clarification in my last response.> Q: If you are putting all your bricks into a single AFR-DHT volume > does any of this matter?A: If there is only one volume, then this is fairly mute. Othewise, its depends on the number of filesystems (leading to separate bricks) and the number of volumes.> Maybe I'm confused but it seems by keeping the drives as individual > bricks and using Gluster AFR-DHT to consolidate them into a single > volume you are: > > 1) Maximizing your disk storage (i.e. no disks lost to RAID5 or > RAID6 overhead) > > 2) Limiting rebuilds due to disk failures to a single disk pair, > thus shortening rebuild times and making rebuilds pretty clearly > defined. > > 3) Making it easier to grow your volume because it can be done by > adding only 2 drives/bricks at a time (which couldn't really be > done if you consolidate via RAID5/RAID6 first).All valid thoughts. Personally, I prefer to never have to mess with gluster just because I lost a disk in a RAID set. Moving large amounts of data around due to unplanned failures is way more costly to me than loosing 1-2 disks capacity per raid set. Our original intention was to minimize administrative overhead. There are benefits to consolidating your disks into as few filesystems to be presented as bricks as possible when you a large number of volumes or a large number of disks. -greg