Martin Schenker
2011-Apr-27 08:39 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Hi all! I'm new to the Gluster system and tried to find answers to some simple questions (and couldn't find the information with Google etc.)

- Does Gluster spread its CPU load across a multicore environment? So does it make sense to have 50-core units as Gluster servers? CPU loads seem to go up quite high during file system repairs, so spreading / multithreading should help? What kind of CPUs work well? How much does memory help the performance?

- Are there any recommendations for commodity hardware? We're thinking of 36-slot 4U servers; what kind of controllers DO work well for IO speed? Any real-life experiences? Does it dramatically improve the performance to increase the number of controllers per disk?

The aim is for a ~80-120T file system with 2-3 bricks.

Thanks for any feedback!

Best, Martin
Liam Slusser
2011-Apr-27 10:12 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Gluster is threaded and will take advantage of multiple CPUs and memory; however, having a fast disk subsystem is far more important. Having a lot of memory with huge bricks isn't very necessary IMO, because even with 32 GB of RAM your cache hit ratio across huge 30+ TB bricks is so insanely small that it doesn't make any real-world difference.

I have a 240 TB cluster over 16 bricks (4 physical servers each exporting 4 x 30 TB bricks) and another 120 TB cluster over 8 bricks (4 physical servers each exporting 2 x 30 TB bricks). Hardware-wise both my clusters are basically the same: Supermicro SC846E1-R900B 4U 24-drive chassis, dual 2.2 GHz quad-core Xeons, 8 GB RAM, a 3ware 9690SA-4I4E SAS RAID controller, and 24 Seagate 1.5 TB SATA 7200 rpm hard drives in each chassis. Each brick is a RAID6 over all 24 drives per chassis. I daisy-chain the chassis together via SAS cables: on my larger cluster I daisy-chain 3 more 24-drive chassis off the back of the head node; on the smaller cluster I only daisy-chain one.

Lots of people prefer not to do RAID and have Gluster handle the file replication (replicate) between bricks. My problem with that is that with a huge number of files (I have nearly 100 million files on my larger cluster), a rebuild (ls -alR) takes 2-3 weeks. And since those Seagate drives are crap (I lose maybe 1-3 drives a month!) I would constantly be rebuilding almost all the time. Using hardware RAID makes life much easier for me: instead of having 384 bricks I have 16. When I lose a drive I just hot-swap it and let the 3ware controller rebuild the RAID6 array. The rebuild time on the 3ware depends on the workload, but it's anywhere from 2-5 days normally. One time I lost a drive in the middle of a rebuild (so one failed and one in a rebuild state) and was able to hot-swap the newly failed drive, and it correctly rebuilt the array with two failed drives without any problems or downtime on the cluster. Win!
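The cache-hit point above is easy to verify with back-of-envelope arithmetic. A rough sketch, using the figures from the post (8 GB of RAM per server vs. a 30 TB brick):

```shell
# Fraction of a 30 TB brick that 8 GB of RAM could ever hold in page cache.
awk 'BEGIN {
  ram_gb = 8; brick_tb = 30
  printf "cache covers %.3f%% of the brick\n", ram_gb / (brick_tb * 1024) * 100
}'
# -> cache covers 0.026% of the brick
```

With random reads across the whole brick, a hit ratio on that order is why extra RAM buys so little here.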
So I'm a big fan of hardware RAID, especially the 3ware controllers. They handle the slow non-enterprise Seagate drives very well. I've tried LSI, Dell PERC 5/E and 6/E, and Supermicro (LSI) controllers, and they all had issues with drive timeouts.

A few recommendations when using the 3ware controllers: disable smartd in Linux (it pisses off the 3ware controller, and the controller keeps an eye on the SMART data of each disk anyway); set the block readahead in Linux to 16384 (/sbin/blockdev --setra 16384 /dev/sdX); upgrade the firmware on the 3ware controller to the newest version from 3ware; use the newest 3ware drivers and not the driver bundled with whatever Linux distro you use; spend the $100 and make sure you get the optional battery backup module for the controller; and use nagios to check your RAID status! Oh, and if you use desktop commodity hard drives, make sure you have a bunch of spares on hand. :)

Even with hardware RAID I still use Gluster's replication to provide redundancy, so I can do patches and system maintenance without downtime for my clients. I mirror bricks between head nodes and then use distribute to glue all the replicated bricks together.

I have two Dell 1950 1U public-facing webservers (Linux/Apache) using the Gluster FUSE mount, connected via a private backend network to my smaller cluster. My average file request size is around 3 MB (10-15 requests per second), and I've been able to push 800 Mbit/s of HTTP traffic from those two clients. It might have been higher, but my firewall only has gigabit Ethernet, which was basically saturated at that point. I only use a 128 MB Gluster client cache because I'm feeding my CDN, so the requests are very random and I very rarely see two requests for the same file. That's pretty awesome random read performance if you ask me, considering the hardware.
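The 3ware recommendations above can be sketched as a short root shell session. This is a sketch under assumptions, not a definitive recipe: the device name /dev/sdb, the controller ID /c0, and the grep pattern are placeholders to adjust for your system, and the health check assumes 3ware's tw_cli tool is installed.

```shell
# 1. Stop smartd so it doesn't poll disks behind the 3ware controller;
#    the controller monitors each disk's SMART data itself.
service smartd stop
chkconfig smartd off

# 2. Raise the block-device readahead to 16384 sectors (8 MB).
#    /dev/sdb is a placeholder for the exported RAID volume.
/sbin/blockdev --setra 16384 /dev/sdb

# 3. Crude RAID health check, e.g. for a nagios plugin, via tw_cli
#    (/c0 and the "OK" pattern are assumptions -- check "tw_cli show"
#    output on your own controller first):
if tw_cli /c0 show | grep -q 'RAID-6[[:space:]]*OK'; then
    echo "OK - array healthy"
else
    echo "CRITICAL - inspect tw_cli /c0 show"
fi
```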
I start getting uncomfortable with any more than 600 Mbit/s of traffic, as the service read times off the bricks on the Gluster servers start getting quite high. Those 1.5 TB Seagate hard drives are cheap, $80 a drive, but they're not very fast at random reads.

Hope that helps!

liam

On Wed, Apr 27, 2011 at 1:39 AM, Martin Schenker <martin.schenker at profitbricks.com> wrote:
> Hi all!
>
> I'm new to the Gluster system and tried to find answers to some simple
> questions [...]
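One rough way to relate the figures in the post (an assumed ~3 MB average file at the quoted 800 Mbit/s) is to work out the implied request rate:

```shell
# 800 Mbit/s at ~3 MB per file -> total requests per second served.
awk 'BEGIN {
  mbit_per_s = 800; file_mb = 3
  printf "%.0f requests/sec\n", (mbit_per_s / 8) / file_mb
}'
# -> 33 requests/sec
```

That is, saturating the gigabit firewall corresponds to only a few dozen largely random reads per second spread across the bricks.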
Joe Landman
2011-Apr-27 12:29 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
On 04/27/2011 04:39 AM, Martin Schenker wrote:
> Hi all!
>
> I'm new to the Gluster system and tried to find answers to some simple
> questions (and couldn't find the information with Google etc.)
>
> -does Gluster spread its cpu load across a multicore environment?

Yes, Gluster is multi-threaded. You can tune the number of IO threads per brick.

> So does it make sense to have 50 core units as Gluster server?

No ... as the rate-limiting factor will be the storage units themselves, and not the processing behind Gluster. You could dedicate some cores to it, but at some point you are going to run out of IO bandwidth or IOP capability before you run out of threading.

> CPU loads seem to go up quite high during file system repairs so
> spreading / multithreading should help? What kind of CPUs are working well?

That high load is often a result of an IO system that is under load, and poorly tuned (or poorly designed) for the workload.

> How much memory does help the performance?

Gluster will cache, so depending upon how much of your data is anticipated to be "hot", you can adjust from there.

> -Are there any recommendations for commodity hardware?

Well, we are biased ... see .sig :)

> We're thinking of 36 slot 4U servers, what kind of controllers DO work
> well for IO speed?

We've had other customers use these, and they've had cooling issues, not to mention issues with expander performance.

> Any real life experiences? Does it dramatically improve the performance
> to increase the number of controllers per disk?

A good design can get you an order of magnitude better performance than a poor design. Lots of real-world experience with this.

> The aim is for a ~80-120T file system with 2-3 bricks.

Hmmm ... going wide for larger scenarios is almost always a better move than reduction of chassis.

> Thanks for any feedback!
>
> Best, Martin
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
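The per-brick IO-thread tuning Joe mentions maps to the performance.io-thread-count volume option in the Gluster CLI. A minimal sketch, assuming a running glusterd; the volume name "myvol" and the value 32 are placeholders:

```shell
# Raise the per-brick IO thread count (the default is 16).
gluster volume set myvol performance.io-thread-count 32

# Confirm the option took effect:
gluster volume info myvol | grep io-thread-count
```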