Martin Schenker
2011-Apr-27  08:39 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Hi all!

I'm new to the Gluster system and tried to find answers to some simple questions (and couldn't find the information with Google etc.)

- Does Gluster spread its CPU load across a multicore environment? So does it make sense to have 50-core units as Gluster servers? CPU loads seem to go up quite high during file system repairs, so spreading / multithreading should help? What kind of CPUs work well? How much does memory help performance?

- Are there any recommendations for commodity hardware? We're thinking of 36-slot 4U servers; what kind of controllers DO work well for IO speed? Any real-life experiences? Does it dramatically improve the performance to increase the number of controllers per disk?

The aim is for a ~80-120T file system with 2-3 bricks.

Thanks for any feedback!

Best, Martin
Liam Slusser
2011-Apr-27  10:12 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
Gluster is threaded and will take advantage of multiple CPUs and memory; however, having a fast disk subsystem is far more important. Having a lot of memory with huge bricks isn't very necessary IMO, because even with 32g of ram your cache hit ratio across huge 30+tb bricks is so insanely small that it doesn't make any real-world difference.

I have a 240tb cluster over 16 bricks (4 physical servers each exporting 4 30tb bricks) and another 120tb cluster over 8 bricks (4 physical servers each exporting 2 30tb bricks).

Hardware-wise both my clusters are basically the same: Supermicro SC846E1-R900B 4u 24-drive chassis, dual 2.2ghz quad-core Xeons, 8g ram, 3ware 9690sa-4i4e SAS raid controller, and 24 x Seagate 1.5tb SATA 7200rpm hard drives in each chassis. Each brick is a raid6 over all 24 drives per chassis. I daisy chain the chassis together via SAS cables. So on my larger cluster I daisy chain 3 more 24-drive chassis off the back of the head node. On the smaller cluster I only daisy chain one chassis off the back.

Lots of people prefer not to do raid and have gluster handle the file replication (replicate) between bricks. My problem with that is that with a huge number of files (I have nearly 100 million files on my larger cluster) a rebuild (ls -alR) takes 2-3 weeks. And since those Seagate drives are crap (I lose maybe 1-3 drives a month!) I would be rebuilding almost all the time. Using hardware raid makes life much easier for me. Instead of having 384 bricks I have 16. When I lose a drive I just hot swap it and let the 3ware controller rebuild the raid6 array. The rebuild time on the 3ware depends on the workload, but it's anywhere from 2-5 days normally. One time I lost a drive in the middle of a rebuild (so one failed and one in a rebuild state) and was able to hot swap the newly failed drive, and it correctly rebuilt the array with two failed drives without any problems or downtime on the cluster. Win!
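As a rough check on the capacity figures above, here is a small Python sketch of the arithmetic. It assumes decimal terabytes for the drive size, two drives' worth of parity per raid6 array, and two-way Gluster replication; the replica count is an assumption, chosen because it reproduces the quoted 240tb and 120tb totals. The function names are illustrative only.

```python
# Capacity arithmetic for the clusters described above (a sketch, not a sizing tool).

RAID6_PARITY_DRIVES = 2  # raid6 spends two drives' worth of space on parity


def raid6_usable_tb(drives, drive_tb, parity=RAID6_PARITY_DRIVES):
    """Usable capacity of one raid6 array, in decimal terabytes."""
    return (drives - parity) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 1e12 / 2**40


def cluster_usable_tb(bricks, brick_tb, replicas):
    """Usable capacity when bricks are grouped into replica sets and distributed."""
    return bricks * brick_tb / replicas


if __name__ == "__main__":
    brick_tb = raid6_usable_tb(drives=24, drive_tb=1.5)
    print(f"one brick: {brick_tb} TB, about {tb_to_tib(brick_tb):.1f} TiB")
    # Assumed replica count of 2 (mirrored bricks).
    print(f"16 bricks: {cluster_usable_tb(16, 30, replicas=2)} tb usable")
    print(f" 8 bricks: {cluster_usable_tb(8, 30, replicas=2)} tb usable")
```

Under these assumptions a 24 x 1.5tb raid6 array comes out at 33 decimal TB, which is roughly the 30tb per brick quoted once expressed in binary units.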
So I'm a big fan of hardware raid, especially the 3ware controllers. They handle the slow non-enterprise Seagate drives very well. I've tried LSI, Dell Perc 5e/6e, and Supermicro (LSI) controllers, and they all had issues with drive timeouts.

A few recommendations when using the 3ware controllers:

- disable SMARTD in Linux (it pisses off the 3ware controller, and the controller keeps an eye on the SMART status of each disk anyway)
- set the block readahead in Linux to 16384 (/sbin/blockdev --setra 16384 /dev/sdX)
- upgrade the firmware on the 3ware controller to the newest version from 3ware
- use the newest 3ware drivers and not the driver bundled with whatever Linux distro you use
- spend the $100 and make sure you get the optional battery backup module for the controller
- use nagios to check your raid status!

Oh, and if you use desktop commodity hard drives, make sure you have a bunch of spares on hand. :)

Even with hardware raid I still use gluster's replication to provide redundancy, so I can do patches and system maintenance without downtime to my clients. I mirror bricks between head nodes and then use distribute to glue all the replicated bricks together.

I have two Dell 1950 1u public-facing webservers (Linux/Apache) using the gluster fuse mount, connected via a private backend network to my smaller cluster. My average file request size is around 3megs (10-15 requests per second), and I've been able to push 800mbit/sec of http traffic from those two clients. It might have been higher, but my firewall only has gigabit ethernet, which was basically saturated at that point. I only use a 128meg gluster client cache because I'm feeding my CDN, so the requests are very random and I very rarely see two requests for the same file. That's pretty awesome random read performance if you ask me, considering the hardware.
I start getting uncomfortable with any more than 600mbit/sec of traffic, as the service read times off the bricks on the gluster servers start getting quite high. Those 1.5tb Seagate hard drives are cheap, $80 a drive, but they're not very fast at random reads.

Hope that helps!

liam
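To put the readahead and traffic numbers from this message side by side, here is a short Python sketch of the unit conversions. It relies on `blockdev --setra` counting 512-byte sectors, and it assumes "3megs" means 3 decimal megabytes; the function names are made up for illustration.

```python
# Unit conversions for the readahead and traffic figures above (a rough sketch).

SECTOR_BYTES = 512  # blockdev --setra takes a count of 512-byte sectors


def readahead_bytes(sectors):
    """Readahead window in bytes for a given --setra value."""
    return sectors * SECTOR_BYTES


def traffic_mbit_per_s(avg_request_mb, requests_per_s):
    """Average traffic in megabits per second (assumes decimal megabytes)."""
    return avg_request_mb * 8 * requests_per_s


def requests_per_s_at(mbit_per_s, avg_request_mb):
    """Request rate that a given traffic level corresponds to."""
    return mbit_per_s / (avg_request_mb * 8)


if __name__ == "__main__":
    print(readahead_bytes(16384) // 2**20, "MiB readahead")
    print(traffic_mbit_per_s(3, 10), "to", traffic_mbit_per_s(3, 15), "mbit/sec average")
    print(requests_per_s_at(600, 3), "requests/sec at the 600mbit/sec comfort limit")
```

With these assumptions, a readahead of 16384 works out to 8 MiB per device, and 10-15 requests per second at 3 MB each is 240-360 mbit/sec, so the 600 and 800 mbit/sec figures correspond to bursts well above the average request rate.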
Joe Landman
2011-Apr-27  12:29 UTC
[Gluster-users] Does gluster make use of a multicore setup? Hardware recs.?
On 04/27/2011 04:39 AM, Martin Schenker wrote:
> Hi all!
>
> I'm new to the Gluster system and tried to find answers to some simple
> questions (and couldn't find the information with Google etc.)
>
> -does Gluster spread its cpu load across a multicore environment? So

Yes, Gluster is multi-threaded. You can tune the number of IO threads per brick.

> does it make sense to have 50 core units as Gluster server? CPU loads

No ... as the rate-limiting factor will be the storage units themselves, and not the processing behind Gluster. You could dedicate some cores to it, but at some point you are going to run out of IO bandwidth or IOP capability before you run out of threading.

> seem to go up quite high during file system repairs so spreading /
> multithreading should help? What kind of CPUs are working well? How much

That high load is often a result of an IO system that is under load, poorly tuned (or poorly designed for the workload).

> memory does help the performance?

Gluster will cache, so depending upon how much of your data is anticipated to be "hot", you can adjust from there.

>
> -Are there any recommendations for commodity hardware? We're thinking of

Well, we are biased ... see .sig :)

> 36 slot 4U servers, what kind of controllers DO work well for IO speed?

We've had other customers use these, and they've had cooling issues, not to mention issues with expander performance.

> Any real life experiences? Does it dramatically improve the performance
> to increase the number of controllers per disk?

A good design can get you an order of magnitude better performance than a poor design. Lots of real-world experience with this.

>
> The aim is for a ~80-120T file system with 2-3 bricks.

Hmmm... going wide for larger scenarios is almost always a better move than reduction of chassis.

>
> Thanks for any feedback!
>
> Best, Martin

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615