Kushnir, Michael (NIH/NLM/LHC) [C]
2012-Oct-31 19:59 UTC
[Gluster-users] Best practices for creating bricks
Hello,

I am working with several Dell PE720xd servers. I have 24 disks per server at my disposal, with a high-end RAID card with 1GB RAM and BBC (battery-backed cache). I will be building a distributed-replicated volume. Is it better for me to set up one or two large RAID0 arrays and use those as bricks, or should I make each hard drive a brick?

This will be back-end storage for an image search engine with lots of small-file reads from multiple application servers.

Thanks,
Michael

______________________________________________________________________________________
Michael Kushnir
System Architect / Engineer
Communications Engineering Branch
Lister Hill National Center for Biomedical Communications
National Library of Medicine
8600 Rockville Pike, Building 38A, Floor 10
Bethesda, MD 20894
Phone: 301-435-3219
Email: michael.kushnir at nih.gov
On 10/31/2012 03:59 PM, Kushnir, Michael (NIH/NLM/LHC) [C] wrote:
> I am working with several Dell PE720xd. I have 24 disks per server at my
> disposal with a high end raid card with 1GB RAM and BBC. I will be building a
> distributed-replicated volume. Is it better for me to set up one or two large
> RAID0 arrays and use those as bricks, or should I make each hard drive a brick?
>
> This will be back end storage for an image search engine with lots of small
> file reads from multiple application servers.

As a very general rule, the "one big RAID" type of configuration will work a little better, especially for workloads where accesses to blocks within the same RAID stripe come close together: large requests, sequential requests, or requests with some other locality of reference. For workloads lacking that characteristic, the "many small bricks" approach - essentially letting GlusterFS manage scheduling instead of the RAID controller - might work better.

However, there are so many exceptions both ways that no rule of thumb really suffices. I highly recommend running your workload, or a close simulation of it, on both configurations to see which really performs better for your I/O patterns on your hardware. It's worth the effort.
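The comparison suggested above could be sketched with fio, run once against a volume built each way. This is a minimal sketch, not from the thread: the mount point `/mnt/glustervol` and all job parameters are assumptions, and should be tuned to resemble the real image-serving read pattern.

```shell
# Hypothetical small-file random-read job to run against each candidate
# brick layout; compare the reported IOPS and latency between runs.
fio --name=small-reads \
    --directory=/mnt/glustervol \
    --rw=randread --bs=4k \
    --size=64m --numjobs=8 \
    --ioengine=libaio --direct=1 \
    --runtime=60 --time_based \
    --group_reporting
```

The `--direct=1` flag bypasses the client page cache so repeated runs measure the storage path rather than cached reads.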
On Wed, Oct 31, 2012 at 03:59:15PM -0400, Kushnir, Michael (NIH/NLM/LHC) [C] wrote:
> I am working with several Dell PE720xd. I have 24 disks per server at
> my disposal with a high end raid card with 1GB RAM and BBC. I will be
> building a distributed-replicated volume. Is it better for me to set up
> one or two large RAID0 arrays and use those as bricks, or should I make
> each hard drive a brick?

I would recommend neither. Remember that drives *will* fail, and you need to be ready to handle these events.

- A RAID0 array has the problem that if one disk fails, you lose the entire filesystem. You then replace the bad drive, make a fresh filesystem, and have to resync the *entire* contents from the other gluster node. If you get a second disk failure during that time, you have lost everything.

- Separate disks per brick is less bad, but harder to manage. A disk failure involves swapping the drive, building a new filesystem on it, re-adding it into gluster, and letting it resync across. This needs to be a well-trodden path operationally, and you also need suitable monitoring in place to know when a brick has gone down.

Personally I would say: if write performance is relatively unimportant, use RAID6 for the arrays. If write performance is important, use RAID10 (and accept a 50% loss of capacity).

What you get from either RAID6 or RAID10 is a no-brainer way to replace failed disks and have the array reconstruct itself automatically, plus standard tools for monitoring and managing the array.

This is my personal point of view; others may differ. It also depends on how important your data is to you (but the implication of building a replicated volume is that your data *is* important).

Regards,
Brian.
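The per-brick replacement path described above could be sketched with standard gluster commands. This is a hypothetical sequence, not from the thread: the device name, brick paths, volume name (`imgvol`), and host name are all assumptions, and the exact replace-brick syntax varies between GlusterFS releases.

```shell
# Hypothetical recovery path after swapping a failed drive behind one brick.
mkfs.xfs -f /dev/sdx                  # fresh filesystem on the new drive
mount /dev/sdx /bricks/b7

# Point the volume at the replacement brick and resync from the replica:
gluster volume replace-brick imgvol \
    server1:/bricks/b7-failed server1:/bricks/b7 commit force
gluster volume heal imgvol full

gluster volume status imgvol          # confirm the brick is back online
```

With 24 bricks per server, this sequence would need to be scripted and rehearsed, which is exactly the "well-trodden path" point made above.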
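The capacity trade-off mentioned above can be put in rough numbers for the 24-disk chassis. This is back-of-envelope arithmetic only; the 2TB per-disk size is an assumption, since the thread never states the drive size.

```shell
#!/bin/sh
# Usable capacity of a 24-disk array under each layout discussed above.
# The 2TB drive size is an assumption for illustration.
DISKS=24
TB_PER_DISK=2

RAID0=$((DISKS * TB_PER_DISK))          # no redundancy
RAID6=$(((DISKS - 2) * TB_PER_DISK))    # two disks' worth of parity
RAID10=$((DISKS * TB_PER_DISK / 2))     # mirrored pairs: 50% loss

echo "RAID0:  ${RAID0} TB usable, no tolerance for disk failure"
echo "RAID6:  ${RAID6} TB usable, survives any two disk failures"
echo "RAID10: ${RAID10} TB usable, survives one failure per mirror pair"
```

So at this assumed drive size, RAID6 gives up only two disks' worth of space, while RAID10 halves the capacity but avoids RAID6's parity-write penalty.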