D. Dante Lorenso
2012-Jan-28 23:31 UTC
[Gluster-users] best practices? 500+ win/mac computers, gluster, samba, new SAN, new hardware
All, I'm in the market for some hardware. I'm planning to put together a Gluster "cloud" to make our lab storage more performant and reliable. I'm thinking about buying 8 servers with 4 x 2TB 7200 RPM SATA drives each (expandable to 8 drives). Each server will have 8 network ports: 4 ports link-aggregated to a SAN switch and the other 4 aggregated to a LAN switch. The servers will run CentOS 6.2 Linux. The LAN side will run Samba and export the network shares, and the SAN side will run the Gluster daemon.

With 8 machines and 4 SAN ports each, I need 32 ports total. I'm thinking a 48-port switch would work well as the SAN back-end switch, leaving me spare ports to add iSCSI devices and backup servers that need to hook into the SAN. On a budget, I'm planning to use custom-built Supermicro servers with a D-Link 48-port Layer 2 switch for the SAN.

I've already put together a test Gluster setup using some virtual machines and it seems very good. As I move to designing the production configuration, I'm wondering if there are best practices for how to set up shares, bricks, etc. Right now I'm thinking something like this:

1) Where should bricks be stored? I'd like bricks to stay out of sight so admins are not tempted to accidentally write data directly to the brick instead of to the gluster mount. Something like: /brick/[brick dir name|brick mount dir]

2) Where should the GlusterFS volume be mounted on the box? I'm thinking of using either /mnt or creating a new /gluster directory for the mount points: /mnt/[gluster share] or /gluster/[gluster share]

3) Common Samba configurations? In my virtual machine tests, I had problems mounting my Samba shares on Mac and Windows. As I started configuring, it turned out I needed a bunch of Samba-specific rules to fix .DS_Store files on the Macs, adjust directory and file modes, set socket options, and handle general Active Directory integration (a sample config sketch follows at the end of this message). Are there best practices for smb.conf files when used with Gluster? Over time, I'm hoping to go through my network and replace all Windows storage servers with Samba whether I'm using Gluster or not. If any of you have pointers on this, it'd be great.

4) Performance tuning. So far, I've discovered using dd and iperf to debug my transfer rates (see the command sketch just after these numbered points). I use dd to test the raw speed of the underlying disks (should I use RAID 0, RAID 1, or RAID 5?), then I use iperf to test the speed of the network (to make sure I'm getting the bandwidth I expect). Finally, I can use dd again to test my read and write speed to and from the gluster mount point. If all looks good, I move to testing transfers all the way to a Windows 7 box that mounts the storage servers over Samba. Then I test everything like this: win7 -> network -> samba -> gluster -> brick -> ext4 -> sata hdd

5) Preferred striping or layout? I want fast, good, and cheap! http://en.wikipedia.org/wiki/Project_triangle Since I already know the hardware, my costs are pretty much determined. Next, I want to get the most Good and Fast from that cost. I'm thinking RAID 10 ... but at the network level. Perhaps if the drives on each of the 8 servers are RAID 0, then I can use "replicate 2" through Gluster and get the "RAID 1" equivalent. I think using replicate 2 in Gluster will halve my network write/read speed, though. If instead I used RAID 1 for the hardware and no replication in Gluster, then I get a RAID 0+1 overall, but I cannot afford to lose one entire storage server.
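A minimal sketch of the test chain described in point 4, assuming a brick filesystem at /brick/brick1 and the volume FUSE-mounted at /gluster/lab (both placeholder paths, not paths from this thread):

    # 1. Raw disk/RAID speed on a storage node (oflag/iflag=direct bypasses the page cache)
    dd if=/dev/zero of=/brick/brick1/ddtest.bin bs=1M count=4096 oflag=direct
    dd if=/brick/brick1/ddtest.bin of=/dev/null bs=1M iflag=direct

    # 2. Network speed between two storage nodes
    # (-P 4 opens 4 parallel streams to exercise the bonded links)
    iperf -s                      # on storage node A
    iperf -c nodeA -P 4 -t 30     # on storage node B

    # 3. Speed through the Gluster mount point on the Samba head
    # (conv=fdatasync forces the data to be flushed before dd reports a rate)
    dd if=/dev/zero of=/gluster/lab/ddtest.bin bs=1M count=4096 conv=fdatasync
    dd if=/gluster/lab/ddtest.bin of=/dev/null bs=1M

    # 4. Clean up the test files
    rm -f /brick/brick1/ddtest.bin /gluster/lab/ddtest.bin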
For a network lab of 500+ computers with Active Directory, with user profiles stored on the LAN and Movies, Photos, and My Documents redirected from the desktops to network storage as well, I need performance and reliability. Are there any papers out there showing how another large university has achieved this using Gluster? TIA!

-- Dante

D. Dante Lorenso
dante at lorenso.com
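For the Samba settings mentioned in point 3 above, a hedged sketch of the kind of smb.conf being described (.DS_Store handling for Macs, explicit create/directory modes, socket options, Active Directory membership) might look like the following; the share name, paths, realm, and workgroup are placeholders, not a tested recommendation:

    [global]
        # Active Directory member server
        security = ads
        realm = AD.EXAMPLE.EDU
        workgroup = EXAMPLE
        winbind use default domain = yes
        socket options = TCP_NODELAY

    [lab]
        # Export the Gluster FUSE mount, not the brick directory
        path = /gluster/lab
        read only = no
        create mask = 0664
        directory mask = 2775
        # Keep Mac Finder metadata from littering the share
        veto files = /.DS_Store/
        delete veto files = yes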
Matthew Mackes
2012-Jan-28 23:53 UTC
[Gluster-users] best practices? 500+ win/mac computers, gluster, samba, new SAN, new hardware
One other thought. We are using Lenovo ThinkServer RD240s. They are powerful and inexpensive, and they might cost less than the Supermicro.

On Sat, Jan 28, 2012 at 6:46 PM, Matthew Mackes <matthewmackes at deltasoniccarwash.com> wrote:

> Hello,
>
> You are on the right track. Your mount points are fine. I like to mount my
> Gluster storage under /mnt/gluster and place my bricks inside /STORAGE.
>
> I think that you are planning many more network interfaces per node than
> your 4 (even 8) SATA drives per node at 7200 RPM will require. 2 aggregated
> ports should be plenty for heavy load, and one for normal use.
>
> In my experience the 7200 RPM SATA drives will be your bottleneck. 15,000
> RPM SAS is a better choice for a storage node that requires heavy storage
> load.
>
> The only case I can think of for 4+ network interfaces per machine is if
> you intend to subnet your Gluster SAN network from your normal network used
> for storage access and administration. In that case you could bond 2
> interfaces for the Gluster SAN network (for replication, striping, etc.
> between nodes) and the other pair bonded for your Samba and management
> access.
>
> Matt

--
Matthew Mackes
Delta Sonic CWS, Buffalo N.Y.
716-541-2190
---------------------------------------------------------------
The Steps to solve any problem (Scientific Method):
1. Ask a Question
2. Do Background Research
3. Construct a Hypothesis
4. Test Your Hypothesis by Doing an Experiment
5. Analyze Your Data and Draw a Conclusion
6. Communicate Your Results
---------------------------------------------------------------
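As a rough sketch of the 2 + 2 bonding Matt describes, on CentOS 6 the Gluster back-end pair could be configured with ifcfg files along these lines; the interface names, bond mode, and addresses are placeholders, and 802.3ad/LACP needs matching configuration on the switch:

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (bonded pair for the Gluster back-end)
    DEVICE=bond0
    BONDING_OPTS="mode=802.3ad miimon=100"
    IPADDR=10.0.0.11
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes

    # A second bond (bond1 over eth2/eth3) would carry the Samba/LAN traffic.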
Brian Candler
2012-Jan-29 00:02 UTC
[Gluster-users] best practices? 500+ win/mac computers, gluster, samba, new SAN, new hardware
On Sat, Jan 28, 2012 at 05:31:28PM -0600, D. Dante Lorenso wrote:
> Thinking about buying 8 servers with 4 x 2TB 7200 rpm SATA drives
> (expandable to 8 drives). Each server will have 8 network ports and
> will be connected to a SAN switch using 4 ports link aggregated and
> connected to a LAN switch using the other 4 ports aggregated. The
> servers will run CentOS 6.2 Linux. The LAN side will run Samba and
> export the network shares, and the SAN side will run Gluster daemon.

Just a terminology issue, but Gluster isn't really a SAN, it's a distributed NAS. A SAN uses a block-level protocol (e.g. iSCSI), on top of which the client runs a regular filesystem like ext4 or xfs or whatever. A NAS uses a file-sharing protocol (e.g. NFS). Gluster is the latter.

> With 8 machines and 4 ports for SAN each, I need 32 ports total.
> I'm thinking a 48 port switch would work well as a SAN back-end
> switch giving me left over space to add iSCSI devices and backup
> servers which need to hook into the SAN.

Out of interest, why are you considering two different network fabrics? Is there one set of clients talking CIFS and a different set of clients using the Gluster native client?

> 4) Performance tuning. So far, I've discovered using dd and iperf
> to debug my transfer rates. I use dd to test raw speed of the
> underlying disks (should I use RAID 0, RAID 1, RAID 5 ?)

Try some dd measurements onto a RAID 5 volume, especially for writing, and you'll find it sucks. I also suggest something like bonnie++ to get a more realistic performance measurement than just the dd throughput, as it will include seeks and filesystem operations (e.g. file creations/deletions).

> Perhaps if my drives on each of the 8
> servers are RAID 0, then I can use "replicate 2" through gluster and
> get the "RAID 1" equivalent. I think using replicate 2 in gluster
> will 1/2 my network write/read speed, though.

In theory Gluster replication ought to improve your read speed, since some clients can access one copy while other clients access the other. But I'm not sure how much it will impact the write speed. I would however suggest that building a local RAID 0 array is probably a bad idea, because if one disk of the set fails, the whole filesystem is toast. Gluster does give you the option of a "distributed replicated" volume, so you can get both the "RAID 0" and "RAID 1" functionality.

HTH, Brian.
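As a sketch of the bonnie++ comparison Brian suggests, run against both a raw brick and the Gluster mount; the paths are placeholders, and -s should be roughly twice the node's RAM so the page cache doesn't flatter the numbers:

    # On a storage node, against the brick filesystem directly
    # (the target directory must be writable by the -u user)
    mkdir -p /brick/brick1/bonnie && chown nobody /brick/brick1/bonnie
    bonnie++ -d /brick/brick1/bonnie -u nobody -s 32g -n 64

    # From a client, against the Gluster FUSE mount, to see what the
    # network and gluster layers add on top of the raw disk numbers
    mkdir -p /gluster/lab/bonnie && chown nobody /gluster/lab/bonnie
    bonnie++ -d /gluster/lab/bonnie -u nobody -s 32g -n 64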
D. Dante Lorenso
2012-Feb-01 18:30 UTC
[Gluster-users] best practices? 500+ win/mac computers, gluster, samba, new SAN, new hardware
On 1/28/12 5:46 PM, Matthew Mackes wrote:
> Hello,
>
> You are on the right track. Your mount points are fine. I like to mount my
> Gluster storage under /mnt/gluster and place my bricks inside /STORAGE.
>
> I think that you are planning many more network interfaces per node than
> your 4 (even 8) SATA drives per node at 7200 RPM will require. 2
> aggregated ports should be plenty for heavy load, and one for normal use.
>
> In my experience the 7200 RPM SATA drives will be your bottleneck.
> 15,000 RPM SAS is a better choice for a storage node that requires
> heavy storage load.

OMG, drive prices are insane! Just 4 x 2TB SATA drives cost more than the rest of an entire 2U system. We are considering getting desktop drives temporarily and waiting for Thailand to rebuild before filling out the remainder of our disk arrays!

> The only case I can think of for 4+ network interfaces per machine is if
> you intend to subnet your Gluster SAN network from your normal network
> used for storage access and administration. In that case you could bond
> 2 interfaces for the Gluster SAN network (for replication, striping, etc.
> between nodes) and the other pair bonded for your Samba and management
> access.

We are thinking now to just get 4 NICs: 2 bonded for the storage network and 2 bonded for the Samba interface to the LAN. That ought to do it. I think your math on the drive speed vs. network speed is right.

-- Dante

D. Dante Lorenso
dante at lorenso.com
Larry Bates
2012-Feb-02 22:15 UTC
[Gluster-users] best practices? 500+ win/mac computers, gluster, samba, new SAN, new hardware
> On Wed, Feb 01, 2012 at 12:21:17PM -0600, D. Dante Lorenso wrote:
>> > Gluster does give you the option of a "distributed replicated" volume, so
>> > you can get both the "RAID 0" and "RAID 1" functionality.
>>
>> If you have 8 drives connected to a single machine, how do you
>> introduce those drives to Gluster? I was thinking I'd combine them
>> into a single volume using RAID 0 and mount that volume on a box and
>> turn it into a brick. Otherwise you have to add 8 separate bricks,
>> right? That's not better, is it?
>
> I'm in the process of building a pair of test systems (in my case 12 disks
> per server), and haven't quite got to building the Gluster layer, but yes, 8
> separate filesystems and 8 separate bricks per server is what I'm suggesting
> you consider.
>
> Then you create a distributed replicated volume using 16 bricks across 2
> servers, added in the correct order so that they pair up
> (serverA:brick1 serverB:brick1 serverA:brick2 serverB:brick2 etc.) - or
> across 4 servers or however many you're building.
>
> The advantage is that if you lose one disk, 7/8 of the data is still usable
> on both disks, and 1/8 is still available on one disk. If you lose a second
> disk, there is a 1 in 15 chance that it's the mirror of the one that already
> failed, but a 14 in 15 chance that you won't lose any data. Furthermore,
> replacing the failed disk will only have to synchronise (heal) one disk's
> worth of data.
>
> Now, if you decide to make RAID 0 sets instead, then losing one disk will
> destroy that whole filesystem. If you then lose any disk in the second server
> you will have lost everything. And when you replace the one failed disk, you
> will need to make a new filesystem across the whole RAID 0 array and resync
> all 8 disks' worth of data.
>
> I think it only makes sense to build an array brick if you are using RAID 1
> or higher. RAID 1 or RAID 10 is fast, but presumably you don't want to store
> 4 copies of your data, 2 on each server. The write performance of RAID 5 and
> RAID 6 is terrible. An expensive RAID card with battery-backed write-through
> cache will make it slightly less terrible, but still terrible.
>
> Regards,
>
> Brian.

I would like to second Brian's suggestions. I have almost exactly this setup and it has worked perfectly for well over a year. The added benefit is that you get exactly 50% of the total storage. If you distribute across RAID 5/6 arrays you get significantly less than that (i.e. RAID 5 costs you one disk and RAID 6 costs you two disks per array).

Larry Bates
vitalEsafe, Inc.
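A sketch of the interleaved brick ordering Brian describes, for 8 bricks on each of two servers; the hostnames, volume name, and brick paths are placeholders. With "replica 2", every consecutive pair of bricks in the argument list becomes a mirrored pair, so listing serverA:brickN followed by serverB:brickN keeps each mirror split across the two machines:

    gluster volume create labvol replica 2 \
        serverA:/brick/brick1 serverB:/brick/brick1 \
        serverA:/brick/brick2 serverB:/brick/brick2 \
        serverA:/brick/brick3 serverB:/brick/brick3 \
        serverA:/brick/brick4 serverB:/brick/brick4 \
        serverA:/brick/brick5 serverB:/brick/brick5 \
        serverA:/brick/brick6 serverB:/brick/brick6 \
        serverA:/brick/brick7 serverB:/brick/brick7 \
        serverA:/brick/brick8 serverB:/brick/brick8
    gluster volume start labvol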