D. Dante Lorenso
2012-Mar-15 04:09 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
All,

For our project, we bought 8 new Supermicro servers. Each server is a quad-core Intel CPU in a 2U chassis supporting 8 x 7200 RPM SATA drives. To start out, we populated only 2 x 2TB enterprise drives in each server and added all 8 peers, with their total of 16 drives as bricks, to our Gluster pool as distributed-replicated (replica 2). The replica pairing worked as follows:

  1.1 -> 2.1    1.2 -> 2.2
  3.1 -> 4.1    3.2 -> 4.2
  5.1 -> 6.1    5.2 -> 6.2
  7.1 -> 8.1    7.2 -> 8.2

where "1.1" above represents server 1, brick 1, etc.

We set up 4 gigabit network ports in each server (2 on the motherboard and 2 on an Intel Pro dual-NIC PCI Express card). The ports were bonded in Linux to the switch, giving us 2 "bonded NICs" per server with a theoretical 2 Gbps aggregate throughput per bonded pair. One network was the "SAN/NAS" network for Gluster to communicate on, and the other was the LAN interface where Samba would run.

After tweaking settings the best we could, we were able to copy files from Mac and Win7 desktops across the network, but only able to get 50-60 MB/s transfer speeds tops when sending large files (> 2GB) to Gluster. When copying a directory of small files, we get <= 1 MB/s performance!

My question is ... is this right? Is this what I should expect from Gluster, or is there something we did wrong? We aren't using super-expensive equipment, granted, but I was really hoping for better performance than this, given that raw drive speed tests using dd show that we can write at 125+ MB/s to each "brick" 2TB disk. Our network switch is a decent Layer 2 D-Link switch (actually, 2 of them stacked with a 10Gb cable), and we are only using 1GbE NICs rather than InfiniBand or 10GbE in the servers. Overall, we spent about 22K on servers, where drives were more than 1/3 of that cost due to the Thailand flooding.

My team and I have been tearing apart our entire network to try to see where the performance was lost.
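For reference, the pairing above corresponds to the brick ordering given to `gluster volume create`: with `replica 2`, each consecutive pair of bricks forms one replica set. A sketch with hypothetical hostnames and brick paths (the volume name, `serverN` names, and `/export/brickN` paths are assumptions, not taken from the original post):

```shell
# replica 2 => bricks are paired in the order listed:
# (server1,server2), (server1,server2), (server3,server4), ...
gluster volume create gv0 replica 2 transport tcp \
    server1:/export/brick1 server2:/export/brick1 \
    server1:/export/brick2 server2:/export/brick2 \
    server3:/export/brick1 server4:/export/brick1 \
    server3:/export/brick2 server4:/export/brick2 \
    server5:/export/brick1 server6:/export/brick1 \
    server5:/export/brick2 server6:/export/brick2 \
    server7:/export/brick1 server8:/export/brick1 \
    server7:/export/brick2 server8:/export/brick2
```

`gluster volume info gv0` should then show the same 1.1->2.1, 1.2->2.2, ... pairing described above.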
We've questioned switches, cables, routers, firewalls, Gluster, Samba, and even things on the system like RAM and motherboards, etc. When using a single Win2008 server with RAID 10 on 4 drives, shared to the network with built-in CIFS, we get much better (nearly 2x) performance than this 8-server Gluster setup using Samba for SMB/CIFS and a total of 16 drives.

From your real-world usage, what kind of performance are you getting from Gluster? Is what I'm seeing the best I can do, or do you think I've configured something wrong and need to keep working on it?

If I can't get Gluster to work, our fall-back plan is to convert these 8 servers into iSCSI targets, mount the storage onto a Win2008 head, and continue sharing to the network as before. Personally, I would rather we continue moving toward CentOS 6.2 with Samba and Gluster, but I can't justify the change unless I can deliver the performance.

What are your thoughts?

-- Dante

D. Dante Lorenso
dante at lorenso.com
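One way to localize where the performance is lost is to time the same write at each layer: directly on a brick, then through the Gluster mount, then through Samba. A minimal brick-level probe, sketched here with a scratch directory as the default (point `BRICK` at a real brick path such as `/export/brick1`; that path is an assumption):

```shell
#!/bin/sh
# Raw write-speed probe for a single brick, to compare against the
# speed seen through the Gluster mount. BRICK defaults to a scratch
# dir so the script is safe to run anywhere.
BRICK=${BRICK:-/tmp/brick-probe}
mkdir -p "$BRICK"
# conv=fsync makes dd flush to disk before reporting a rate:
dd if=/dev/zero of="$BRICK/ddtest" bs=1M count=64 conv=fsync 2>&1 | tail -n1
rm -f "$BRICK/ddtest"
```

Running the identical command against the FUSE mount (e.g. `BRICK=/mnt/gluster`) shows how much of the 125+ MB/s raw speed survives each layer.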
Brian Candler
2012-Mar-15 07:22 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
On Wed, Mar 14, 2012 at 11:09:28PM -0500, D. Dante Lorenso wrote:
> get 50-60 MB/s transfer speeds tops when sending large files (> 2GB)
> to gluster. When copying a directory of small files, we get <= 1
> MB/s performance!
>
> My question is ... is this right? Is this what I should expect from
> Gluster, or is there something we did wrong? We aren't using super
> expensive equipment, granted, but I was really hoping for better
> performance than this given that raw drive speeds using dd show that
> we can write at 125+ MB/s to each "brick" 2TB disk.

Unfortunately I don't have any experience with replicated volumes, but the raw glusterfs protocol is very fast: a single brick which is a 12-disk RAID0 stripe can give 500MB/sec easily over 10G ethernet without any tuning. I would expect a distributed volume to work fine too, as it just sends each request to one of N nodes.

Striped volumes are unfortunately broken on top of XFS at the moment:
http://oss.sgi.com/archives/xfs/2012-03/msg00161.html

Replicated volumes, from what I've read, need to touch both servers even for read operations (for the self-healing functionality), and that could be a major bottleneck. But there are a few basic things to check:

(1) Are you using XFS for the underlying filesystems? If so, did you mount them with the "inode64" mount option? Without this, XFS performance sucks really badly for filesystems >1TB. Without inode64, even untarring files into a single directory will make XFS distribute them between AGs, rather than allocating contiguous space for them. This is a major trip-up, and there is currently talk of changing the default to be inode64.
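Applying the inode64 suggestion above might look like the following; the device name and mount point are illustrative only, not taken from the thread:

```shell
# Check whether a brick filesystem is already mounted with inode64:
mount | grep /export/brick1

# Remount with inode64 (requires the brick to be idle):
umount /export/brick1
mount -t xfs -o inode64,noatime /dev/sdb1 /export/brick1

# Persist it across reboots with an /etc/fstab entry like:
#   /dev/sdb1  /export/brick1  xfs  inode64,noatime  0 0
```

Note that inode64 only affects where *new* files are allocated; existing files keep their old placement.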
(2) I have this in /etc/rc.local:

  for i in /sys/block/sd*/bdi/read_ahead_kb; do echo 1024 >"$i"; done
  for i in /sys/block/sd*/queue/max_sectors_kb; do echo 1024 >"$i"; done

> If I can't get gluster to work, our fail-back plan is to convert
> these 8 servers into iSCSI targets and mount the storage onto a
> Win2008 head and continue sharing to the network as before.
> Personally, I would rather us continue moving toward CentOS 6.2 with
> Samba and Gluster, but I can't justify the change unless I can
> deliver the performance.

Optimising replicated volumes I can't help with. However, if you make a simple RAID10 array on each server and then join the servers into a distributed gluster volume, I think it will rock. What you lose is the high availability: if one server fails, a proportion of your data becomes unavailable until you fix it - but that's no worse than your iSCSI proposal (unless you are doing something complex, like drbd replication between pairs of nodes and HA failover of the iSCSI target).

BTW, Linux md RAID10 with the 'far' layout is really cool; for reads it performs like a RAID0 stripe, and it reduces head seeking for random access.

Regards,

Brian.
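Building the md RAID10 'far' array mentioned above could be sketched as follows, assuming a fully populated chassis; the device names and mount point are hypothetical:

```shell
# RAID10 with the 'far 2' layout (--layout=f2) across 8 drives
# of one server; reads stripe like RAID0, writes pay a seek cost.
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=8 \
      /dev/sd[b-i]

# Format and mount it as a single Gluster brick (inode64 per point (1)):
mkfs.xfs /dev/md0
mount -t xfs -o inode64,noatime /dev/md0 /export/brick1
```

Each server then contributes one large brick to a plain distributed volume instead of two replicated bricks.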
Jeff Darcy
2012-Mar-15 13:39 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
On 03/15/2012 12:09 AM, D. Dante Lorenso wrote:
> After tweaking settings the best we could, we were able to copy files
> from Mac and Win7 desktops across the network but only able to get 50-60
> MB/s transfer speeds tops when sending large files (> 2GB) to gluster.
> When copying a directory of small files, we get <= 1 MB/s performance!
>
> ...
>
> When using a single Win2008 server with Raid 10 on 4 drives, shared to
> the network with built-in CIFS, we get much better (near 2x) performance
> than this 8-server gluster setup using Samba for smb/cifs and a total of
> 16 drives.

Please bear in mind that GlusterFS in general is optimized for aggregate bandwidth, and single-stream bandwidth is often dominated by the *latency* of the underlying network. Combine this with the fact that the replication piece in particular is very latency-sensitive, and a single copy with replication becomes very much a worst case.

If what you're looking for is bandwidth, I suggest testing with many I/O streams and ideally many clients. If you're more concerned with latency, you should probably measure that at both the network and storage levels, in addition to using the GlusterFS tools to measure at that level. That should give you some ideas about where you can drive out latency and improve performance for those use cases.
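The many-streams test suggested above can be sketched with concurrent dd writers; the defaults here use a scratch directory and small sizes so the script is safe to run as-is (point `DIR` at the real Gluster mount, e.g. `/mnt/gluster`, and raise `N`/`CNT` for a meaningful benchmark; those names are assumptions):

```shell
#!/bin/sh
# Aggregate-bandwidth probe: N concurrent writers into DIR.
DIR=${DIR:-/tmp/gluster-probe}   # replace with your Gluster mount point
N=${N:-4}                        # number of parallel streams
CNT=${CNT:-16}                   # MB written per stream
mkdir -p "$DIR"
start=$(date +%s)
i=1
while [ "$i" -le "$N" ]; do
    dd if=/dev/zero of="$DIR/stream.$i" bs=1M count="$CNT" \
       conv=fsync 2>/dev/null &
    i=$((i + 1))
done
wait    # block until every writer has finished
end=$(date +%s)
rm -f "$DIR"/stream.*
echo "wrote $((N * CNT)) MB total in $((end - start)) s"
```

If aggregate throughput scales well past the 50-60 MB/s single-copy figure, the bottleneck is per-stream latency rather than raw capacity, which matches the analysis above.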