D. Dante Lorenso
2012-Mar-15 04:09 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
All,

For our project, we bought 8 new Supermicro servers. Each server is a quad-core Intel CPU in a 2U chassis supporting 8 x 7200 RPM SATA drives. To start out, we populated only 2 x 2TB enterprise drives in each server and added all 8 peers, with their total of 16 drives as bricks, to our Gluster pool as distributed-replicated (replica 2). The replica pairing worked as follows:

  1.1 -> 2.1    1.2 -> 2.2
  3.1 -> 4.1    3.2 -> 4.2
  5.1 -> 6.1    5.2 -> 6.2
  7.1 -> 8.1    7.2 -> 8.2

where "1.1" above represents server 1, brick 1, etc.

We set up 4 gigabit network ports in each server (2 on the motherboard and 2 on an Intel Pro dual-NIC PCI Express card). The ports were bonded in Linux to the switch, giving us 2 "bonded NICs" per server with a theoretical 2 Gbps aggregate throughput per bonded pair. One network was the "SAN/NAS" network for Gluster to communicate on, and the other was the LAN interface where Samba would run.

After tweaking settings the best we could, we were able to copy files from Mac and Win7 desktops across the network, but only able to get 50-60 MB/s transfer speeds tops when sending large files (> 2GB) to Gluster. When copying a directory of small files, we get <= 1 MB/s performance!

My question is ... is this right? Is this what I should expect from Gluster, or is there something we did wrong? We aren't using super-expensive equipment, granted, but I was really hoping for better performance than this, given that raw drive speed tests using dd show that we can write at 125+ MB/s to each "brick" 2TB disk. Our network switch is a decent Layer 2 D-Link switch (actually, 2 of them stacked with a 10Gb cable), and we are only using 1GbE NICs rather than InfiniBand or 10GbE in the servers. Overall, we spent about 22K on servers, where drives were more than 1/3 of that cost due to the Thailand flooding.

My team and I have been tearing apart our entire network to try to see where the performance was lost.
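For reference, the pairing above corresponds to the brick ordering given to `gluster volume create`: with `replica 2`, each consecutive pair of bricks forms one replica set. A sketch with hypothetical hostnames and brick paths (the volume name, `serverN` names, and `/export/brickN` paths are assumptions, not taken from the original post):

```shell
# replica 2 => bricks are paired in the order listed:
# (server1,server2), (server1,server2), (server3,server4), ...
gluster volume create gv0 replica 2 transport tcp \
    server1:/export/brick1 server2:/export/brick1 \
    server1:/export/brick2 server2:/export/brick2 \
    server3:/export/brick1 server4:/export/brick1 \
    server3:/export/brick2 server4:/export/brick2 \
    server5:/export/brick1 server6:/export/brick1 \
    server5:/export/brick2 server6:/export/brick2 \
    server7:/export/brick1 server8:/export/brick1 \
    server7:/export/brick2 server8:/export/brick2
```

`gluster volume info gv0` should then show the same 1.1->2.1, 1.2->2.2, ... pairing described above.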
We've questioned switches, cables, routers, firewalls, Gluster, Samba, and even things on the system like RAM and motherboards, etc. When using a single Win2008 server with RAID 10 on 4 drives, shared to the network with built-in CIFS, we get much better (nearly 2x) performance than this 8-server Gluster setup using Samba for SMB/CIFS and a total of 16 drives.

From your real-world usage, what kind of performance are you getting from Gluster? Is what I'm seeing the best I can do, or do you think I've configured something wrong and need to keep working on it?

If I can't get Gluster to work, our fall-back plan is to convert these 8 servers into iSCSI targets, mount the storage onto a Win2008 head, and continue sharing to the network as before. Personally, I would rather we continue moving toward CentOS 6.2 with Samba and Gluster, but I can't justify the change unless I can deliver the performance.

What are your thoughts?

-- Dante

D. Dante Lorenso
dante at lorenso.com
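One way to localize where the performance is lost is to time the same write at each layer: directly on a brick, then through the Gluster mount, then through Samba. A minimal brick-level probe, sketched here with a scratch directory as the default (point `BRICK` at a real brick path such as `/export/brick1`; that path is an assumption):

```shell
#!/bin/sh
# Raw write-speed probe for a single brick, to compare against the
# speed seen through the Gluster mount. BRICK defaults to a scratch
# dir so the script is safe to run anywhere.
BRICK=${BRICK:-/tmp/brick-probe}
mkdir -p "$BRICK"
# conv=fsync makes dd flush to disk before reporting a rate:
dd if=/dev/zero of="$BRICK/ddtest" bs=1M count=64 conv=fsync 2>&1 | tail -n1
rm -f "$BRICK/ddtest"
```

Running the identical command against the FUSE mount (e.g. `BRICK=/mnt/gluster`) shows how much of the 125+ MB/s raw speed survives each layer.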
Brian Candler
2012-Mar-15 07:22 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
On Wed, Mar 14, 2012 at 11:09:28PM -0500, D. Dante Lorenso wrote:
> get 50-60 MB/s transfer speeds tops when sending large files (> 2GB)
> to gluster. When copying a directory of small files, we get <= 1
> MB/s performance!
>
> My question is ... is this right? Is this what I should expect from
> Gluster, or is there something we did wrong? We aren't using super
> expensive equipment, granted, but I was really hoping for better
> performance than this given that raw drive speeds using dd show that
> we can write at 125+ MB/s to each "brick" 2TB disk.

Unfortunately I don't have any experience with replicated volumes, but the raw glusterfs protocol is very fast: a single brick which is a 12-disk RAID0 stripe can give 500MB/sec easily over 10G ethernet without any tuning. I would expect a distributed volume to work fine too, as it just sends each request to one of N nodes.

Striped volumes are unfortunately broken on top of XFS at the moment:
http://oss.sgi.com/archives/xfs/2012-03/msg00161.html

Replicated volumes, from what I've read, need to touch both servers even for read operations (for the self-healing functionality), and that could be a major bottleneck. But there are a few basic things to check:

(1) Are you using XFS for the underlying filesystems? If so, did you mount them with the "inode64" mount option? Without this, XFS performance sucks really badly for filesystems >1TB. Without inode64, even untarring files into a single directory will make XFS distribute them between AGs, rather than allocating contiguous space for them. This is a major trip-up, and there is currently talk of changing the default to be inode64.
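Applying the inode64 suggestion above might look like the following; the device name and mount point are illustrative only, not taken from the thread:

```shell
# Check whether a brick filesystem is already mounted with inode64:
mount | grep /export/brick1

# Remount with inode64 (requires the brick to be idle):
umount /export/brick1
mount -t xfs -o inode64,noatime /dev/sdb1 /export/brick1

# Persist it across reboots with an /etc/fstab entry like:
#   /dev/sdb1  /export/brick1  xfs  inode64,noatime  0 0
```

Note that inode64 only affects where *new* files are allocated; existing files keep their old placement.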
(2) I have this in /etc/rc.local:

  for i in /sys/block/sd*/bdi/read_ahead_kb; do echo 1024 >"$i"; done
  for i in /sys/block/sd*/queue/max_sectors_kb; do echo 1024 >"$i"; done

> If I can't get gluster to work, our fail-back plan is to convert
> these 8 servers into iSCSI targets and mount the storage onto a
> Win2008 head and continue sharing to the network as before.
> Personally, I would rather us continue moving toward CentOS 6.2 with
> Samba and Gluster, but I can't justify the change unless I can
> deliver the performance.

Optimising replicated volumes I can't help with. However, if you make a simple RAID10 array on each server and then join the servers into a distributed gluster volume, I think it will rock. What you lose is the high availability: if one server fails, a proportion of your data becomes unavailable until you fix it - but that's no worse than your iSCSI proposal (unless you are doing something complex, like drbd replication between pairs of nodes and HA failover of the iSCSI target).

BTW, Linux md RAID10 with the 'far' layout is really cool; for reads it performs like a RAID0 stripe, and it reduces head seeking for random access.

Regards,

Brian.
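Building the md RAID10 'far' array mentioned above could be sketched as follows, assuming a fully populated chassis; the device names and mount point are hypothetical:

```shell
# RAID10 with the 'far 2' layout (--layout=f2) across 8 drives
# of one server; reads stripe like RAID0, writes pay a seek cost.
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=8 \
      /dev/sd[b-i]

# Format and mount it as a single Gluster brick (inode64 per point (1)):
mkfs.xfs /dev/md0
mount -t xfs -o inode64,noatime /dev/md0 /export/brick1
```

Each server then contributes one large brick to a plain distributed volume instead of two replicated bricks.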
Jeff Darcy
2012-Mar-15 13:39 UTC
[Gluster-users] Usage Case: just not getting the performance I was hoping for
On 03/15/2012 12:09 AM, D. Dante Lorenso wrote:
> After tweaking settings the best we could, we were able to copy files
> from Mac and Win7 desktops across the network but only able to get 50-60
> MB/s transfer speeds tops when sending large files (> 2GB) to gluster.
> When copying a directory of small files, we get <= 1 MB/s performance!
>
> ...
>
> When using a single Win2008 server with Raid 10 on 4 drives, shared to
> the network with built-in CIFS, we get much better (near 2x) performance
> than this 8-server gluster setup using Samba for smb/cifs and a total of
> 16 drives.

Please bear in mind that GlusterFS in general is optimized for aggregate bandwidth, and single-stream bandwidth is often dominated by the *latency* of the underlying network. Combine this with the fact that the replication piece in particular is very latency-sensitive, and a single copy with replication becomes very much a worst case.

If what you're looking for is bandwidth, I suggest testing with many I/O streams and ideally many clients. If you're more concerned with latency, you should probably measure that at both the network and storage levels, in addition to using the GlusterFS tools to measure at that level. That should give you some ideas about where you can drive out latency and improve performance for those use cases.
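The many-streams test suggested above can be sketched with concurrent dd writers; the defaults here use a scratch directory and small sizes so the script is safe to run as-is (point `DIR` at the real Gluster mount, e.g. `/mnt/gluster`, and raise `N`/`CNT` for a meaningful benchmark; those names are assumptions):

```shell
#!/bin/sh
# Aggregate-bandwidth probe: N concurrent writers into DIR.
DIR=${DIR:-/tmp/gluster-probe}   # replace with your Gluster mount point
N=${N:-4}                        # number of parallel streams
CNT=${CNT:-16}                   # MB written per stream
mkdir -p "$DIR"
start=$(date +%s)
i=1
while [ "$i" -le "$N" ]; do
    dd if=/dev/zero of="$DIR/stream.$i" bs=1M count="$CNT" \
       conv=fsync 2>/dev/null &
    i=$((i + 1))
done
wait    # block until every writer has finished
end=$(date +%s)
rm -f "$DIR"/stream.*
echo "wrote $((N * CNT)) MB total in $((end - start)) s"
```

If aggregate throughput scales well past the 50-60 MB/s single-copy figure, the bottleneck is per-stream latency rather than raw capacity, which matches the analysis above.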