Hi Corey,
Let me share the results of the testing I've been doing for the past 5
weeks or so. As in your case, the results are nowhere near what I was
expecting. What a disappointment. Anyway, here we go.
I am using CentOS 6.3 with the latest updates and patches and the latest
QLogic OFED, version 1.5.3.x; the QLogic drivers for OFED 1.5.4.x do not
compile on CentOS 6.3. I've also tried vanilla OFED 1.5.4.1 and Mellanox OFED
1.5.3.x with pretty much the same results. I've been testing GlusterFS 3.2.7
and 3.3.0; there is no significant performance difference between the 3.2 and
3.3 branches.
My hardware is one storage server with a dual-port Mellanox QDR card, two
server nodes with dual-port QLogic mezzanine cards, and a QLogic QDR (HP)
blade switch. The storage server uses a ZFS pool made of 4 striped 2-disk
mirrors, plus a 240GB SSD for the ZIL and a 240GB SSD for the L2ARC cache.
I also enabled compression and deduplication.
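In case it helps anyone compare notes, the pool layout is roughly equivalent
to the following (device and pool names here are placeholders, not my actual
ones):

    zpool create tank mirror disk1 disk2 mirror disk3 disk4 \
                      mirror disk5 disk6 mirror disk7 disk8
    zpool add tank log ssd1      # 240GB SSD as ZIL (SLOG)
    zpool add tank cache ssd2    # 240GB SSD as L2ARC
    zfs set compression=on tank
    zfs set dedup=on tank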
Underlying ZFS performance measured with iozone (iozone -+u -t 2 -F f1 f2
-r 2048 -s 30G) is between 4GB/s and 10GB/s depending on the test. InfiniBand
fabric tests using RDMA were giving between 3 and 4 GB/s. Please note that is
GigaBytes, NOT GigaBits, per second.
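By fabric tests I mean raw RDMA bandwidth runs between the hosts; the
standard perftest tools give that kind of figure, along these lines (the
exact tool and flags may differ from what I ran, and the hostname is a
placeholder):

    # on the storage server
    ib_read_bw -a
    # on a client node, pointing at the server
    ib_read_bw -a storage1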
So, taking overheads into account, I was expecting a throughput of around
2.5-3 GB/s over GlusterFS RDMA. Yeah, right, wishful thinking it was!
I've built my PoC environment and started testing with just one client, and
I've been getting around 400-600MB/s tops, with writes about 20% faster than
reads. Following some performance tuning on the GlusterFS and ZFS side, I've
managed to increase throughput to around 700-800MB/s, with writes still about
20% faster.
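For context, the volume itself is nothing exotic; the setup and the sort of
knobs I've been playing with look roughly like this (volume, pool and host
names are placeholders, and the values are illustrative rather than a
recommendation):

    gluster volume create testvol transport rdma storage1:/tank/brick
    gluster volume start testvol
    mount -t glusterfs storage1:/testvol /mnt/gluster

    gluster volume set testvol performance.cache-size 1GB
    gluster volume set testvol performance.io-thread-count 32
    gluster volume set testvol performance.write-behind-window-size 4MB
    zfs set recordsize=128K tank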
Note that when I add the -o switch to the iozone command to use synchronous
writes, write throughput is limited by the speed of the ZIL SSD.
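In other words, the sync-write case is just the same run with -o added:

    iozone -+u -o -t 2 -F f1 f2 -r 2048 -s 30G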
While trying to figure out the cause of the bottleneck, I realised that it is
on the client side: running concurrent tests from two clients gives me about
650MB/s per client. Doing a bit more research, it seems the cause of the
problem is FUSE. Googling for this issue, I've found a number of people
complaining that FUSE throughput is limited to around 600-700MB/s. There is a
kernel patch to address this, but the results reported by several people who
tested it show only a marginal improvement: they managed to go from around
600MB/s to about 850MB/s or so. So, from what I've read, it is currently not
possible to achieve speeds over 1GB/s through FUSE. This made me wonder why
FUSE was chosen for the GlusterFS client side in the first place.
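For anyone who wants to experiment on the client side, the only FUSE-related
knob I'm aware of in the mount helper is direct-io-mode; I'm not claiming it
gets around the ceiling, but it's easy to try:

    mount -t glusterfs -o direct-io-mode=enable storage1:/testvol /mnt/gluster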
P.S. If you are looking to use GlusterFS as the backend storage for KVM
virtualisation, I would warn you that it's a tricky business. I've managed to
make things work, but the performance is far worse than even my most
pessimistic expectations. An example: a GlusterFS RDMA file system mounted on
the server running KVM gives me around 700-850MB/s throughput, but I only get
50MB/s at most when running the same test from a VM whose image is stored on
that mount. In comparison, NFS gives me around 350-400MB/s. I never expected
GlusterFS to perform worse than NFS.
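To make the comparison concrete, a streaming write test of this sort shows it
(the exact commands, paths and sizes here are illustrative, not necessarily
what I ran):

    # on the KVM host, against the glusterfs mount
    dd if=/dev/zero of=/mnt/gluster/hosttest bs=1M count=10240 conv=fdatasync

    # inside a guest whose image file lives on /mnt/gluster
    dd if=/dev/zero of=/root/guesttest bs=1M count=4096 conv=fdatasync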
I would be grateful if anyone could share their experience with GlusterFS
over InfiniBand and any tips on improving performance.
Cheers,
Andrei
----- Original Message -----
From: "Corey Kovacs" <corey.kovacs at gmail.com>
To: gluster-users at gluster.org
Sent: Friday, 7 September, 2012 2:45:48 PM
Subject: [Gluster-users] Throughput over infiniband
Folks,
I finally got my hands on a 4x FDR (56Gb) Infiniband switch and 4 cards to do
some testing of GlusterFS over that interface.
So far, I am not getting the throughput I _think_ I should see.
My config is made up of..
4 dl360-g8's (three bricks and one client)
4 4xFDR, dual port IB cards (one port configured in each card per host)
1 4xFDR 36 port Mellanox Switch (managed and configured)
GlusterFS 3.2.6
RHEL6.3
I have tested the IB cards and get about 6GB between hosts over raw IB. Using
ipoib, I can get about 22Gb/sec. Not too shabby for a first go but I expected
more (cards are in connected mode with MTU of 64k).
My raw speed to the disks (through the buffer cache... I just realized I've
not tested direct mode IO, I'll do that later today) is about 800MB/sec. I
expect to see on the order of 2GB/sec (a little less than 3x800).
When I write a large stream using dd, and watch the bricks I/O I see ~800MB/sec
on each one, but at the end of the test, the report from dd indicates 800MB/sec.
Am I missing something fundamental?
Any pointers would be appreciated,
Thanks!
Corey