Bartek Krawczyk
2012-Oct-18 06:38 UTC
[Gluster-users] Gluster 3.3.0 on CentOS 6 - GigabitEthernet vs InfiniBand
Hi, we've been assembling a small cluster using a Dell M1000e and a few M620s. We've decided to use GlusterFS as our storage solution. Since our setup has an InfiniBand switch, we want to run GlusterFS over the rdma transport. I've been doing some iozone benchmarking and the results I got are really strange: there's almost no difference between Gigabit Ethernet, InfiniBand IPoIB and InfiniBand RDMA.

To test InfiniBand IPoIB I added peers using the IPs on the ibX interfaces and used the tcp transport for the volumes. To test InfiniBand RDMA I added peers using the same ibX IPs and created a volume with rdma as its only transport. I mounted it with "mount -t glusterfs masterib:/vol4.rdma /home/test".

Please find attached two plots which also include a raw-disk iozone benchmark for comparison. The first plot is the average of all "iozone -a" results for each test; the second plot is the maximum value of the "iozone -a" results for each type of connection. As you can see, raw disk performance isn't the bottleneck.

The GlusterFS 3.3.0 documentation says the rdma transport isn't well supported in the 3.3.0 release. But then why is there almost no difference between InfiniBand IPoIB (10 Gbps) and Gigabit Ethernet (1 Gbps)? I see GlusterFS 3.3.1 was released on the 16th of October; I'll try upgrading, but I don't see any significant changes to RDMA in it.

Regards, and feel free to chime in with your suggestions.

PS. Here are some InfiniBand diagnostic commands showing that the fabric itself should be working correctly:

[root@node01 ~]# ibv_rc_pingpong masterib
  local address:  LID 0x0001, QPN 0x4c004a, PSN 0xd90c3e, GID ::
  remote address: LID 0x0002, QPN 0x64004a, PSN 0x64d15d, GID ::
8192000 bytes in 0.01 seconds = 8733.48 Mbit/sec
1000 iters in 0.01 seconds = 7.50 usec/iter

[root@node01 ~]# ibhosts
Ca      : 0x0002c90300384bc0 ports 2 "node02 mlx4_0"
Ca      : 0x0002c90300385450 ports 2 "master mlx4_0"
Ca      : 0x0002c90300385150 ports 2 "node01 mlx4_0"

[root@node01 ~]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.10.2132
        node_guid:                      0002:c903:0038:5150
        sys_image_guid:                 0002:c903:0038:5153
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       DEL0A10210018
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 4
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

--
Bartek Krawczyk
network and system administrator

-------------- next part --------------
A non-text attachment was scrubbed...
Name: average.jpg
Type: image/jpeg
Size: 71591 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20121018/8b796715/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: max.jpg
Type: image/jpeg
Size: 79196 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20121018/8b796715/attachment-0001.jpg>
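For reference, the three configurations described in the message above come down to roughly the commands below. This is only a sketch: the volume names, brick paths and mount points are illustrative placeholders, not the exact ones from the cluster; only the rdma mount line is taken verbatim from the post, and the peers for the IB tests are assumed to have been probed via their ibX addresses (masterib/node01ib), as the post describes.

# 1) Gigabit Ethernet baseline: tcp transport over the eth addresses.
gluster volume create vol-eth transport tcp master:/bricks/b1 node01:/bricks/b1
gluster volume start vol-eth
mount -t glusterfs master:/vol-eth /mnt/eth

# 2) IPoIB: still tcp transport, but bricks addressed via the ib0 IPs,
#    so the IP traffic rides the InfiniBand link.
gluster volume create vol-ipoib transport tcp masterib:/bricks/b2 node01ib:/bricks/b2
gluster volume start vol-ipoib
mount -t glusterfs masterib:/vol-ipoib /mnt/ipoib

# 3) RDMA: rdma as the only transport; the ".rdma" suffix in the mount
#    spec selects the rdma client transport.
gluster volume create vol4 transport rdma masterib:/bricks/b3 node01ib:/bricks/b3
gluster volume start vol4
mount -t glusterfs masterib:/vol4.rdma /home/test

# The same benchmark was then run on each mount point:
iozone -a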
Bartek Krawczyk
2012-Oct-18 07:48 UTC
[Gluster-users] Gluster 3.3.0 on CentOS 6 - GigabitEthernet vs InfiniBand
On 18 October 2012 08:44, Ling Ho <ling at slac.stanford.edu> wrote:
> When you mount using rdma, try running some network tools like iftop to
> see if traffic is going through your GbE interface.
>
> If your volume is created with both tcp and rdma, my experience is rdma
> does not work under 3.3.0 and it will always fall back to tcp.
>
> However, IPoIB works fine for us. Again, you should check where the
> traffic goes.

I used tcpdump and iftop and confirmed that the IPoIB traffic goes through the ib0 interface, not eth. When I use the rdma transport the traffic shows up on neither eth nor ib0, so I guess it really is using RDMA. I re-ran the "iozone -a" tests and they came out the same.

In addition I did a "dd" test of read and write on the IPoIB- and RDMA-mounted volumes.

IPoIB:
[root@master gluster3]# dd if=/dev/zero of=test bs=100M count=50
50+0 records in
50+0 records out
5242880000 bytes (5.2 GB) copied, 16.997 s, 308 MB/s
[root@master gluster3]# dd if=test of=/dev/null
10240000+0 records in
10240000+0 records out
5242880000 bytes (5.2 GB) copied, 28.4185 s, 184 MB/s

RDMA:
[root@master gluster]# dd if=/dev/zero of=test bs=100M count=50
50+0 records in
50+0 records out
5242880000 bytes (5.2 GB) copied, 70.3636 s, 74.5 MB/s
[root@master gluster]# dd if=test of=/dev/null
10240000+0 records in
10240000+0 records out
5242880000 bytes (5.2 GB) copied, 10.8389 s, 484 MB/s

I did a "sync" between those tests. Funny, isn't it? RDMA is much slower than IPoIB on writes and faster on reads. I think we'll stick to IPoIB until something is fixed in GlusterFS. And still - why are the results so similar to Gigabit Ethernet?

Regards

--
Bartek Krawczyk
network and system administrator
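For anyone repeating the check, verifying where the traffic goes and running the dd tests boils down to roughly the following. The interface and mount-point names match the setup above; the drop_caches step is an extra precaution of my own suggestion, not something mentioned in the post.

# Watch the interfaces while the benchmark runs (separate terminals):
iftop -i eth0        # should stay quiet during the IB tests
iftop -i ib0         # shows the IPoIB traffic; stays quiet for pure RDMA
tcpdump -i ib0 -n    # same check, packet by packet

# Write test on the mounted volume (5 GB of zeros):
cd /home/test
dd if=/dev/zero of=test bs=100M count=50
sync

# Dropping the page cache before the read makes the read figure more
# trustworthy, otherwise part of the file may be served from local RAM:
echo 3 > /proc/sys/vm/drop_caches

# Read test (default 512-byte blocks, hence the 10240000 records):
dd if=test of=/dev/null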
Ivan Dimitrov
2012-Oct-18 09:16 UTC
[Gluster-users] Gluster 3.3.0 on CentOS 6 - GigabitEthernet vs InfiniBand
On 10/18/12 10:48 AM, Bartek Krawczyk wrote:
> On 18 October 2012 08:44, Ling Ho <ling at slac.stanford.edu> wrote:
>> If your volume is created with both tcp and rdma, my experience is rdma
>> does not work under 3.3.0 and it will always fall back to tcp.

I just converted from GbE to InfiniBand and I spent the entire last week cursing about this. Please, devs: make sure we can set the transport type between peers!
Corey Kovacs
2012-Oct-18 13:39 UTC
[Gluster-users] Gluster 3.3.0 on CentOS 6 - GigabitEthernet vs InfiniBand
One problem right away is that 3.3 doesn't support rdma. I know it still builds the rdma packages, but rdma isn't a supported connection method for 3.3; it was set aside so the 3.3 release could make it on time. As I understand it, 3.1 was/is supposed to be the first 3.x version to fully support it. I'd upgrade and test again. That, or downgrade to 3.2.7, which does in fact support it right now. I'm not even sure how you got things mounted using the "rdma" semantics with 3.3.

My experience so far was somewhat disappointing until I found out a few key things about GlusterFS which I'd taken for granted:

1. Stripes are not what you might think. The I/O for a stripe does _not_ fan out as it would on a RAID card. It's an unfortunate use of the term; it only means you can store files larger than the maximum size of a single brick.
2. I/O is done in sync mode, so cache coherency isn't an issue and the integrity of the written data is ensured.
3. The performance of a distributed volume far exceeds that of a striped one for my use. Again, it depends on the size of the bricks.

These are the things which affect my experience, and now that I at least _think_ I understand them, my results make much more sense to me. Generally my throughput maxes out around 800-900 MB/s, which is the limit of my disk storage right now.

As a test I created as large a volume as I could using ramdisks, to see just how much of a limiter my disks actually are, and I was very surprised to see the speed NOT increase across the file system using an rdma target on version 3.2.6. The local I/O reached 1.9 GB/s. So, in addition to my spindle-based limit, I do not believe I am using any more than a single lane of my IB cards (which are FDR (56 Gb) Mellanox). Using the ib test utilities I can indeed max the connection at 6 GB/s, which is really amazing to see, but so far I just can't seem to get GlusterFS to make use of the total bandwidth available.

Also, you might try putting your cards into "Connected Mode" and increasing your MTU to 64k.

Corey
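The runtime version of that connected-mode change is roughly the following sketch; to make it persistent you would put the equivalent into your distribution's ib0 network configuration. The sysfs path and the 65520 MTU are the standard IPoIB values, not something specific to this thread's setup.

# Switch ib0 from datagram to connected mode, then raise the MTU;
# 65520 is the usual maximum for IPoIB in connected mode.
echo connected > /sys/class/net/ib0/mode
ip link set ib0 mtu 65520

# Verify the mode and MTU took effect:
cat /sys/class/net/ib0/mode
ip link show ib0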