Hi,

I'm running GlusterFS 3.3.1-ubuntu1~precise9 and I'm having some problems with the "rdma" and "tcp,rdma" options that I hope someone can help me with.

1. What does "tcp,rdma" actually do? Does it let you mix both types of client? (I did a few tests with iozone and found it gave identical performance to "tcp".)

2. I can't get "rdma" to work, even in the simplest case with a single node:

    volume create storage transport rdma my_server:/data/area
    volume start storage
    mount -t glusterfs my_server:storage /mnt/storage

The last line hangs. Looking in /var/log/glusterfs I can see the log for the volume:

[2013-05-30 06:24:19.605315] E [rdma.c:4604:tcp_connect_finish] 0-storage-client-0: tcp connect to failed (Connection refused)
[2013-05-30 06:24:19.605713] W [rdma.c:4187:gf_rdma_disconnect] (-->/usr/sbin/glusterfs(main+0x34d) [0x7f374d38a3ed] (-->/usr/lib/libglusterfs.so.0(+0x3bd17) [0x7f374cf23d17] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/rdma.so(+0x5231) [0x7f3743398231]))) 0-storage-client-0: disconnect called (peer:)
[2013-05-30 06:24:19.605763] W [rdma.c:4521:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterfs(main+0x34d) [0x7f374d38a3ed] (-->/usr/lib/libglusterfs.so.0(+0x3bd17) [0x7f374cf23d17] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/rdma.so(+0x5150) [0x7f3743398150]))) 0-rpc-transport/rdma: storage-client-0: peer () disconnected, cleaning up

This block repeats every few seconds. The line "tcp connect to failed" looks like it has lost the server name somehow?

Iain

On 05/29/2013 10:33 PM, Iain Buchanan wrote:
> Hi,
>
> I'm running GlusterFS 3.3.1-ubuntu1~precise9 and I'm having some
> problems with the "rdma" and "tcp,rdma" options I hope someone can
> help me with.
>
> 1. What does "tcp,rdma" actually do - does it let you mix both types
> of client? (I did a few tests with iozone and found it gave identical
> performance to the "tcp".)
>
> 2. I can't get "rdma" to work, even in the simplest case with a single
> node.
> volume create storage transport rdma my_server:/data/area
> volume start storage
> mount -t glusterfs my_server:storage /mnt/storage
>
> The last line hangs. Looking in /var/log/glusterfs I can see the log
> for the volume:
>
> [2013-05-30 06:24:19.605315] E [rdma.c:4604:tcp_connect_finish]
> 0-storage-client-0: *tcp connect to failed (Connection refused)*
> [2013-05-30 06:24:19.605713] W [rdma.c:4187:gf_rdma_disconnect]
> (-->/usr/sbin/glusterfs(main+0x34d) [0x7f374d38a3ed]
> (-->/usr/lib/libglusterfs.so.0(+0x3bd17) [0x7f374cf23d17]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/rdma.so(+0x5231)
> [0x7f3743398231]))) 0-storage-client-0: disconnect called (peer:)
> [2013-05-30 06:24:19.605763] W [rdma.c:4521:gf_rdma_handshake_pollerr]
> (-->/usr/sbin/glusterfs(main+0x34d) [0x7f374d38a3ed]
> (-->/usr/lib/libglusterfs.so.0(+0x3bd17) [0x7f374cf23d17]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/rdma.so(+0x5150)
> [0x7f3743398150]))) 0-rpc-transport/rdma: storage-client-0: peer ()
> disconnected, cleaning up
>
> This block repeats every few seconds - the line "tcp connect to
> failed" looks like it has lost the server name somehow?
>
> Iain
>

If you've installed from the yum repo (http://goo.gl/s077x), that shouldn't be happening; kkeithley applied the patch. If not, rdma is broken in 3.3.[01]:

https://bugzilla.redhat.com/show_bug.cgi?id=849122

To mount via rdma when using tcp,rdma:

    mount -t glusterfs server1:myvol.rdma /mnt/foo
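
For anyone following along, here is a minimal sketch of the naming convention in the reply above. It relies only on what the reply states, that appending ".rdma" to the volume name selects the rdma transport on a tcp,rdma volume. It further assumes that the plain volume name gives you tcp, which would be consistent with the identical iozone results reported in the original message. The helper name gluster_mount_source and the server, volume, and mount point names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the "server:volume" argument for
# "mount -t glusterfs" on a volume created with transport tcp,rdma.
# Assumption: ".rdma" appended to the volume name selects rdma (per the
# reply above); the plain volume name is assumed to select tcp.
gluster_mount_source() {
    server=$1
    volume=$2
    transport=${3:-tcp}
    case "$transport" in
        rdma) printf '%s:%s.rdma\n' "$server" "$volume" ;;
        tcp)  printf '%s:%s\n' "$server" "$volume" ;;
        *)    echo "unknown transport: $transport" >&2; return 1 ;;
    esac
}

# Print the mount commands instead of running them, so this is safe to try.
echo "mount -t glusterfs $(gluster_mount_source server1 myvol rdma) /mnt/foo"
echo "mount -t glusterfs $(gluster_mount_source server1 myvol tcp) /mnt/foo"
```

This only prints the two command lines; run the one you want as root once the volume is started.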

On 13-05-30 01:33 AM, Iain Buchanan wrote:
> [2013-05-30 06:24:19.605315] E [rdma.c:4604:tcp_connect_finish]
> 0-storage-client-0: *tcp connect to failed (Connection refused)*

I've seen soooo many people have this problem on IRC, and the cause is firewalled connections. Try again with your firewalls off.

Often the daemons and clients can't connect to the bricks, the management server, etc., and this is how it presents.

M.

--
Michael Brown          | `One of the main causes of the fall of
Systems Consultant     | the Roman Empire was that, lacking zero,
Net Direct Inc.        | they had no way to indicate successful
+1 519 883 1172 x5106  | termination of their C programs.' - Firth
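
One way to test the firewall theory without turning the firewall off is to probe the ports from the client. The following is a rough sketch, not a definitive diagnostic. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command. Port 24007 is the glusterd management port; 24009 as the first brick port is an assumption for 3.3.x and should be checked against the documentation for your version. The function names are hypothetical. It tests plain TCP reachability only, which seems a reasonable first check given that the failing call in the log is tcp_connect_finish; it says nothing about the InfiniBand fabric itself.

```shell
#!/bin/bash
# Host to probe; defaults to localhost so the sketch runs anywhere.
host=${1:-localhost}

# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device inside a child shell so that
# timeout can kill a connection attempt that hangs.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# describe_port HOST PORT: print a one-line result for the probe.
describe_port() {
    if check_port "$1" "$2"; then
        echo "$1:$2 open"
    else
        echo "$1:$2 closed or filtered"
    fi
}

# 24007: glusterd management port.
# 24009: assumed first brick port on 3.3.x (verify for your version).
for port in 24007 24009; do
    describe_port "$host" "$port"
done
```

Run it from the client with the server name as the only argument. A "closed or filtered" result for a port that the server is listening on locally would point at a firewall in between.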