Dear Gluster-Users,

I am experiencing RDMA problems.

I have installed Ubuntu 16.04.4 running the 4.15.0-13-generic kernel and MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different servers. All of them have Mellanox ConnectX-4 LX dual-port NICs, and the four servers are connected via a Mellanox SN2100 switch.

I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3 of the servers; these 3 boxes run as a gluster cluster. Additionally, I have installed the GlusterFS client on the last one.

I created the Gluster volume with this command:

# gluster volume create db transport rdma replica 3 arbiter 1 gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force

(network.ping-timeout is 3)

Then I mounted this volume with the command below:

mount -t glusterfs -o transport=rdma gluster1:/db /db

After mounting "/db", I can access the files.

The problem is that when I reboot one of the cluster nodes, the fuse client prints the error below and hangs.

[2018-04-17 07:42:55.506422] W [MSGID: 103070] [rdma.c:4284:gf_rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx5_0' returned error wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, wc.byte_len = 0, post->reused = 135

When I change the transport mode from rdma to tcp, the fuse client works well. No hangs.

I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs) on Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.

Thank you.
Necati.
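For reference, a minimal sketch of how the volume's configured transport and brick status could be verified with the standard gluster CLI (using the volume name db from above; nothing here is specific to this report):

# gluster volume info db        (lists the volume options, including Transport-type, which should read rdma here)
# gluster volume status db      (shows whether each brick process is online and which ports it uses)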
Is infiniband itself working fine? You can run tools like ibv_rc_pingpong to find out.

On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <siseci at gmail.com> wrote:
> Dear Gluster-Users,
>
> I am experiencing RDMA problems.
> [...]
> The problem is that when I reboot one of the cluster nodes, the fuse client
> prints the error below and hangs.
>
> [2018-04-17 07:42:55.506422] W [MSGID: 103070] [rdma.c:4284:gf_rdma_handle_failed_send_completion]
> 0-rpc-transport/rdma: send work request on `mlx5_0' returned error
> wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000,
> wc.byte_len = 0, post->reused = 135
>
> When I change the transport mode from rdma to tcp, the fuse client works well. No hangs.
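For reference, a typical two-host check with ibv_rc_pingpong looks like the sketch below (device name mlx5_0 and host names taken from this thread; the GID index may differ per setup). Start the server side first, then point the client at it:

root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0            (server side, waits for a connection)
root@cinder:~# ibv_rc_pingpong -d mlx5_0 -g 0 gluster1     (client side, connects to gluster1)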
Thank you for your mail.

ibv_rc_pingpong seems to be working between the servers and the client. udaddy, ucmatose, rping etc. are also working.

root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
  local address:  LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
  remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
1000 iters in 0.01 seconds = 8.23 usec/iter

root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
  local address:  LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
  remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
1000 iters in 0.01 seconds = 7.78 usec/iter

Thank you.
Necati.

On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
> Is infiniband itself working fine? You can run tools like
> ibv_rc_pingpong to find out.
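Since the verbs-level tests pass, one further check worth noting (a suggestion, not part of the original thread) is to confirm the port state and GID table on each node right after a reboot, since the failed send completion only appears when a node goes down:

root@gluster1:~# ibv_devinfo -d mlx5_0 -v     (port state should be PORT_ACTIVE; -v also prints the GID table used for RoCE)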