Arman Khalatyan
2017-Mar-03 11:18 UTC
[Gluster-users] [ovirt-users] How to force glusterfs to use RDMA?
I think there is a bug in the vdsmd checks:

2017-03-03 11:15:42,413 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2391)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2388, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 167, in connect
    self.getMountObj().getRecord().fs_file)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 237, in getRecord
    (self.fs_spec, self.fs_file))
OSError: [Errno 2] Mount of `10.10.10.44:/GluReplica` at `/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica` does not exist
2017-03-03 11:15:42,416 INFO (jsonrpc/7) [dispatcher] Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 100, 'id': u'4b2ea911-ef35-4de0-bd11-c4753e6048d8'}]} (logUtils:52)
2017-03-03 11:15:42,417 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 2.63 seconds (__init__:515)
2017-03-03 11:15:44,239 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)

[root at clei21 ~]# df | grep glu
10.10.10.44:/GluReplica.rdma 3770662912 407818240 3362844672  11% /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica

ls "/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica"
09f95051-bc93-4cf5-85dc-16960cee74e4  __DIRECT_IO_TEST__

[root at clei21 ~]# touch /rhev/data-center/mnt/glusterSD/10.10.10.44\:_GluReplica/testme.txt
[root at clei21 ~]# unlink /rhev/data-center/mnt/glusterSD/10.10.10.44\:_GluReplica/testme.txt

On Fri, Mar 3, 2017 at 11:51 AM, Arman Khalatyan <arm2arm at gmail.com> wrote:
> Thank you all for the nice hints.
> Somehow my host was not able to access the userspace RDMA, until after installing:
> yum install -y libmlx4.x86_64
>
> I can mount:
> /usr/bin/mount -t glusterfs -o backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma 10.10.10.44:/GluReplica /mnt
> 10.10.10.44:/GluReplica.rdma 3770662912 407817216 3362845696  11% /mnt
>
> Looks like rdma and gluster are working, except for the ovirt GUI :(
>
> With MountOptions:
> backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma
>
> I am not able to activate storage.
>
> ---- Gluster Status ----
> gluster volume status
> Status of volume: GluReplica
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.10.10.44:/zclei22/01/glu           49162     49163      Y       17173
> Brick 10.10.10.42:/zclei21/01/glu           49156     49157      Y       17113
> Brick 10.10.10.41:/zclei26/01/glu           49157     49158      Y       16404
> Self-heal Daemon on localhost               N/A       N/A        Y       16536
> Self-heal Daemon on clei21.vib              N/A       N/A        Y       17134
> Self-heal Daemon on 10.10.10.44             N/A       N/A        Y       17329
>
> Task Status of Volume GluReplica
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> ---- IB status ----
> ibstat
> CA 'mlx4_0'
>         CA type: MT26428
>         Number of ports: 1
>         Firmware version: 2.7.700
>         Hardware version: b0
>         Node GUID: 0x002590ffff163758
>         System image GUID: 0x002590ffff16375b
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 273
>                 LMC: 0
>                 SM lid: 3
>                 Capability mask: 0x02590868
>                 Port GUID: 0x002590ffff163759
>                 Link layer: InfiniBand
>
> Not bad for an SDR switch! :-P
> qperf clei22.vib ud_lat ud_bw
> ud_lat:
>     latency = 23.6 us
> ud_bw:
>     send_bw = 981 MB/sec
>     recv_bw = 980 MB/sec
>
> On Fri, Mar 3, 2017 at 9:08 AM, Deepak Naidu <dnaidu at nvidia.com> wrote:
>>
>> "As you can see from my previous email that the RDMA connection tested with qperf."
>>
>> I think you have the wrong command. You're testing TCP, not RDMA. Also check if you have the RDMA & IB modules loaded on your hosts.
>>
>> root at clei26 ~]# qperf clei22.vib tcp_bw tcp_lat
>> tcp_bw:
>>     bw = 475 MB/sec
>> tcp_lat:
>>     latency = 52.8 us
>> [root at clei26 ~]#
>>
>> Please run the below command to test RDMA:
>>
>> [root at storageN2 ~]# qperf storageN1 ud_lat ud_bw
>> ud_lat:
>>     latency = 7.51 us
>> ud_bw:
>>     send_bw = 9.21 GB/sec
>>     recv_bw = 9.21 GB/sec
>> [root at sc-sdgx-202 ~]#
>>
>> Read the qperf man pages for more info:
>>
>> * To run a TCP bandwidth and latency test:
>>       qperf myserver tcp_bw tcp_lat
>> * To run a UDP latency test and then cause the server to terminate:
>>       qperf myserver udp_lat quit
>> * To measure the RDMA UD latency and bandwidth:
>>       qperf myserver ud_lat ud_bw
>> * To measure RDMA UC bi-directional bandwidth:
>>       qperf myserver rc_bi_bw
>> * To get a range of TCP latencies with a message size from 1 to 64K:
>>       qperf myserver -oo msg_size:1:64K:*2 -vu tcp_lat
>>
>> Check if you have the RDMA & IB modules loaded:
>>
>> lsmod | grep -i ib
>> lsmod | grep -i rdma
>>
>> --
>> Deepak
>>
>> From: Arman Khalatyan [mailto:arm2arm at gmail.com]
>> Sent: Thursday, March 02, 2017 10:57 PM
>> To: Deepak Naidu
>> Cc: Rafi Kavungal Chundattu Parambil; gluster-users at gluster.org; users; Sahina Bose
>> Subject: RE: [Gluster-users] [ovirt-users] How to force glusterfs to use RDMA?
>>
>> Dear Deepak, thank you for the hints. Which gluster version are you using?
>> As you can see from my previous email, the RDMA connection was tested with qperf. It is working as expected. In my case the clients are servers as well; they are the hosts for ovirt. Disabling selinux is not recommended by ovirt, but I will give it a try.
>>
>> On 03.03.2017 at 7:50 AM, "Deepak Naidu" <dnaidu at nvidia.com> wrote:
>> I have been testing glusterfs over RDMA & below is the command I use. Reading the logs, it looks like your IB (InfiniBand) device is not being initialized. I am not sure if you have an issue on the client IB or the storage server IB. Also, have you configured your IB devices correctly? I am using IPoIB.
>> Can you check your firewall and disable selinux? I think you might have checked that already.
>>
>> mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0
>>
>> - The errors below appear if you have an issue starting your volume. I had this issue when my transport was set to tcp,rdma; I had to force start my volume. If I had set it to tcp only on the volume, the volume would start easily.
>>
>> [2017-03-02 11:49:47.829391] E [MSGID: 114022] [client.c:2530:client_init_rpc] 0-GluReplica-client-2: failed to initialize RPC
>> [2017-03-02 11:49:47.829413] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-GluReplica-client-2: Initialization of volume 'GluReplica-client-2' failed, review your volfile again
>> [2017-03-02 11:49:47.829425] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2: initializing translator failed
>> [2017-03-02 11:49:47.829436] E [MSGID: 101176] [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
>>
>> - The errors below appear if you have an issue with the IB device, i.e. it is not configured properly.
>>
>> [2017-03-02 11:49:47.828996] W [MSGID: 103071] [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init] 0-GluReplica-client-2: Failed to initialize IB Device
>> [2017-03-02 11:49:47.829080] W [rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>>
>> --
>> Deepak
>>
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Sahina Bose
>> Sent: Thursday, March 02, 2017 10:26 PM
>> To: Arman Khalatyan; gluster-users at gluster.org; Rafi Kavungal Chundattu Parambil
>> Cc: users
>> Subject: Re: [Gluster-users] [ovirt-users] How to force glusterfs to use RDMA?
>>
>> [Adding gluster users to help with error]
>>
>> [2017-03-02 11:49:47.828996] W [MSGID: 103071] [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>
>> On Thu, Mar 2, 2017 at 5:36 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
>> BTW RDMA is working as expected:
>> root at clei26 ~]# qperf clei22.vib tcp_bw tcp_lat
>> tcp_bw:
>>     bw = 475 MB/sec
>> tcp_lat:
>>     latency = 52.8 us
>> [root at clei26 ~]#
>>
>> thank you beforehand.
>> Arman.
>>
>> On Thu, Mar 2, 2017 at 12:54 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
>> just for reference:
>> gluster volume info
>>
>> Volume Name: GluReplica
>> Type: Replicate
>> Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp,rdma
>> Bricks:
>> Brick1: 10.10.10.44:/zclei22/01/glu
>> Brick2: 10.10.10.42:/zclei21/01/glu
>> Brick3: 10.10.10.41:/zclei26/01/glu (arbiter)
>> Options Reconfigured:
>> network.ping-timeout: 30
>> server.allow-insecure: on
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> cluster.data-self-heal-algorithm: full
>> features.shard: on
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.stat-prefetch: off
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.readdir-ahead: on
>> nfs.disable: on
>>
>> [root at clei21 ~]# gluster volume status
>> Status of volume: GluReplica
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.10.10.44:/zclei22/01/glu           49158     49159      Y       15870
>> Brick 10.10.10.42:/zclei21/01/glu           49156     49157      Y       17473
>> Brick 10.10.10.41:/zclei26/01/glu           49153     49154      Y       18897
>> Self-heal Daemon on localhost               N/A       N/A        Y       17502
>> Self-heal Daemon on 10.10.10.41             N/A       N/A        Y       13353
>> Self-heal Daemon on 10.10.10.44             N/A       N/A        Y       32745
>>
>> Task Status of Volume GluReplica
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> On Thu, Mar 2, 2017 at 12:52 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
>> I am not able to mount with RDMA over cli....
>> Are there some volfile parameters that need to be tuned?
>>
>> /usr/bin/mount -t glusterfs -o backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma 10.10.10.44:/GluReplica /mnt
>>
>> [2017-03-02 11:49:47.795511] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.9 (args: /usr/sbin/glusterfs --volfile-server=10.10.10.44 --volfile-server=10.10.10.44 --volfile-server=10.10.10.42 --volfile-server=10.10.10.41 --volfile-server-transport=rdma --volfile-id=/GluReplica.rdma /mnt)
>> [2017-03-02 11:49:47.812699] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2017-03-02 11:49:47.825210] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>> [2017-03-02 11:49:47.828996] W [MSGID: 103071] [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>> [2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init] 0-GluReplica-client-2: Failed to initialize IB Device
>> [2017-03-02 11:49:47.829080] W [rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2017-03-02 11:49:47.829272] W [rpc-clnt.c:1070:rpc_clnt_connection_init] 0-GluReplica-client-2: loading of new rpc-transport failed
>> [2017-03-02 11:49:47.829325] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-GluReplica-client-2: size=588 max=0 total=0
>> [2017-03-02 11:49:47.829371] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-GluReplica-client-2: size=124 max=0 total=0
>> [2017-03-02 11:49:47.829391] E [MSGID: 114022] [client.c:2530:client_init_rpc] 0-GluReplica-client-2: failed to initialize RPC
>> [2017-03-02 11:49:47.829413] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-GluReplica-client-2: Initialization of volume 'GluReplica-client-2' failed, review your volfile again
>> [2017-03-02 11:49:47.829425] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2: initializing translator failed
>> [2017-03-02 11:49:47.829436] E [MSGID: 101176] [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
>> [2017-03-02 11:49:47.830003] W [glusterfsd.c:1327:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3c1) [0x7f524c9dbeb1] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x172) [0x7f524c9d65d2] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f524c9d5b4b] ) 0-: received signum (1), shutting down
>> [2017-03-02 11:49:47.830053] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt'.
>> [2017-03-02 11:49:47.831014] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f524b343dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f524c9d5cd5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f524c9d5b4b] ) 0-: received signum (15), shutting down
>> [2017-03-02 11:49:47.831014] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f524b343dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f524c9d5cd5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f524c9d5b4b] ) 0-: received signum (15), shutting down
>>
>> On Thu, Mar 2, 2017 at 12:11 PM, Sahina Bose <sabose at redhat.com> wrote:
>> You will need to pass additional mount options while creating the storage domain (transport=rdma).
>> Please let us know if this works.
>>
>> On Thu, Mar 2, 2017 at 2:42 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
>> Hi,
>> Is there a way to force the connections over RDMA only?
>> If I check the host mounts I cannot see the rdma mount option:
>> mount -l | grep gluster
>> 10.10.10.44:/GluReplica on /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>>
>> I have glusterized 3 nodes:
>> GluReplica
>> Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
>> Volume Type: Replicate (Arbiter)
>> Replica Count: 2 + 1
>> Number of Bricks: 3
>> Transport Types: TCP, RDMA
>> Maximum no of snapshots: 256
>> Capacity: 3.51 TiB total, 190.56 GiB used, 3.33 TiB free
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
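The traceback and the df output above point at the likely mismatch: once the volume is mounted with transport=rdma, /proc/mounts reports the device as 10.10.10.44:/GluReplica.rdma, while vdsm looks up the exact configured spec 10.10.10.44:/GluReplica. The short Python sketch below is only an illustration of that lookup, not the actual vdsm code; the helper names (read_proc_mounts, find_mount_record) are made up for this example. It shows why an exact-match check against /proc/mounts raises the "does not exist" error seen above.

    # Simplified stand-in for the mount-record lookup suggested by the vdsm
    # traceback (not the actual vdsm implementation).

    def read_proc_mounts(path="/proc/mounts"):
        """Yield (fs_spec, fs_file, fs_vfstype) tuples from /proc/mounts."""
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 3:
                    yield fields[0], fields[1], fields[2]

    def find_mount_record(fs_spec, fs_file):
        """Exact-match lookup, analogous to the check that fails above."""
        for spec, mount_point, _ in read_proc_mounts():
            if spec == fs_spec and mount_point == fs_file:
                return spec, mount_point
        raise OSError(2, "Mount of %r at %r does not exist" % (fs_spec, fs_file))

    if __name__ == "__main__":
        # With the volume mounted over RDMA, /proc/mounts shows the device as
        # "10.10.10.44:/GluReplica.rdma", so this exact match never succeeds:
        find_mount_record("10.10.10.44:/GluReplica",
                          "/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica")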
Denis Chaplygin
2017-Mar-06 11:44 UTC
[Gluster-users] [ovirt-users] How to force glusterfs to use RDMA?
Hello!

On Fri, Mar 3, 2017 at 12:18 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
> I think there is a bug in the vdsmd checks;
>
> OSError: [Errno 2] Mount of `10.10.10.44:/GluReplica` at
> `/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica` does not exist
>
> 10.10.10.44:/GluReplica.rdma 3770662912 407818240 3362844672  11%
> /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica

I suppose that vdsm is not able to handle the .rdma suffix on the volume path. Could you please file a bug for that issue so it can be tracked?
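If the .rdma transport suffix is indeed the problem, one conceivable workaround on the comparison side would be to normalize the device string reported by the kernel before matching it against the configured spec. The snippet below is only a sketch under that assumption, not the actual vdsm fix; normalize_gluster_spec is a hypothetical helper.

    # Hypothetical normalization helper (not vdsm code): strip the ".rdma"
    # suffix that a gluster RDMA mount appends to the volume name in
    # /proc/mounts before comparing it with the configured fs_spec.

    def normalize_gluster_spec(fs_spec):
        """Return fs_spec without a trailing '.rdma' transport suffix."""
        if fs_spec.endswith(".rdma"):
            return fs_spec[:-len(".rdma")]
        return fs_spec

    # normalize_gluster_spec("10.10.10.44:/GluReplica.rdma")
    #   -> "10.10.10.44:/GluReplica", which equals the configured fs_spec,
    #      so the mount record would be found instead of raising OSError.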
Mohammed Rafi K C
2017-Mar-06 11:56 UTC
[Gluster-users] [ovirt-users] How to force glusterfs to use RDMA?
I will see what we can do from the gluster side to fix this, and will get back to you.

Regards,
Rafi KC

On 03/06/2017 05:14 PM, Denis Chaplygin wrote:
> Hello!
>
> On Fri, Mar 3, 2017 at 12:18 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:
>
>     I think there is a bug in the vdsmd checks;
>
>     OSError: [Errno 2] Mount of `10.10.10.44:/GluReplica` at
>     `/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica` does not exist
>
>     10.10.10.44:/GluReplica.rdma 3770662912 407818240 3362844672  11%
>     /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica
>
> I suppose that vdsm is not able to handle the .rdma suffix on the volume
> path. Could you please file a bug for that issue so it can be tracked?