The failure is that the number of completion queue elements (in a completion queue) we had requested in ibv_create_cq (1024 * send_count) is greater than the maximum supported by the IB hardware (max_cqe = 131071).
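For illustration only, a minimal standalone sketch (not glusterfs code) that queries the HCA through libibverbs and checks whether the 1024 * send_count completion queue elements fit within max_cqe. With send_count = 128 the request is 131072, one more than the 131071 this mthca adapter reports in the logs below, so ibv_create_cq fails; 1024 * 127 = 130048 fits, which matches the later observation in this thread that 127 works while 128 does not. Compile with: gcc -o cqe_check cqe_check.c -libverbs

/* cqe_check.c -- illustrative sketch only, not part of glusterfs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (devs == NULL || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    /* Open the first device (mthca0 in the logs below) and read its limits. */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_device_attr attr;
    if (ctx == NULL || ibv_query_device(ctx, &attr) != 0) {
        fprintf(stderr, "failed to query %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    int send_count = 128;               /* the default value discussed in this thread */
    int requested  = 1024 * send_count; /* 131072 */

    printf("%s: max_cqe = %d, requested = %d -> %s\n",
           ibv_get_device_name(devs[0]), attr.max_cqe, requested,
           requested <= attr.max_cqe ? "fits" : "ibv_create_cq will fail");

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}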
----- Original Message -----
From: "Jeremy Stout" <stout.jeremy at gmail.com>
To: "Raghavendra G" <raghavendra at gluster.com>
Cc: gluster-users at gluster.org
Sent: Friday, December 3, 2010 4:20:04 PM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

I patched the source code and rebuilt GlusterFS. Here are the full logs:

Server:
[2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: rdma.management: creation of send_cq failed
[2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: rdma.management: could not create CQ
[2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management: Failed to initialize IB Device
[2010-12-03 07:08:55.953691] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
[2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init] glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
Given volfile:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option working-directory /etc/glusterd
4: option transport-type socket,rdma
5: option transport.socket.keepalive-time 10
6: option transport.socket.keepalive-interval 2
7: end-volume
8:
+------------------------------------------------------------------------------+
[2010-12-03 07:09:10.244790] I [glusterd-handler.c:785:glusterd_handle_create_volume] glusterd: Received create volume req
[2010-12-03 07:09:10.247646] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:10.247678] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-12-03 07:09:10.247708] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-12-03 07:09:10.248038] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:10.251970] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:10.252020] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
[2010-12-03 07:09:10.252036] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-12-03 07:09:22.11649] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir
[2010-12-03 07:09:22.11724] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
[2010-12-03 07:09:22.11734] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-12-03 07:09:22.11761] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
[2010-12-03 07:09:22.12120] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:22.184403] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick pgh-submit-1:/mnt/gluster
[2010-12-03 07:09:22.229143] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
[2010-12-03 07:09:22.229198] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
[2010-12-03 07:09:22.229218] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
[2010-12-03 07:09:22.240157] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009

Client:
[2010-12-03 07:09:00.82784] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
[2010-12-03 07:09:00.82824] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.82836] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-03 07:09:00.85980] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
[2010-12-03 07:09:00.92883] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
[2010-12-03 07:09:00.93156] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
[2010-12-03 07:09:00.93224] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-03 07:09:00.93313] E [rdma.c:4803:init] testdir-client-0: Failed to initialize IB Device
[2010-12-03 07:09:00.93332] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
Given volfile:
+------------------------------------------------------------------------------+
1: volume testdir-client-0
2: type protocol/client
3: option remote-host submit-1
4: option remote-subvolume /mnt/gluster
5: option transport-type rdma
6: end-volume
7:
8: volume testdir-write-behind
9: type performance/write-behind
10: subvolumes testdir-client-0
11: end-volume
12:
13: volume testdir-read-ahead
14: type performance/read-ahead
15: subvolumes testdir-write-behind
16: end-volume
17:
18: volume testdir-io-cache
19: type performance/io-cache
20: subvolumes testdir-read-ahead
21: end-volume
22:
23: volume testdir-quick-read
24: type performance/quick-read
25: subvolumes testdir-io-cache
26: end-volume
27:
28: volume testdir-stat-prefetch
29: type performance/stat-prefetch
30: subvolumes testdir-quick-read
31: end-volume
32:
33: volume testdir
34: type debug/io-stats
35: subvolumes testdir-stat-prefetch
36: end-volume
+------------------------------------------------------------------------------+

On Fri, Dec 3, 2010 at 12:38 AM, Raghavendra G <raghavendra at gluster.com> wrote:
> Hi Jeremy,
>
> Can you apply the attached patch, rebuild and start glusterfs? Please make sure to send us the logs of glusterfs.
>
> regards,
> ----- Original Message -----
> From: "Jeremy Stout" <stout.jeremy at gmail.com>
> To: gluster-users at gluster.org
> Sent: Friday, December 3, 2010 6:38:00 AM
> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>
> I'm currently using OFED 1.5.2.
>
> For the sake of testing, I just compiled GlusterFS 3.1.1 from source, without any modifications, on two systems that have a 2.6.33.7 kernel and OFED 1.5.2 built from source.
> Here are the results:
>
> Server:
> [2010-12-02 21:17:55.886563] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir
> [2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
> [2010-12-02 21:17:55.886607] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
> [2010-12-02 21:17:55.886628] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
> [2010-12-02 21:17:55.887031] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
> [2010-12-02 21:17:56.60427] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick submit-1:/mnt/gluster
> [2010-12-02 21:17:56.104896] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
> [2010-12-02 21:17:56.104935] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
> [2010-12-02 21:17:56.104953] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
> [2010-12-02 21:17:56.114764] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009
>
> Client:
> [2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
> [2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
> [2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
> [2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
> [2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
> [2010-12-02 21:17:25.543830] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>
> Thank you for the help so far.
>
> On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl <craig at gluster.com> wrote:
>> Jeremy -
>> What version of OFED are you running? Would you mind installing version 1.5.2 from source? We have seen this resolve several issues of this type.
>> http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/
>>
>> Thanks,
>>
>> Craig
>>
>> --
>> Craig Carl
>> Senior Systems Engineer
>> Gluster
>>
>> On 12/02/2010 10:05 AM, Jeremy Stout wrote:
>>>
>>> As another follow-up, I tested several compilations today with different values for send/receive count. I found the maximum value I could use for both variables was 127. With a value of 127, GlusterFS did not produce any errors. However, when I changed the value back to 128, the RDMA errors appeared again.
>>>
>>> I also tried setting soft/hard "memlock" to unlimited in the limits.conf file, but still ran into RDMA errors on the client side when the count variables were set to 128.
>>>
>>> On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>>>
>>>> Thank you for the response. I've been testing GlusterFS 3.1.1 on two different OpenSUSE 11.3 systems. Since both systems generated the same error messages, I'll include the output for both.
>>>>
>>>> System #1:
>>>> fs-1:~ # cat /proc/meminfo
>>>> MemTotal: 16468756 kB
>>>> MemFree: 16126680 kB
>>>> Buffers: 15680 kB
>>>> Cached: 155860 kB
>>>> SwapCached: 0 kB
>>>> Active: 65228 kB
>>>> Inactive: 123100 kB
>>>> Active(anon): 18632 kB
>>>> Inactive(anon): 48 kB
>>>> Active(file): 46596 kB
>>>> Inactive(file): 123052 kB
>>>> Unevictable: 1988 kB
>>>> Mlocked: 1988 kB
>>>> SwapTotal: 0 kB
>>>> SwapFree: 0 kB
>>>> Dirty: 30072 kB
>>>> Writeback: 4 kB
>>>> AnonPages: 18780 kB
>>>> Mapped: 12136 kB
>>>> Shmem: 220 kB
>>>> Slab: 39592 kB
>>>> SReclaimable: 13108 kB
>>>> SUnreclaim: 26484 kB
>>>> KernelStack: 2360 kB
>>>> PageTables: 2036 kB
>>>> NFS_Unstable: 0 kB
>>>> Bounce: 0 kB
>>>> WritebackTmp: 0 kB
>>>> CommitLimit: 8234376 kB
>>>> Committed_AS: 107304 kB
>>>> VmallocTotal: 34359738367 kB
>>>> VmallocUsed: 314316 kB
>>>> VmallocChunk: 34349860776 kB
>>>> HardwareCorrupted: 0 kB
>>>> HugePages_Total: 0
>>>> HugePages_Free: 0
>>>> HugePages_Rsvd: 0
>>>> HugePages_Surp: 0
>>>> Hugepagesize: 2048 kB
>>>> DirectMap4k: 9856 kB
>>>> DirectMap2M: 3135488 kB
>>>> DirectMap1G: 13631488 kB
>>>>
>>>> fs-1:~ # uname -a
>>>> Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> fs-1:~ # ulimit -l
>>>> 64
>>>>
>>>> System #2:
>>>> submit-1:~ # cat /proc/meminfo
>>>> MemTotal: 16470424 kB
>>>> MemFree: 16197292 kB
>>>> Buffers: 11788 kB
>>>> Cached: 85492 kB
>>>> SwapCached: 0 kB
>>>> Active: 39120 kB
>>>> Inactive: 76548 kB
>>>> Active(anon): 18532 kB
>>>> Inactive(anon): 48 kB
>>>> Active(file): 20588 kB
>>>> Inactive(file): 76500 kB
>>>> Unevictable: 0 kB
>>>> Mlocked: 0 kB
>>>> SwapTotal: 67100656 kB
>>>> SwapFree: 67100656 kB
>>>> Dirty: 24 kB
>>>> Writeback: 0 kB
>>>> AnonPages: 18408 kB
>>>> Mapped: 11644 kB
>>>> Shmem: 184 kB
>>>> Slab: 34000 kB
>>>> SReclaimable: 8512 kB
>>>> SUnreclaim: 25488 kB
>>>> KernelStack: 2160 kB
>>>> PageTables: 1952 kB
>>>> NFS_Unstable: 0 kB
>>>> Bounce: 0 kB
>>>> WritebackTmp: 0 kB
>>>> CommitLimit: 75335868 kB
>>>> Committed_AS: 105620 kB
>>>> VmallocTotal: 34359738367 kB
>>>> VmallocUsed: 76416 kB
>>>> VmallocChunk: 34359652640 kB
>>>> HardwareCorrupted: 0 kB
>>>> HugePages_Total: 0
>>>> HugePages_Free: 0
>>>> HugePages_Rsvd: 0
>>>> HugePages_Surp: 0
>>>> Hugepagesize: 2048 kB
>>>> DirectMap4k: 7488 kB
>>>> DirectMap2M: 16769024 kB
>>>>
>>>> submit-1:~ # uname -a
>>>> Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> submit-1:~ # ulimit -l
>>>> 64
>>>>
>>>> I retrieved the memory information on each machine after starting the glusterd process.
>>>>
>>>> On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra G <raghavendra at gluster.com> wrote:
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> can you also get the output of,
>>>>>
>>>>> #uname -a
>>>>>
>>>>> #ulimit -l
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Raghavendra G" <raghavendra at gluster.com>
>>>>> To: "Jeremy Stout" <stout.jeremy at gmail.com>
>>>>> Cc: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 10:20:04 AM
>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> In order to diagnose why completion queue creation is failing (as indicated by the logs), we want to know how much free memory was available on your system when glusterfs was started.
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Raghavendra G" <raghavendra at gluster.com>
>>>>> To: "Jeremy Stout" <stout.jeremy at gmail.com>
>>>>> Cc: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 10:11:18 AM
>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> Yes, there might be some performance decrease. But it should not affect the working of rdma.
>>>>>
>>>>> regards,
>>>>> ----- Original Message -----
>>>>> From: "Jeremy Stout" <stout.jeremy at gmail.com>
>>>>> To: gluster-users at gluster.org
>>>>> Sent: Thursday, December 2, 2010 8:30:20 AM
>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>
>>>>> As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors.
>>>>>
>>>>> To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c:
>>>>> options->send_count = 32;
>>>>> options->recv_count = 32;
>>>>>
>>>>> The values were set to 128.
>>>>>
>>>>> I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32?
>>>>> Will there be a decrease in performance?
>>>>>
>>>>> On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>>>>>
>>>>>> Here are the results of the test:
>>>>>> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
>>>>>>  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
>>>>>> 1000 iters in 0.01 seconds = 11.07 usec/iter
>>>>>>
>>>>>> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
>>>>>>  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
>>>>>> 1000 iters in 0.01 seconds = 8.83 usec/iter
>>>>>>
>>>>>> Based on the output, I believe it ran correctly.
>>>>>>
>>>>>> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <anand.avati at gmail.com> wrote:
>>>>>>>
>>>>>>> Can you verify that ibv_srq_pingpong works from the server where this log file is from?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Avati
>>>>>>>
>>>>>>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses RDMA, I'm seeing the following error messages in the log file on the server:
>>>>>>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
>>>>>>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
>>>>>>>> [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>>>>>>>
>>>>>>>> On the client, I see:
>>>>>>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
>>>>>>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
>>>>>>>> [2010-11-30 18:43:49.736841] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>>>>>>>
>>>>>>>> This results in an unsuccessful mount.
>>>>>>>>
>>>>>>>> I created the mount using the following commands:
>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir transport rdma submit-1:/exports
>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>>>>>>
>>>>>>>> To mount the directory, I use:
>>>>>>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>>>>>>
>>>>>>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the commands listed above produced no error messages.
>>>>>>>>
>>>>>>>> If anyone can provide help with debugging these error messages, it would be appreciated.
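Related to the send_count/recv_count changes discussed above, a sketch of how the count could be derived from the device's reported max_cqe instead of being hardcoded. This is not the patch Raghavendra attached (its contents never appear in this thread), and the helper name rdma_safe_count is invented for the example; it only illustrates why 127 (131071 / 1024) is the largest value that worked on this mthca hardware while 128 failed and 32 succeeded.

/* Sketch only -- not the attached patch. Pick a send/recv count whose
 * completion queue request (1024 * count) fits the device's max_cqe. */
#include <infiniband/verbs.h>

static int rdma_safe_count(struct ibv_context *ctx, int wanted)
{
    struct ibv_device_attr attr;

    if (ibv_query_device(ctx, &attr) != 0)
        return wanted;                  /* could not query; keep the default */

    int max = attr.max_cqe / 1024;      /* 131071 / 1024 = 127 on this mthca */
    return wanted > max ? max : wanted; /* 128 -> 127, 32 stays 32 */
}

Deriving the count this way would lower the queue depth only on adapters that cannot support 1024 * 128 completion queue elements, rather than shrinking it to 32 everywhere.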