Hi,

I've recompiled Lustre 1.5.95 under SLES10 with OFED-1.1-rc6.

How do you then use it with the o2ib driver? Do I need to edit the following line?

options lnet networks= ???

If I use lctl list_nids, I can only see the tcp interface. Do I need to do something else?

I would be grateful for an example! (It works fine with tcp, but I would really like to use it with openib gen2.)

Thierry.
Thierry,

I'm using ip2nets in this manner:

options lnet ip2nets="o2ib0 10.10.101.[0-255]; tcp0(eth0) 10.10.100.[0-255]"

networks=o2ib should work as well.

paul

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
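As a rough illustration of the networks= form mentioned above, the sketch below assembles such a line for /etc/modprobe.conf. The helper name lnet_networks_line and the interface names ib0 and eth0 are assumptions made for this example, not something stated in the thread; the net(interface) notation mirrors the tcp0(eth0) form in the ip2nets line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build an
# "options lnet networks=..." line from <lnet-network> <interface> pairs.
lnet_networks_line() {
    nets=""
    while [ "$#" -ge 2 ]; do
        entry="$1($2)"
        if [ -z "$nets" ]; then
            nets="$entry"
        else
            nets="$nets,$entry"
        fi
        shift 2
    done
    printf 'options lnet networks="%s"\n' "$nets"
}

# Example: InfiniBand on ib0 plus TCP on eth0 (interface names assumed).
lnet_networks_line o2ib0 ib0 tcp0 eth0
```

Once a line like this is in place and the lnet module has been reloaded, lctl list_nids (the command from the original question) would be expected to show an @o2ib NID next to the @tcp one, provided the ko2iblnd module loads.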
To the Lustre community,

Due to a recent reorganization, I'll be leaving my job with Hewlett-Packard and will no longer be contributing to Lustre development. I wish you all well, and will watch from the sidelines as the world of Lustre evolves. It has been a pleasure working with all of you.

Enjoy,
Don Capps
capps@iozone.org
On Fri, 22 Sep 2006, pauln wrote:
> Thierry,
> I'm using ip2nets in this manner:
> options lnet ip2nets="o2ib0 10.10.101.[0-255]; tcp0(eth0) 10.10.100.[0-255]"
>
> networks=o2ib should work as well.
> paul

Paul,

Thanks. I currently have a problem with o2ib's ko2iblnd module, but my question is: does o2ib0 use IPOIB or RDMA? I believe it uses RDMA even though the IP address of ib0 is specified.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: no version for "rdma_resolve_addr" found: kernel tainted.
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 3982:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

Thierry.
I believe the ib0 address is used by ip2nets to determine whether the module should be loaded on a given node. Technically you don't need ipoib to run the infiniband nal, but (I could be wrong here) you do need some sort of ip connection on your nodes so that the infiniband nals can initialize their peers. If this is the case, then the node part of the lustre nid (i.e. oss0 in oss0@o2ib) should be associated with an ip address.

As for your module error, it looks like you have an incompatibility of some sort. I haven't seen this particular error. My guess is that your lustre and ib modules were compiled for different kernels.
Thierry,

> I believe the ib0 address is used by ip2nets to determine whether the
> module should be loaded on a given node. Technically you don't need
> ipoib to run the infiniband nal - but (I could be wrong here) you do
> need some sort of ip connection on your nodes so that the infiniband
> nals can initialize their peers. If this is the case then the node
> part of the lustre nid (ie oss0 in oss0@o2ib) should be associated
> with an ip address.

The OpenFabrics LND (o2iblnd) uses the same address resolution stuff as IPoIB does, so you do in fact need IPoIB to be working for o2iblnd to work. Make sure you can ping everywhere over IPoIB before you start.

> As far as your module error below, it looks like you have an
> incompatibility of some sort. I haven't seen this particular
> error. My guess is that your lustre and ib modules were compiled for
> different kernels.

I agree. You'll have to check that both lustre and your IB modules were built against the kernel you're running.

> Thierry Delaitre wrote:
> > thanks. I currently have a problem with the o2ib's ko2iblnd module
> > but my question is: does the o2ib0 uses IPOIB or RDMA ? I believe
> > it uses RDMA even though the ip address of ib0 is specified.

RDMA

--
Cheers,
Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: eeb@bartonsoftware.com|
---------------------------------------------------
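The "ping everywhere over IPoIB" check suggested above could be scripted roughly as follows. This is only a sketch: the function name check_ipoib_peers and the PING override are made up for the example, the interface name ib0 and the peer addresses are placeholders, and the -c, -W and -I flags assume the Linux iputils version of ping.

```shell
#!/bin/sh
# Ping each peer once over the given interface and report the result.
# PING may be set to substitute another command (assumption for testing);
# by default the system ping is used.
# Usage: check_ipoib_peers <interface> <peer-ip>...
check_ipoib_peers() {
    iface="$1"
    shift
    failed=0
    for peer in "$@"; do
        if ${PING:-ping} -c 1 -W 2 -I "$iface" "$peer" >/dev/null 2>&1; then
            echo "ok   $peer"
        else
            echo "FAIL $peer"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholder addresses from the ip2nets range quoted earlier):
# check_ipoib_peers ib0 10.10.101.1 10.10.101.2
```

A non-zero return status means at least one peer did not answer over that interface, in which case o2iblnd address resolution is unlikely to work either.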
Hi,

In fact the IP address is used to identify the other endpoint in order to establish an IB native "connection". That connection traffic goes over IP.

/etc/modprobe.conf can also use an IP address to identify a collection of IP addresses as a Lustre network, when Lustre routing is used.

The o2ib LND uses verbs, and sits below the IPOIB stack. If you want to run over IPOIB you need to use the socknal, which is (was?) loaded by default.

- Peter -
On Fri, 22 Sep 2006, pauln wrote:
> I believe the ib0 address is used by ip2nets to determine whether the
> module should be loaded on a given node. Technically you don't need
> ipoib to run the infiniband nal - but (I could be wrong here) you do
> need some sort of ip connection on your nodes so that the infiniband
> nals can initialize their peers. If this is the case then the node
> part of the lustre nid (ie oss0 in oss0@o2ib) should be associated
> with an ip address.

Thanks for the above info from Paul, Eric, and Peter!

Regarding the module error, I agree that it is something like a mismatch of libs or modules, and it's probably caused by the fact that I'm not too clear on the steps to compile lustre with o2ib and ofed. The steps I did are as follows:

1) install sles10 linux-2.6.16.21-0.8 kernel-source
2) patch it with 2.6-sles10 lustre kernel patch series
3) install new kernel and reboot
4) download OFED-1.1-rc6.tgz and use install script
5)
rm -fr /usr/src/linux-2.6.16.21-0.8/drivers/infiniband
rm -fr /usr/src/linux-2.6.16.21-0.8/include/rdma
ln -s /usr/local/ofed/src/openib-1.1/drivers/infiniband/ /usr/src/linux-2.6.16.21-0.8/drivers/infiniband

6) compile lustre with gen2 support.

Maybe the problem is that I had the following enabled when I recompiled the kernel for steps 2 & 3? It seems the linux kernel 2.6.16 comes with infiniband support. Do I need to disable this before compiling it in steps 2 & 3, or does ofed replace the native modules of the kernel built in steps 2 & 3? Hope this makes sense!

# InfiniBand support
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m

Thierry.
Have you tried the ib drivers which come with 2.6.16? I'm running the ib drivers included in rhel 2.6.9-42, not the OFED drivers, with the lustre o2ib nal.

To answer your questions, I don't think you need to disable ib support in the kernel. The OFED package builds its drivers in /tmp and will presumably overwrite the kernel modules in /lib/modules placed there by the kernel install. Take a look at the -k and -kver options to the OFED install.sh script.

paul
On Fri, 22 Sep 2006, Eric Barton wrote:
> > As far as your module error below, it looks like you have an
> > incompatibility of some sort. I haven't seen this particular
> > error. My guess is that your lustre and ib modules were compiled for
> > different kernels.
>
> I agree. You'll have to check that both lustre and your IB modules
> were built against the kernel you're running.

I've double-checked that I'm using ib modules recompiled for the kernel I have. I deleted the ofed rpms and the ib modules in the kernel, recompiled ofed (which installs the ib modules), then recompiled lustre, but I still experience the same issues :-(

I noticed the following warnings while recompiling lustre. Are they important, and are they related to the problems I'm experiencing?

Thierry.

  CC [M]  /root/lustre-1.5.95/lustre/quota/quotacheck_test.o
In file included from /root/lustre-1.5.95/lnet/include/libcfs/kp30.h:12,
                 from /root/lustre-1.5.95/lustre/include/obd_support.h:26,
                 from /root/lustre-1.5.95/lustre/include/obd_class.h:26,
                 from /root/lustre-1.5.95/lustre/quota/quotacheck_test.c:31:
/root/lustre-1.5.95/lnet/include/libcfs/linux/kp30.h:184:6: warning: "KLWT_SUPPORT" is not defined
In file included from /root/lustre-1.5.95/lustre/include/obd_class.h:26,
                 from /root/lustre-1.5.95/lustre/quota/quotacheck_test.c:31:
/root/lustre-1.5.95/lustre/include/obd_support.h:317:5: warning: "POISON_BULK" is not defined
  Building modules, stage 2.
  MODPOST
WARNING: could not find /root/lustre-1.5.95/lustre/lvfs/.fsfilt-ldiskfs.o.cmd for /root/lustre-1.5.95/lustre/lvfs/fsfilt-ldiskfs.o
WARNING: "rdma_accept" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_destroy_id" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_connect" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_destroy_qp" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_listen" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_create_id" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_create_qp" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_bind_addr" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_resolve_route" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_disconnect" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_reject" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
WARNING: "rdma_resolve_addr" [/root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko] undefined!
  CC      /root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.mod.o
  LD [M]  /root/lustre-1.5.95/lnet/klnds/o2iblnd/ko2iblnd.ko
  CC      /root/lustre-1.5.95/lnet/klnds/socklnd/ksocklnd.mod.o
  LD [M]  /root/lustre-1.5.95/lnet/klnds/socklnd/ksocklnd.ko
  CC      /root/lustre-1.5.95/lnet/libcfs/libcfs.mod.o
  LD [M]  /root/lustre-1.5.95/lnet/libcfs/libcfs.ko

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
This e-mail and its attachments are intended for the above named only and may be confidential. If they have come to you in error you must not copy or show them to anyone, nor should you take any action based on them, other than to notify the error by replying to the sender.
> I've double-checked that I'm using ib modules recompiled for the kernel I
> have. I deleted the ofed rpms, ib modules in the kernel, recompiled ofed
> which installs the ib modules, then recompiled lustre but still
> experience the same issues :-(

What did you say for "--with-o2ib=???????" when you configured lustre? Can you mail me <lustre>/config.log?

Cheers,
Eric
On Sat, 23 Sep 2006, Eric Barton wrote:
> > I've double-checked that I'm using ib modules recompiled for the kernel I
> > have. I deleted the ofed rpms, ib modules in the kernel, recompiled ofed
> > which installs the ib modules, then recompiled lustre but still
> > experience the same issues :-(
>
> What did you say for "--with-o2ib=???????" when you configured lustre?

$ ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1

> Can you mail me <lustre>/config.log?

It is attached (compressed).

Cheers,
Thierry.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log.gz
Type: application/octet-stream
Size: 13548 bytes
Desc:
Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20060923/10b1a6fe/config.log-0001.obj
On Sat, 23 Sep 2006, Eric Barton wrote:
> What did you say for "--with-o2ib=???????" when you configured lustre?
> Can you mail me <lustre>/config.log?

I've set modversion to 'n' and recompiled the kernel. I'm getting slightly more meaningful messages.

Sep 23 11:17:22 n32 kernel: Lustre: Added LNI 192.168.1.98@tcp [8/256]
Sep 23 11:17:22 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_wc
Sep 23 11:17:22 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_path
Sep 23 11:17:22 n32 modprobe: WARNING: Error inserting ib_cm (/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_listen
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_destroy_cm_id
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_create_cm_id
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rep
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_init_qp_attr
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_drep
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rtu
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_dreq
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_req
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_establish
Sep 23 11:17:22 n32 modprobe: WARNING: Error inserting rdma_cm (/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rej
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
Sep 23 11:17:22 n32 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_reject
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_disconnect
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_create_qp
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_create_id
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_listen
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_connect
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_accept
Sep 23 11:17:22 n32 kernel: LustreError: 20844:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
Sep 23 11:17:23 n32 kernel: Lustre: Removed LNI 192.168.1.98@tcp

cheers,
Thierry.
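A log like the one above is easier to read once the "Unknown symbol" lines are grouped by the module that reported them, since the first module listed is the likeliest starting point of the failure chain. The following is a minimal sketch; the function name unknown_symbol_summary is invented for this example, and it assumes log lines shaped like the ones in this message (module name followed by a colon, immediately before "Unknown symbol").

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<module> <count>" for every
# module that reported "Unknown symbol", in first-seen order.
unknown_symbol_summary() {
    awk '
        {
            for (i = 2; i < NF; i++) {
                # Only accept "<module>: Unknown symbol <name>"; this skips
                # the modprobe summary lines, whose preceding field is a path.
                if ($i == "Unknown" && $(i + 1) == "symbol" &&
                    $(i - 1) ~ /^[A-Za-z0-9_]+:$/) {
                    mod = $(i - 1)
                    sub(/:$/, "", mod)
                    if (!(mod in count)) order[++n] = mod
                    count[mod]++
                }
            }
        }
        END {
            for (j = 1; j <= n; j++) print order[j], count[order[j]]
        }
    '
}

# Example: dmesg | unknown_symbol_summary
```

Applied to the messages above, this would list ib_cm first, then rdma_cm, then ko2iblnd, which fits the reading that ko2iblnd fails only because the modules it depends on failed to load before it.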
It seems the problem is with OFED (1.1-rc6). The two IB modules below (ib_cm and rdma_cm) cannot load, which in turn prevents ko2iblnd from loading.

Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_wc
Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_path
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_listen
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_destroy_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_create_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_init_qp_attr
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_drep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rtu
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_dreq
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_req
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_establish
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rej

Thierry.

On Sat, 23 Sep 2006, Thierry Delaitre wrote:
>
> On Sat, 23 Sep 2006, Eric Barton wrote:
>
> > > I've double checked that i'm using ib modules recompiled for the kernel i
> > > have. I deleted the ofed rpms, ib modules in the kernel, recompiled ofed
> > > which installs the ib modules, then recompiled lustre but still
> > > experiences the same issues :-(
> >
> > What did you say for "--with-o2ib=???????" when you configured lustre?
> > Can you mail me <lustre>/config.log?
>
> i've set modversion to 'n' and recompiled the kernel. i'm getting slightly
> more meaningful messages.
>
> Sep 23 11:17:22 n32 kernel: Lustre: Added LNI 192.168.1.98@tcp [8/256]
> Sep 23 11:17:22 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_wc
> Sep 23 11:17:22 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_path
> Sep 23 11:17:22 n32 modprobe: WARNING: Error inserting ib_cm (/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_listen
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_destroy_cm_id
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_create_cm_id
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rep
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_init_qp_attr
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_drep
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rtu
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_dreq
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_req
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_cm_establish
> Sep 23 11:17:22 n32 modprobe: WARNING: Error inserting rdma_cm (/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/rdma_cm.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> Sep 23 11:17:22 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rej
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
> Sep 23 11:17:22 n32 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_reject
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_disconnect
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_create_qp
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_create_id
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_listen
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_connect
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
> Sep 23 11:17:22 n32 kernel: ko2iblnd: Unknown symbol rdma_accept
> Sep 23 11:17:22 n32 kernel: LustreError: 20844:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
> Sep 23 11:17:23 n32 kernel: Lustre: Removed LNI 192.168.1.98@tcp
>
> cheers,
>
> Thierry.

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
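The "Unknown symbol" messages in this message are easier to act on when grouped by the module that failed to resolve them. The following is only a minimal sketch, assuming syslog or dmesg text on standard input; the function name unknown_symbols_by_module is hypothetical and is not part of Lustre or OFED.

```shell
#!/bin/sh
# Hypothetical helper: group "<module>: Unknown symbol <symbol>" kernel
# messages by module. Reads log text on stdin and prints one line per
# module, "<module> <number of distinct unresolved symbols>", sorted by name.
unknown_symbols_by_module() {
    awk '
        {
            for (i = 2; i + 2 <= NF; i++) {
                # Match only "<module>: Unknown symbol <identifier>"; this
                # skips the modprobe "Unknown symbol in module, ..." summary
                # lines, whose preceding field is a file path.
                if ($i == "Unknown" && $(i + 1) == "symbol" &&
                    $(i - 1) ~ /^[A-Za-z0-9_]+:$/ &&
                    $(i + 2) ~ /^[A-Za-z_][A-Za-z0-9_]*$/) {
                    mod = $(i - 1)
                    sub(/:$/, "", mod)
                    key = mod " " $(i + 2)
                    if (!(key in seen)) {
                        seen[key] = 1
                        count[mod]++
                    }
                }
            }
        }
        END { for (m in count) print m, count[m] }
    ' | sort
}
```

A possible use is `dmesg | unknown_symbols_by_module`. A module that appears here with symbols normally exported by a lower-level IB module suggests that the lower-level module is either not loaded or was built from a different source tree.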
Thierry,

Can you confirm that your installed kernel is consistent with /usr/src/linux?

Where do /lib/modules/`uname -r`/{build,source} point to?

Can you verify that the infiniband modules under /lib/modules/`uname -r` are actually the ones you built when you installed OFED?

Cheers,
Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: eeb@bartonsoftware.com|
---------------------------------------------------
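The first two checks Eric asks for can be scripted. This is a rough sketch only; kernel_tree_links is a hypothetical helper name, and the default directory is assumed to be /lib/modules/`uname -r` as in the question above.

```shell
#!/bin/sh
# Hypothetical helper: show where the build and source symlinks of a
# module directory point. Defaults to the running kernel's directory.
kernel_tree_links() {
    moddir=${1:-/lib/modules/$(uname -r)}
    for link in build source; do
        if [ -L "$moddir/$link" ]; then
            # readlink prints the symlink target without resolving it further
            printf '%s -> %s\n' "$link" "$(readlink "$moddir/$link")"
        else
            printf '%s -> (missing)\n' "$link"
        fi
    done
}
```

Running `kernel_tree_links` with no argument reports on the running kernel. For the third check, `modinfo -F vermagic` on one of the installed .ko files prints the kernel version string the module was built against, which can be compared with `uname -r`.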
Thierry,

> it seems the problem is with ofed (1.1-rc6)....

Hmmm. Have you tried just using the infiniband support that came in your kernel sources?

Cheers,
Eric
Eric,

On Sat, 23 Sep 2006, Eric Barton wrote:

> Thierry,
>
> Can you confirm that your installed kernel is consistent with
> /usr/src/linux?

Yes, it is.

> Where do /lib/modules/`uname -r`/{build,source} point to?

n32:~ # ls -l /lib/modules/2.6.16.21-0.8-smp/
lrwxrwxrwx 1 root root 28 Sep 23 11:01 build -> /usr/src/linux-2.6.16.21-0.8
lrwxrwxrwx 1 root root 28 Sep 23 11:01 source -> /usr/src/linux-2.6.16.21-0.8

> Can you verify that the infiniband modules under /lib/modules/`uname -r` are
> actually the ones you built when you installed OFED?

I checked, and they match the installation time of /usr/local/ofed.

>> it seems the problem is with ofed (1.1-rc6)....
>
> Hmmm. Have you tried just using the infiniband support that came in your
> kernel sources?

No, because Lustre's ./configure script does not seem to detect the default IB gen2 stack that comes natively with the kernel.

I've now progressed a little further. I set UCM_LOAD to yes in /etc/infiniband/openib.conf, and rdma_cm now loads in the kernel. However, when I run 'lctl network up' I get the following kernel oops:

Thierry.
Lustre: Removed LNI 192.168.1.98@tcp
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:01:0c.0/resource
Modules linked in: ko2iblnd lnet libcfs nfs lockd nfs_acl sunrpc rdma_ucm ib_ucm rdma_cm ib_addr ib_cm af_packet ipv6 ib_uverbs ib_umad ib_mthca ib_ipoib ib_sa ib_mad ib_core button battery ac apparmor aamatch_pcre loop dm_mod e1000 reiserfs edd fan thermal processor sg aic79xx scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<00000000>] Tainted: G U VLI
EFLAGS: 00010293 (2.6.16.21-0.8-smp #3)
EIP is at _stext+0x3feffd40/0x29
eax: db5d4080 ebx: cfd41e40 ecx: ddf2e000 edx: 00000001
esi: de8d8600 edi: db5d4080 ebp: ce275a90 esp: ce275a7c
ds: 007b es: 007b ss: 0068
Process lctl (pid: 3994, threadinfo=ce274000 task=deaceb70)
Stack: <0>e1be7e7a ce275ab0 cef541c0 cef541c0 00000000 00306269 00000000 00000001
       69326f6b 646e6c62 d1b1b800 ce275b08 00000000 db030002 30534aa1 00000000
       00000000 451521ff 000e0071 00000001 ffffff00 a14a5330 01000000 00000001
Call Trace:
 [<e1be7e7a>] kiblnd_startup+0xc4d/0xe93 [ko2iblnd]
 [<e2c04af9>] lnet_startup_lndnis+0x187/0x6cf [lnet]
 [<e2c05759>] LNetNIInit+0x108/0x1bf [lnet]
 [<e1b97376>] libcfs_ioctl+0x0/0x6da [libcfs]
 [<e2c1316a>] lnet_configure+0x22/0x44 [lnet]
 [<e1b97963>] libcfs_ioctl+0x5ed/0x6da [libcfs]
 [<e093027b>] reiserfs_async_progress_wait+0x21/0x6c [reiserfs]
 [<e092a0ec>] pathrelse+0x18/0x2b [reiserfs]
 [<c01308d7>] autoremove_wake_function+0x0/0x2d
 [<e0930dfc>] do_journal_end+0xb36/0xb5f [reiserfs]
 [<c015d3bd>] __find_get_block+0x17b/0x185
 [<c01733f9>] mntput_no_expire+0x12/0xaf
 [<c0168f4b>] link_path_walk+0xb3/0xbd
 [<c015d3ee>] __getblk+0x27/0x229
 [<c015dc34>] ll_rw_block+0x7f/0x8e
 [<e0934161>] xattr_lookup_poison+0x52/0x5f [reiserfs]
 [<c016f6be>] __d_lookup+0x96/0xd9
 [<c0166c3a>] do_lookup+0x3c/0x7a
 [<c016f993>] dput+0x1a/0x118
 [<c0168d3a>] __link_path_walk+0xd98/0xef6
 [<c0141616>] find_get_page+0x18/0x38
 [<c015d078>] __find_get_block_slow+0xfe/0x107
 [<e093027b>] reiserfs_async_progress_wait+0x21/0x6c [reiserfs]
 [<e092a0ec>] pathrelse+0x18/0x2b [reiserfs]
 [<c01308d7>] autoremove_wake_function+0x0/0x2d
 [<e0930dfc>] do_journal_end+0xb36/0xb5f [reiserfs]
 [<c01733f9>] mntput_no_expire+0x12/0xaf
 [<c0168f4b>] link_path_walk+0xb3/0xbd
 [<c0169276>] do_path_lookup+0x1df/0x242
 [<c015794e>] shmem_permission+0x0/0xa
 [<c0166dd3>] permission+0x97/0xa3
 [<c0167c20>] may_open+0x53/0x200
 [<e1b92ee1>] cfs_alloc+0x31/0x60 [libcfs]
 [<e1b96cb3>] libcfs_psdev_open+0x0/0x32f [libcfs]
 [<e1b96d8f>] libcfs_psdev_open+0xdc/0x32f [libcfs]
 [<e1b96cb3>] libcfs_psdev_open+0x0/0x32f [libcfs]
 [<e1b953a9>] libcfs_psdev_open+0x1d/0x23 [libcfs]
 [<c02039d7>] misc_open+0x119/0x1c9
 [<c0162f0e>] chrdev_open+0x12b/0x161
 [<c0162de3>] chrdev_open+0x0/0x161
 [<c0159f72>] __dentry_open+0xf5/0x1c4
 [<c015a0b1>] nameidata_to_filp+0x25/0x37
 [<c015a115>] do_filp_open+0x52/0x5a
 [<e1b97376>] libcfs_ioctl+0x0/0x6da [libcfs]
 [<e1b9550c>] libcfs_ioctl+0x13a/0x151 [libcfs]
 [<c016b0c4>] do_ioctl+0x48/0x5e
 [<c016b326>] vfs_ioctl+0x24c/0x25e
 [<c016b389>] sys_ioctl+0x51/0x68
 [<c0103bcb>] sysenter_past_esp+0x54/0x79
Code: Bad EIP value.
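Since the o2ib LND only got as far as starting up once rdma_cm was loaded, it may help to confirm that the IB modules are present before running 'lctl network up'. The sketch below is an assumption-laden illustration: missing_modules is a hypothetical helper, and the module names passed to it are simply those that appear in the messages in this thread, not an authoritative dependency list.

```shell
#!/bin/sh
# Hypothetical helper: print every named module that is NOT listed in a
# /proc/modules-style file (module name in the first column).
# Usage: missing_modules <modules-file> <module>...
missing_modules() {
    modlist=$1
    shift
    for m in "$@"; do
        # awk exits 0 only if some line has the module name as its first field
        awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$modlist" ||
            printf '%s\n' "$m"
    done
}
```

For example, `missing_modules /proc/modules ib_core ib_cm rdma_cm` prints nothing when all three are loaded. An empty result does not rule out the oops above; it only excludes the earlier "Unknown symbol" failure mode.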