Hi List,

We're getting ready to upgrade the OS/software stack on one of our clusters and I'm looking at which Lustre and OFED versions will work best.

It looks like the changelog for 1.8.4 and the compatibility matrix have conflicting information.

The Lustre compatibility matrix indicates that on Lustre 1.8.4, the highest OFED revision with o2iblnd support is 1.4.2:
http://wiki.lustre.org/index.php/Lustre_Release_Information

The changelog for 1.8.4 indicates that o2iblnd is supported with OFED 1.5.1:
http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4

Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1? Are there any pitfalls to this configuration? Has anyone found any instabilities with this configuration?

Thanks much.

-Ed Walter
Carnegie Mellon University
OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more people are using the in-kernel OFED now: Lustre (finally) defaulted to the in-kernel OFED for RedHat, so it is no longer _necessary_ to build either OFED or Lustre.

Kevin

Edward Walter wrote:
> Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1? Are
> there any pitfalls to this configuration? Has anyone found any
> instabilities with this configuration?
We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node cluster running RHEL 5.4, with no problems at all.

One thing needs attention:

If you use the default OFED 1.5.1, just install the RPM packages; there is no need to build either Lustre or OFED.

If you use a revised driver, such as BX-OFED 1.5.1, in some cases you need to recompile the Linux kernel with an increased stack size, because Lustre and OFED may use up the stack (both are stack greedy), which can lead to system hangs.

YiLei

On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren <kevin.van.maren at oracle.com> wrote:
> OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more
> people are using the in-kernel OFED now: Lustre (finally) defaulted to
> the in-kernel OFED for RedHat, so it is no longer _necessary_ to build
> either OFED or Lustre.
Thanks for all of the advice here. We seem to be running into a hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1.

First of all, our Lustre servers are all up and running fine (using the vendor OFED - 1.4.1). Our trouble is all client side.

We want to use a newer OFED (1.5.1) to potentially enable NFS over RDMA (we have NFS servers in addition to Lustre).

We installed the current Lustre 1.8.4 RPMs from Sun/Oracle:
> kernel-2.6.18-194.3.1.el5_lustre.1.8.4
> lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
> lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>
> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4

We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.

Next we downloaded the OFED 1.5.1 sources and built the basic and hpc packages. These built and installed without incident. I don't believe the Open Fabrics group provides binary RPMs; otherwise, we would have used them.

Here are the Lustre/IB lines from our modprobe.conf:
> alias ib0 ib_ipoib
> alias net-pf-27 ib_sdp
> options lnet networks=o2ib

And our fstab:
> 172.16.1.3@o2ib:172.16.1.4@o2ib:/data /lustre lustre defaults,_netdev,localflock 0 0

OpenIB is working properly: we have a subnet manager running and can ping our Lustre OSS and MDS servers over IB.

Trying to mount /lustre generates the following error:
> mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre failed: No such device
> Are the lustre modules loaded?
> Check /etc/modprobe.conf and /proc/filesystems
> Note 'alias lustre llite' should be removed from modprobe.conf

dmesg shows that the ko2iblnd module cannot be loaded:
> Lustre: OBD class driver, http://www.lustre.org/
> Lustre: Lustre Version: 1.8.4
> Lustre: Build Version: 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
> ko2iblnd: Unknown symbol ib_fmr_pool_unmap
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> ko2iblnd: Unknown symbol rdma_resolve_addr
> ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
> ko2iblnd: Unknown symbol ib_reg_phys_mr
> ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
> ko2iblnd: Unknown symbol ib_create_fmr_pool
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol rdma_reject
> ko2iblnd: Unknown symbol rdma_reject
> ko2iblnd: disagrees about version of symbol rdma_disconnect
> ko2iblnd: Unknown symbol rdma_disconnect
> ko2iblnd: disagrees about version of symbol rdma_resolve_route
> ko2iblnd: Unknown symbol rdma_resolve_route
> ko2iblnd: disagrees about version of symbol rdma_bind_addr
> ko2iblnd: Unknown symbol rdma_bind_addr
> ko2iblnd: disagrees about version of symbol rdma_create_qp
> ko2iblnd: Unknown symbol rdma_create_qp
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol rdma_create_id
> ko2iblnd: Unknown symbol rdma_create_id
> ko2iblnd: disagrees about version of symbol rdma_listen
> ko2iblnd: Unknown symbol rdma_listen
> ko2iblnd: disagrees about version of symbol rdma_destroy_qp
> ko2iblnd: Unknown symbol rdma_destroy_qp
> ko2iblnd: disagrees about version of symbol ib_query_device
> ko2iblnd: Unknown symbol ib_query_device
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol rdma_connect
> ko2iblnd: Unknown symbol rdma_connect
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol rdma_destroy_id
> ko2iblnd: Unknown symbol rdma_destroy_id
> ko2iblnd: disagrees about version of symbol rdma_accept
> ko2iblnd: Unknown symbol rdma_accept
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
> ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
> LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
> LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network initialisation failed

Am I missing something obvious here?

Thanks much.

-Ed

On 06/05/2011 05:48 AM, Wu, Yilei wrote:
> We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node cluster
> running RHEL 5.4, with no problems at all.
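The dmesg output in the message above prints two lines per symbol, which makes it hard to see at a glance which exports are affected. A minimal sketch for condensing it, assuming the kernel log text arrives on standard input; the function name `mismatched_symbols` is made up for this illustration and is not part of Lustre or OFED:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or OFED): read kernel log text on
# stdin and print each symbol reported as a version mismatch, once, sorted.
mismatched_symbols() {
    sed -n 's/^.*disagrees about version of symbol \([A-Za-z0-9_]*\).*$/\1/p' | sort -u
}

# Typical use on the affected client:
#   dmesg | mismatched_symbols
```

Every symbol in that log begins with ib_ or rdma_, so the resulting list points at the InfiniBand/RDMA modules rather than at the base kernel.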
We rebooted after installing the Lustre RPMs so that we would be sure that OFED built against the running kernel. We've also rebooted a couple of times since then and tried to manually load modules.

Here's the output from /etc/infiniband/info:
> prefix=/opt/ofed
> Kernel=2.6.18-194.3.1.el5_lustre.1.8.4
>
> Configure options: --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-cxgb3-mod --with-nes-mod --with-ipoib-mod

and 'uname -r':
> 2.6.18-194.3.1.el5_lustre.1.8.4

The kernel-ib version looks correct too:
> # rpm -qa |grep kernel-ib
> kernel-ib-1.5.1-2.6.18_194.3.1.el5_lustre.1.8.4

Doing a manual modprobe on lustre also fails:
> # modprobe lustre
> WARNING: Error inserting osc (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/osc.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> WARNING: Error inserting mdc (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/mdc.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> WARNING: Error inserting lov (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/lov.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> FATAL: Error inserting lustre (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/lustre.ko): Unknown symbol in module, or unknown parameter (see dmesg)

As far as symbol versions are concerned, aren't these all defined in the kernel-headers and kernel-devel packages? The versions we're using match our Lustre kernel version:
> # rpm -qa |grep kernel-devel
> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
> # rpm -qa |grep kernel-headers
> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4

Thanks again.

-Ed

On 06/09/2011 05:53 PM, Hebenstreit, Michael wrote:
> Are you sure you did a reboot after installing the modules? Otherwise
> this looks like a build error where outdated symbols were used.
>
> Michael
>
> From: lustre-discuss-bounces at lists.lustre.org On Behalf Of Edward Walter
> Sent: Thursday, June 09, 2011 1:56 PM
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] lustre ofed compatibility
>
> Thanks for all of the advice here. We seem to be running into a
> hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1
You must rebuild Lustre if you replace OFED.

Kevin

On Jun 9, 2011, at 4:55 PM, Edward Walter <ewalter at cs.cmu.edu> wrote:
> Thanks for all of the advice here. We seem to be running into a
> hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1
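Rebuilding here means compiling the Lustre modules against the external OFED kernel tree instead of the InfiniBand stack that shipped with the kernel. A rough sketch of what a source build can look like: `--with-linux` and `--with-o2ib` are Lustre configure options and `make rpms` is the usual packaging target, but the function name is hypothetical and both paths are assumptions about where the kernel build tree and the OFED kernel sources ended up on a given client. The helper only prints the commands so they can be reviewed before anything is run:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for rebuilding Lustre
# against an external OFED tree.
#   $1 = kernel build tree   (assumed location; check your kernel-devel install)
#   $2 = OFED kernel sources (assumed location; often /usr/src/ofa_kernel)
lustre_rebuild_commands() {
    linux_tree=$1
    ofed_tree=$2
    printf './configure --with-linux=%s --with-o2ib=%s\n' "$linux_tree" "$ofed_tree"
    printf 'make rpms\n'
}

# Example with assumed paths, run from the top of the Lustre source tree:
#   lustre_rebuild_commands "/usr/src/kernels/$(uname -r)-$(uname -m)" /usr/src/ofa_kernel
```

The resulting lustre-modules package would then replace the prebuilt one, since the prebuilt modules carry the symbol versions of the in-kernel InfiniBand stack.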
> Doing a manual modprobe on lustre also fails:
>> # modprobe lustre
>> WARNING: Error inserting osc
>> (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/osc.ko):

The core problem is that Lustre is compiled using the wrong symbol versions for the OFED you've actually got in the kernel. This might help you figure out what is wrong:
http://lists.lustre.org/pipermail/lustre-discuss/2010-March/012853.html

My suggestion: read that message, poke around a bit (that message explains how), figure out how the symbol versions differ, and fix it. If that message doesn't help you, let us know and we can go from there.

--Ken
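One way to see how the symbol versions differ is to compare the CRCs a module was built against with the CRCs the installed OFED exports. A sketch, assuming two text files with the CRC in the first column and the symbol name in the second, which is the layout printed by `modprobe --dump-modversions <module.ko>` and used in a kernel build's Module.symvers. The function name and the paths in the usage comment are hypothetical and will differ per system:

```shell
#!/bin/sh
# Hypothetical helper: list symbols whose CRC differs between what a module
# expects ($1) and what a provider exports ($2).
# Both files: CRC in column 1, symbol name in column 2; extra columns ignored.
crc_mismatches() {
    awk '
        # First file: remember the CRC the module expects for each symbol.
        NR == FNR { want[$2] = $1; next }
        # Second file: report symbols exported with a different CRC.
        ($2 in want) && want[$2] != $1 {
            printf "%s expected=%s exported=%s\n", $2, want[$2], $1
        }
    ' "$1" "$2"
}

# Typical use (the Module.symvers location is an assumption):
#   modprobe --dump-modversions "$(modinfo -n ko2iblnd)" > expected.txt
#   crc_mismatches expected.txt /usr/src/ofa_kernel/Module.symvers
```

If every ib_ and rdma_ symbol shows up in the output, the module was most likely built against a different InfiniBand stack than the one installed, which matches Ken's diagnosis.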