Snider, Tim
2007-Feb-05 07:40 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
We're trying to set up a Lustre configuration using InfiniBand IPoIB with 1.5.95. OpenIB 1.1 (formerly OpenIB Gen2) is installed. We can successfully ping between the MDT/MGS and OST servers using the IPoIB address. Lustre filesystem creation is "apparently" successful, but mounting the Lustre device fails.

1. Does 1.5.95 work properly with IPoIB?
2. What is the proper form of the mgsnode specification: should o2ib or openib be used?
2.a Should we specify the IPoIB address or the adapter/port #?

The OST command line we're trying is:

mkfs.lustre --fsname=testfs --mgsnode=192.168.3.3@o2ib /dev/sdb1

Thanks,
Timothy Snider
Storage Architect
Strategic Planning, Technology and Architecture
LSI Logic Corporation
3718 North Rock Road
Wichita, KS 67226
(316) 636-8736
tim.snider@lsi.com
Eric Barton
2007-Feb-05 09:39 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
Is that OFED 1.1? Does /etc/modprobe.conf contain...

options lnet networks=o2ib

...or the equivalent using ip2nets? If this isn't clear, please see the Lustre manual for an explanation of network setup.

Can you bring up Lustre networking on the MGS and a client node...

modprobe lnet; lctl net up

...and then check /proc/sys/lnet/nis? It should list the local NIDs, e.g.

<ipoib IP address>@o2ib
0@lo

If that looks OK, run an LNET ping from the client to the MGS...

lctl ping 192.168.3.3@o2ib

Please note that by default, network error messages are logged internally but are not printed to the console or /var/log/messages, so it may help to run "echo +neterror > /proc/sys/lnet/printk" to enable verbose network messages while you are debugging connectivity.

Cheers,
Eric
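Eric's checks can be scripted. As a minimal sketch, the hypothetical shell function `has_nid_on_net` below (not part of Lustre) reads the contents of /proc/sys/lnet/nis on stdin and succeeds if a local NID on the named network, such as `<ipoib IP address>@o2ib`, is present. It assumes each NID is the first whitespace-separated field of a line and that the listing may start with a header line whose first field is `nid`; compare that against the actual output on your nodes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Lustre): succeed if a NID on the
# given LNET network appears in a /proc/sys/lnet/nis listing read from stdin.
# Assumption: the NID is the first whitespace-separated field of each line.
has_nid_on_net() {
    net="$1"
    awk -v net="$net" '
        $1 == "nid" { next }                 # skip an optional header line
        {
            n = split($1, part, "@")         # "192.168.3.3@o2ib" -> addr, net
            if (n == 2 && part[2] == net) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On a node where LNET loads, this could be run as root after `modprobe lnet; lctl net up`, for example `has_nid_on_net o2ib < /proc/sys/lnet/nis && lctl ping 192.168.3.3@o2ib`, so that the ping is only attempted once an o2ib NID exists locally.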
Snider, Tim
2007-Feb-06 09:20 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
I can successfully ping other servers through IB using IPoIB IP addresses. However, with OFED 1.1.1, loading lnet or trying to mount a Lustre device over o2ib fails: "modprobe lnet" generates complaints about the symbol versions of IB-related routines. Which versions of the OFED driver (1.0, 1.1, or 1.1.1) are compatible with Lustre 1.5.95?

Thanks for the advice.
Tim

/etc/modprobe.conf:

alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
alias usb-controller ohci-hcd
options lnet networks=tcp,o2ib # specify both ethernet and ib networks for Lustre.
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp

Sample of messages:

Feb 6 14:34:21 FedoraCore121 root: =========start lnet and debug
Feb 6 14:34:27 FedoraCore121 kernel: Lustre: 2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Feb 6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI 172.22.14.121@tcp [8/256]
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
Feb 6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI 172.22.14.121@tcp
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option missing
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no mechanism available
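With module versioning (CONFIG_MODVERSIONS) enabled, each "disagrees about version of symbol X" line means the CRC that ko2iblnd.ko recorded for X at build time differs from the CRC exported by the IB module that is currently loaded; the paired "Unknown symbol X" line is the resulting link failure. The hypothetical helper below is a sketch, not a Lustre tool: it condenses such a log into the distinct, sorted list of mismatched symbols, which is easier to compare between servers than the raw log. The message format it matches is taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: list, once each and sorted, the symbols a module
# complained about with "disagrees about version of symbol <name>".
# Reads syslog or dmesg text on stdin; the optional argument selects the
# module whose messages are searched (default: ko2iblnd).
mismatched_symbols() {
    mod="${1:-ko2iblnd}"
    grep "$mod: disagrees about version of symbol " |
        sed 's/.*disagrees about version of symbol //' |
        sort -u
}
```

For example, `mismatched_symbols < /var/log/messages` or `dmesg | mismatched_symbols ko2iblnd`.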
Snider, Tim
2007-Feb-06 13:46 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
OK, more details. IPoIB itself is working on all servers: IPoIB ping utilities run successfully between all the servers in the fabric. I was able to mount successfully on the MDT/MGS after installing the Lustre modules by hand using the force option. Mounting the OST device still fails, and ptlrpc refuses to load manually even with the force option. All kernel/Lustre versions are identical between the servers.

What am I missing?

uname -a
Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux

[root@FedoraCore120 mnt]# modprobe -vf ptlrpc
insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
FATAL: Error inserting ptlrpc (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko): Input/output error

/var/log/messages:

Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, tainting kernel.
Feb 6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI 172.22.14.120@tcp [8/256]
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
<<<similar messages are displayed for a while, same as before>>>
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
Feb 6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI 172.22.14.120@tcp
Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation failed
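In the logs above, the tcp LNI is added and then removed once the o2ib LND fails to load (and on FedoraCore120, ptlrpc then reports "network initialisation failed"), so a configured network that never produces an "Added LNI" message is a useful first thing to look for. The function below is a hypothetical sketch of that check, matching the message format seen in these logs. It assumes a single `networks=` value, strips interface lists such as `tcp0(eth0)`, compares network names literally (so `tcp` and `tcp0` count as different), and does not handle ip2nets.

```shell
#!/bin/sh
# Hypothetical helper: compare the networks named in an
# "options lnet networks=..." line with the "Added LNI <nid>" messages in a
# log file, and print every configured network that never came up.
# Assumptions: interface lists like "tcp0(eth0)" are stripped, names are
# compared literally, and ip2nets-style configuration is not handled.
missing_networks() {
    conf_line="$1"   # e.g. 'options lnet networks=tcp,o2ib'
    log_file="$2"    # syslog file to search for "Added LNI" messages
    nets=$(printf '%s\n' "$conf_line" |
        sed -n 's/.*networks=\([^ #]*\).*/\1/p' |
        tr -d '"' |
        sed 's/([^)]*)//g' |
        tr ',' ' ')
    for net in $nets; do
        # An LNI line looks like: "Lustre: Added LNI 172.22.14.120@tcp [8/256]"
        if ! grep -Eq "Added LNI [^ ]+@${net}( |\$)" "$log_file"; then
            echo "$net"
        fi
    done
}
```

For example, `missing_networks "$(grep '^options lnet' /etc/modprobe.conf)" /var/log/messages` would be expected to print `o2ib` on the servers shown here.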
Nathaniel Rutman
2007-Feb-06 15:25 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
This is strictly a compile issue: Lustre won't work over o2ib until the ko2iblnd module can load successfully. The default header path o2iblnd uses is $LINUX/drivers/infiniband, so you need to make sure Lustre is compiled against the o2ib/OFED headers that your kernel modules actually use. The Lustre ./configure flag is:

--with-o2ib=path        build o2iblnd against path

HTH
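To confirm this diagnosis on a given server, you can compare the symbol CRCs recorded in ko2iblnd.ko with those exported by the OFED build that is actually loaded. This is a sketch under assumptions: it relies on `modprobe --dump-modversions` (module-init-tools) to list a module's recorded CRCs as `CRC symbol` lines, and on a Module.symvers file from the OFED kernel build whose lines begin `CRC symbol module`. Where that file lives depends on how OFED was installed, so the OFED path in the usage lines is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: report symbols whose CRC recorded in a module differs
# from the CRC exported by the IB stack it will be linked against.
#   $1: output of "modprobe --dump-modversions <module.ko>" ("CRC symbol")
#   $2: a Module.symvers file ("CRC symbol module ...")
# Prints "symbol recorded_crc exported_crc" for each mismatch.
crc_mismatches() {
    awk 'FILENAME == ARGV[1] { want[$2] = $1; next }
         ($2 in want) && tolower(want[$2]) != tolower($1) {
             print $2, want[$2], $1
         }' "$1" "$2"
}
```

Usage might look like:

modprobe --dump-modversions /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko > /tmp/ko2iblnd.crc
crc_mismatches /tmp/ko2iblnd.crc /path/to/ofed/Module.symvers

If the output reproduces the symbols named in the log, that would indicate ko2iblnd needs rebuilding with --with-o2ib pointing at that OFED tree.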
Snider, Tim
2007-Feb-07 06:12 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
Ok - Can you provide more insight? I''m using the same disto, kernel, and Lustre RPMs on all the servers. Why would modules load on one server but not the others? And a more practical point what target do I build? make make install make rpms? Thx -----Original Message----- From: Nathaniel Rutman [mailto:nathan@clusterfs.com] Sent: Tuesday, February 06, 2007 4:25 PM To: Snider, Tim Cc: Eric Barton; lustre-discuss@clusterfs.com Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95 This is strictly a compile issue -- Lustre won''t work over o2ib until the ko2iblnd module can load successfully. The default header path the o2iblnd uses is $LINUX/drivers/infiniband - you need to make sure Lustre is compiled against the o2ib/OFED headers that your kernel modules actually use. The ./configure flag for Lustre is: --with-o2ib=path build o2iblnd against path HTH Snider, Tim wrote:> Ok - more details. ipoib itself is working on all servers. there are > ipoib ping utilities that run successfully between all the servers in > the fabric. > I was able to successfully mount on the mdt/mgs after installing > Lustre modules by hand using the force option. > Mounting the OST device still fails. ptlrpc refuses to load manually > with the force option. All kernel / lustre versions are identical > between the servers. > > What am I missing? > > uname -a > Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu > Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux [root@FedoraCore120 > mnt]# modprobe -vf ptlrpc > insmod > /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko > FATAL: Error inserting ptlrpc >(/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko):> Input/output error > /var/log/messages > Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, > tainting kernel. 
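Every ko2iblnd failure in the logs Tim posted follows the same pattern. First comes a "disagrees about version of symbol X" message, then "Unknown symbol X" for the same IB core or RDMA CM export. That fits Nathaniel's diagnosis that ko2iblnd was compiled against different IB headers from the OFED modules that are actually loaded.

The following is a short Python sketch for listing the affected symbols from a kernel log. It assumes each kernel message sits on a single syslog line; the line wrapping in these emails comes from the mail clients. mismatched_symbols is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumes one kernel message per syslog line, e.g.
# "Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq"
_MISMATCH_RE = re.compile(
    r"kernel: (?P<module>[\w-]+): disagrees about version of symbol (?P<symbol>\w+)"
)


def mismatched_symbols(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {module: sorted symbol names} for each 'disagrees about version' message."""
    found: Dict[str, Set[str]] = {}
    for line in lines:
        match = _MISMATCH_RE.search(line)
        if match:
            found.setdefault(match.group("module"), set()).add(match.group("symbol"))
    return {module: sorted(symbols) for module, symbols in found.items()}
```

You can pass an open file to it, for example by opening /var/log/messages and calling mismatched_symbols on the file handle. On one of Tim's nodes, this would be expected to list ib_create_cq, rdma_resolve_addr and the other IB/RDMA symbols named in his logs under ko2iblnd.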
Nathaniel Rutman
2007-Feb-07 17:40 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
Robin Humble wrote:
> Hi Tim,
>
> On Wed, Feb 07, 2007 at 06:11:39AM -0700, Snider, Tim wrote:
>
>> Ok - Can you provide more insight? I'm using the same distro, kernel,
>>
>
> I'm using 1.5.97 (beta7) which, unlike beta5, has (possibly useless) IB
> modules in the RHEL kernel rpms from clusterfs.
>
> can you please try 1.5.97 and let me know how you go?
>
>
Ah yes, I think the latest RHEL kernels now include IB, in which case we should be compiling and distributing the matching IB LND -- eeb / scjody, do you know more about this?

>> I'm using the same distro, kernel, and Lustre RPMs on all the servers. Why would modules load on one
>> server but not the others?
>>
They wouldn't. If the kernels are identical, the Lustre modules will either load everywhere or get symbol conflicts everywhere. You could probably "make rpm" in the kernel source directory of a working kernel and install it on your non-working nodes.

>> And on a more practical point, what target do I build?
>> make
>> make install
>> make rpms?
>>
./configure --with-o2ib=/path/to/ib/headers
make install
should do it.

>
> I was hoping to avoid all that :-/ hence my previous email.
>
> it's not clear to me what's the best order in which to build/install
> new OFED and patch the RHEL kernel build tree with Lustre. something to
> keep me amused today.
>
>
I would apply the Lustre kernel patches first, then build/install OFED, build the kernel, and finally build Lustre. [Disclaimer: I've never actually done this myself. :( But maybe eeb or scjody can add something here.]

> cheers,
> robin
>
>
>> Thx
>>
>> -----Original Message-----
>> From: Nathaniel Rutman [mailto:nathan@clusterfs.com]
>> Sent: Tuesday, February 06, 2007 4:25 PM
>> To: Snider, Tim
>> Cc: Eric Barton; lustre-discuss@clusterfs.com
>> Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>>
>> This is strictly a compile issue -- Lustre won't work over o2ib until
>> the ko2iblnd module can load successfully.
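Nathaniel's point is that ko2iblnd must be built against the same IB headers as the IB modules that are actually loaded. One way to check this before rebuilding is to compare symbol CRCs. As we understand the kernel's module versioning (CONFIG_MODVERSIONS), "disagrees about version of symbol" means the CRC that ko2iblnd recorded for a symbol differs from the CRC exported by the loaded IB module.

The sketch below compares two Module.symvers-style tables. One comes from the tree Lustre was configured against (by default $LINUX/drivers/infiniband, or whatever --with-o2ib pointed at); the other comes from the OFED build in use. Several details are assumptions for illustration only: the file format (whitespace-separated "0xCRC symbol module ..."), where those files live on your systems, and the helper names parse_symvers and crc_mismatches.

```python
from typing import Dict, Iterable, List, Tuple


def parse_symvers(lines: Iterable[str]) -> Dict[str, int]:
    """Parse Module.symvers-style lines into {symbol: crc}.

    Assumed line shape: "<0xCRC> <symbol> <module> [export type]",
    whitespace-separated; blank or malformed lines are skipped.
    """
    crcs: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            crcs[fields[1]] = int(fields[0], 16)  # accepts a "0x" prefix
        except ValueError:
            continue
    return crcs


def crc_mismatches(
    built_against: Dict[str, int], loaded: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Symbols present in both tables whose CRCs differ, sorted by name."""
    return sorted(
        (sym, built_against[sym], loaded[sym])
        for sym in built_against.keys() & loaded.keys()
        if built_against[sym] != loaded[sym]
    )
```

Any symbol reported by crc_mismatches would be expected to trigger "disagrees about version of symbol" when ko2iblnd loads. An empty list is consistent with the headers matching. Symbols that appear in only one table are ignored here, since the loader reports those differently ("Unknown symbol").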
Robin Humble
2007-Feb-07 17:44 UTC
[Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
Hi Tim,

On Wed, Feb 07, 2007 at 06:11:39AM -0700, Snider, Tim wrote:
>Ok - Can you provide more insight? I'm using the same distro, kernel,

I'm using 1.5.97 (beta7) which, unlike beta5, has (possibly useless) IB
modules in the RHEL kernel rpms from clusterfs.

can you please try 1.5.97 and let me know how you go?

>and Lustre RPMs on all the servers. Why would modules load on one
>server but not the others?
>And on a more practical point, what target do I build?
>make
>make install
>make rpms?

I was hoping to avoid all that :-/ hence my previous email.

it's not clear to me what's the best order in which to build/install
new OFED and patch the RHEL kernel build tree with Lustre. something to
keep me amused today.

cheers,
robin

>Thx
>
>-----Original Message-----
>From: Nathaniel Rutman [mailto:nathan@clusterfs.com]
>Sent: Tuesday, February 06, 2007 4:25 PM
>To: Snider, Tim
>Cc: Eric Barton; lustre-discuss@clusterfs.com
>Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>
>This is strictly a compile issue -- Lustre won't work over o2ib until
>the ko2iblnd module can load successfully.
>The default header path the o2iblnd uses is $LINUX/drivers/infiniband -
>you need to make sure Lustre is compiled against the o2ib/OFED headers
>that your kernel modules actually use. The ./configure flag for Lustre
>is:
>  --with-o2ib=path        build o2iblnd against path
>HTH
>
>
>Snider, Tim wrote:
>> Ok - more details. ipoib itself is working on all servers. there are
>> ipoib ping utilities that run successfully between all the servers in
>> the fabric.
>> I was able to successfully mount on the mdt/mgs after installing
>> Lustre modules by hand using the force option.
>> Mounting the OST device still fails. ptlrpc refuses to load manually
>> with the force option. All kernel / lustre versions are identical
>> between the servers.
>>
>> What am I missing?
>>
>> uname -a
>> Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux
>> [root@FedoraCore120 mnt]# modprobe -vf ptlrpc
>> insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
>> FATAL: Error inserting ptlrpc (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko): Input/output error
>> /var/log/messages
>> Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, tainting kernel.
>> Feb 6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI 172.22.14.120@tcp [8/256]
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_reject
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_disconnect
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
>> Feb 6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
>> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
>> <<<similar messages are displayed for a while, same as before>>>
>> Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
>> Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
>> Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
>> Feb 6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI 172.22.14.120@tcp
>> Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation failed
>>
>>
>>
>> ------------------------------------------------------------------------
>> From: lustre-discuss-bounces@clusterfs.com
>> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Snider, Tim
>> Sent: Tuesday, February 06, 2007 10:19 AM
>> To: Eric Barton; lustre-discuss@clusterfs.com
>> Subject: RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>>
>> I can successfully ping other servers thru ib using ipoib ip addresses.
>> Loading lnet or trying to mount a lustre device using o2ib using OFED 1.1.1
>> modprobe lnet generates complaints about symbol versions of ib related routines.
>> What versions of the OFED driver (1.0, 1.1, or 1.1.1) are compatible
>> with Lustre 1.5.95?
>>
>> Thanks for the advice.
>> Tim
>>
>> /etc/modprobe.conf
>> alias eth0 tg3
>> alias eth1 tg3
>> alias scsi_hostadapter mptbase
>> alias scsi_hostadapter1 mptscsih
>> alias usb-controller ohci-hcd
>> options lnet networks=tcp,o2ib # specify both ethernet and ib networks for Lustre.
>> alias ib0 ib_ipoib
>> alias ib1 ib_ipoib
>> alias net-pf-27 ib_sdp
>>
>> Sample of messages:
>> Feb 6 14:34:21 FedoraCore121 root: =========start lnet and debug
>> Feb 6 14:34:27 FedoraCore121 kernel: Lustre: 2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
>> Feb 6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI 172.22.14.121@tcp [8/256]
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_create_cq
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_reject
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_disconnect
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
>> Feb 6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_id
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_id
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_listen
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_listen
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_connect
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_connect
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_modify_qp
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_accept
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_accept
>> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version
of symbol ib_dealloc_pd >> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol >> ib_dealloc_pd >> Feb 6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI >> 172.22.14.121@tcp <mailto:172.22.14.121@tcp> >> Feb 6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option >missing >> Feb 6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no >> mechanism available >> >> >> ---------------------------------------------------------------------- >> -- >> *From:* Eric Barton [mailto:eeb@bartonsoftware.com] >> *Sent:* Monday, February 05, 2007 10:42 AM >> *To:* Snider, Tim; lustre-discuss@clusterfs.com >> *Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with >> 1.5.95 >> >> Is that OFED 1.1? Does /etc/modprobe.conf contain... >> >> options lnet networks=o2ib >> >> ...or the equivalent using ip2nets? If this isn''t clear, please see >> the lustre manual for an explanation of network setup. >> >> Can you bring up lustre networking on the mgs and a client node... >> >> modprobe lnet; lctl net up >> >> ...and then check /proc/sys/lnet/nis? It should list the local NIDs >> (e.g.... >> >> <ipoib IP address>@o2ib >> 0@lo <mailto:0@lo> >> >> ...). If that looks OK, run an lnet ping from the client to the >MGS... >> >> lctl ping 182.168.3.3@o2ib <mailto:182.168.3.3@o2ib> >> >> Please note that by default, network error messages are logged >> internally, but are not printed to the console or /var/log/messages, >> so it may help to "echo + neterror > /proc/sys/lnet/printk" to enable >> verbose network messages while you are debugging connectivity. 
>>
>> Cheers,
>> Eric
>>
>> ------------------------------------------------------------------------
>> *From:* lustre-discuss-bounces@clusterfs.com
>> [mailto:lustre-discuss-bounces@clusterfs.com] *On Behalf Of* Snider, Tim
>> *Sent:* 05 February 2007 2:40 PM
>> *To:* lustre-discuss@clusterfs.com
>> *Subject:* [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>>
>> We're trying to set up a Lustre configuration using infiniband
>> ipoib with 1.5.95. openib 1.1 (formerly openib gen 2) is
>> installed. We can successfully ping between the mdt/mgs and ost
>> servers using the ipoib address. Lustre fs creation is
>> "apparently" successful. Mounting the lustre device fails.
>> 1. Does 1.5.95 work properly with ipoib?
>> 2. What is the proper form of the mgsnode specification; should
>> o2ib or openib be used?
>> 2.a Should we specify the ipoib address or the adapter/port #?
>>
>> The ost command line we're trying is:
>> mkfs.lustre --fsname=testfs --mgsnode=192.168.3.3@o2ib /dev/sdb1
>>
>> Thanks,
>> Timothy Snider
>> Storage Architect
>> Strategic Planning, Technology and Architecture
>>
>> LSI Logic Corporation
>> 3718 North Rock Road
>> Wichita, KS 67226
>> (316) 636-8736
>> tim.snider@lsi.com
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss@clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>
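Eric's checklist and the ko2iblnd symbol errors in the log above can both be scripted. Each "disagrees about version of symbol" line reports a kernel module-versioning (CRC) mismatch on an InfiniBand/RDMA CM symbol that ko2iblnd imports. The first helper reduces that log to the distinct symbols involved. The bash sketch below is a minimal illustration, not a Lustre tool. It assumes bash with GNU sed, grep and sort, and `check_lnet` must run as root with the 1.5.95 modules installed. The function names `list_bad_symbols`, `has_o2ib_nid` and `check_lnet` are hypothetical. The Lustre-specific commands are only the ones quoted in the thread: `modprobe lnet`, `lctl net up`, `/proc/sys/lnet/nis`, `lctl ping` and the `printk` setting.

```bash
#!/usr/bin/env bash
# Minimal sketch; the helper names are hypothetical, not part of Lustre.
# Assumes bash and GNU sed/grep/sort. check_lnet needs root and the
# Lustre modules installed.

# Read syslog/dmesg text on stdin and print each symbol that ko2iblnd
# reported as "disagrees about version of symbol", once, sorted.
list_bad_symbols() {
    sed -n 's/.*ko2iblnd: disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' |
        sort -u
}

# Succeed if the NI list on stdin (contents of /proc/sys/lnet/nis)
# contains at least one @o2ib NID.
has_o2ib_nid() {
    grep -q '@o2ib'
}

# Eric's steps: load LNet, send network errors to the console, bring the
# network up, show the local NIDs, then lnet-ping the MGS NID given as $1
# (for example 192.168.3.3@o2ib).
check_lnet() {
    local mgs_nid=$1
    if [ -z "$mgs_nid" ]; then
        echo "usage: check_lnet MGS_NID" >&2
        return 2
    fi
    modprobe lnet || return 1
    # Assumption: /proc/sys/lnet/printk exists once the modules are loaded.
    echo + neterror > /proc/sys/lnet/printk
    lctl net up || return 1
    cat /proc/sys/lnet/nis
    if ! has_o2ib_nid < /proc/sys/lnet/nis; then
        echo "no @o2ib NID: check 'options lnet networks=o2ib' in /etc/modprobe.conf" >&2
        return 1
    fi
    lctl ping "$mgs_nid"
}
```

To use it, source the file (for example as the hypothetical `lnet_check.sh`), then run `dmesg | list_bad_symbols` to list the mismatched symbols, or `check_lnet 192.168.3.3@o2ib` on a client to walk through the connectivity checks. The script only defines functions, so sourcing it has no side effects until one of them is called.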