Hi,

I'm experiencing a problem while loading Lustre's o2ib module, as shown
below. I've now tried to compile this on two separate machines and
experienced the same issue on both. I'm using the SLES10 2.6.16.21-0.8-smp
kernel and OFED-1.1-rc6. OFED and lustre-1.5.95 were both compiled for the
same kernel. The Lustre modules load fine except the o2ib module, which
complains about a symbol version issue. I have only one ib_core and a single
ko2iblnd module in /lib/modules/2.6.16.21-0.8-smp/.

I'm pretty sure I've done everything correctly and cannot see where I've
missed something. I've included the modinfo output for ib_core.ko and
ko2iblnd.ko below. The version magic seems to be the same.

Has anybody from CFS tried compiling lustre-1.5.95 with OFED-1.1-rc6 on
SLES10 with an SMP kernel? I'm very grateful for any help, as I've now spent
several days on this and cannot see the solution. To me this looks like a
bug in ko2iblnd.ko, since the other Lustre modules load fine.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

uname -a
Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux

n32:~/lustre-1.5.95 # modinfo /lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ksocklnd.ko
filename:       /lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ksocklnd.ko
author:         Cluster File Systems, Inc. <info@clusterfs.com>
description:    Kernel TCP Socket LND v1.0.0
license:        GPL
vermagic:       2.6.16.21-0.8-smp SMP 586 REGPARM gcc-4.1
depends:        libcfs,lnet
srcversion:     34574F310009463D60F6050
parm:           zc_min_frag:minimum fragment to zero copy (int)
parm:           enable_irq_affinity:enable IRQ affinity (int)
parm:           keepalive_intvl:seconds between probes (int)
parm:           keepalive_count:# missed probes == dead (int)
parm:           keepalive_idle:# idle seconds before probe (int)
parm:           nagle:enable NAGLE? (int)
parm:           rx_buffer_size:socket rx buffer size (int)
parm:           tx_buffer_size:socket tx buffer size (int)
parm:           min_bulk:smallest 'large' message (int)
parm:           typed_conns:use different sockets for bulk (int)
parm:           eager_ack:send tcp ack packets eagerly (int)
parm:           max_reconnectms:max connection retry interval (mS) (int)
parm:           min_reconnectms:min connection retry interval (mS) (int)
parm:           nconnds:# connection daemons (int)
parm:           peer_credits:# concurrent sends to 1 peer (int)
parm:           credits:# concurrent sends (int)
parm:           sock_timeout:dead socket timeout (seconds) (int)

n32:~/lustre-1.5.95 # modinfo /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko
filename:       /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko
author:         Roland Dreier
description:    core kernel InfiniBand API
license:        Dual BSD/GPL
vermagic:       2.6.16.21-0.8-smp SMP 586 REGPARM gcc-4.1
depends:
srcversion:     EFC33E22916E5965C4D4B51

n32:~/lustre-1.5.95 # find /lib/modules/2.6.16.21-0.8-smp/ -name \*o2ib\*
/lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ko2iblnd.ko

n32:~/lustre-1.5.95 # find /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/hw
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/hw/mthca
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/ipoib
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/rdma_cm.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_addr.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_umad.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_uverbs.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_mad.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_ucm.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_cm.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_sa.ko
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/rdma_ucm.ko

Thanks,
Thierry.

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW
Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788
http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
This e-mail and its attachments are intended for the above named only and
may be confidential. If they have come to you in error you must not copy or
show them to anyone, nor should you take any action based on them, other
than to notify the error by replying to the sender.
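The "disagrees about version of symbol" messages come from the kernel's CONFIG_MODVERSIONS check: every exported symbol carries a CRC of its prototype, and a module is refused when the CRC it was built against differs from the one exported by the module actually loaded. Matching vermagic strings do not rule this out. A minimal sketch for finding the mismatched symbols, assuming the installed modprobe supports --dump-modversions and that ib_core.ko still contains its __crc_<symbol> entries visible to nm (both assumptions, not confirmed in this thread); compare_crcs and check_modversions are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers: report symbols whose CRC expected by a consumer
# module differs from the CRC exported by the provider module.

# compare_crcs EXPECTED_FILE NM_FILE
#   EXPECTED_FILE: output of "modprobe --dump-modversions consumer.ko"
#                  (lines of the form "0x<crc><TAB><symbol>")
#   NM_FILE:       output of "nm provider.ko"
#                  (lines of the form "<value> A __crc_<symbol>")
# Prints "symbol expected=<crc> exported=<crc>" for each symbol present in
# both files whose CRCs differ.
compare_crcs() {
    awk '
        # Normalise a hex CRC: drop 0x, lower-case, strip leading zeros.
        function norm(c) {
            sub(/^0[xX]/, "", c)
            c = tolower(c)
            sub(/^0+/, "", c)
            return (c == "") ? "0" : c
        }
        # First file: CRCs the consumer module was built against.
        FNR == NR { want[$2] = norm($1); next }
        # Second file: only the __crc_<symbol> entries matter.
        $NF ~ /^__crc_/ {
            sym = substr($NF, 7)
            if ((sym in want) && want[sym] != norm($1))
                printf "%s expected=%s exported=%s\n", sym, want[sym], norm($1)
        }
    ' "$1" "$2"
}

# check_modversions CONSUMER_KO PROVIDER_KO
check_modversions() {
    tmp_want=$(mktemp) || return 1
    tmp_have=$(mktemp) || { rm -f "$tmp_want"; return 1; }
    # Both commands are assumed to exist on the target system.
    if modprobe --dump-modversions "$1" > "$tmp_want" &&
       nm "$2" > "$tmp_have"; then
        compare_crcs "$tmp_want" "$tmp_have"
        status=0
    else
        status=1
    fi
    rm -f "$tmp_want" "$tmp_have"
    return $status
}
```

For example, check_modversions /lib/modules/2.6.16.21-0.8-smp/kernel/net/lustre/ko2iblnd.ko /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko prints nothing when every shared symbol agrees. Any output would suggest ko2iblnd was built against a different set of ib_core CRCs than the installed module exports (for instance, the kernel's own Module.symvers rather than OFED's); that is a hypothesis, not something established in this thread.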
Thierry Delaitre wrote:
> Hi,
>
> I'm experiencing a problem while loading up lustre's o2ib module as shown
> below. I've now tried to compile this on 2 separate machines and
> experiences the same issues on both. I'm using SLES10 2.6.16.21-0.8-smp
> kernel and OFED-1.1-rc6. ofed and lustre-1.5.95 were all compiled for the
> same kernel. The lustre modules load up fine except the o2ib module which
> complains about symbol version issue. I have only one ib_core and a single
> ko2iblnd module in /lib/modules/2.6.16.21-0.8-smp/

Hi Thierry,
Please refer to this thread from July, where Makia Minich experienced the
same problem and gave a temporary workaround:

https://mail.clusterfs.com/pipermail/lustre-discuss/2006-July/001834.html

--
| David Vasil <dmvasil@ornl.gov>
| Oak Ridge National Laboratory NCCS Division
| High Performance Computing Systems Administrator
On Tue, 26 Sep 2006, David Vasil wrote:

> Thierry Delaitre wrote:
> > Hi,
> >
> > I'm experiencing a problem while loading up lustre's o2ib module as shown
> > below. I've now tried to compile this on 2 separate machines and
> > experiences the same issues on both. I'm using SLES10 2.6.16.21-0.8-smp
> > kernel and OFED-1.1-rc6. ofed and lustre-1.5.95 were all compiled for the
> > same kernel. The lustre modules load up fine except the o2ib module which
> > complains about symbol version issue. I have only one ib_core and a single
> > ko2iblnd module in /lib/modules/2.6.16.21-0.8-smp/
>
> Hi Thierry,
> Please refer to this thread from July where Makia Minich experienced the
> same problem and gave a temporary workaround:
>
> https://mail.clusterfs.com/pipermail/lustre-discuss/2006-July/001834.html

Hi David,

I had tried it, but I get a horrible exception if I set modversions to 'n':

n32:~/lustre-1.5.95 # modprobe lnet
n32:~/lustre-1.5.95 # lctl network up
Segmentation fault

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /class/net/ib1/flags
Modules linked in: ko2iblnd lnet libcfs rdma_ucm rdma_cm ib_addr ib_cm af_packet ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core nfs lockd nfs_acl sunrpc ipv6 button battery ac apparmor aamatch_pcre loop dm_mod e1000 reiserfs edd fan thermal processor sg aic79xx scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core
CPU:    0
EIP:    0060:[<00000000>]    Tainted: G   U VLI
EFLAGS: 00010293   (2.6.16.21-0.8-smp #6)
EIP is at _stext+0x3feffd40/0x29
eax: cf13b280   ebx: dde94580   ecx: df821000   edx: 00000001
esi: df526600   edi: cf13b280   ebp: cecbda90   esp: cecbda7c
ds: 007b   es: 007b   ss: 0068
Process lctl (pid: 9495, threadinfo=cecbc000 task=ddde8030)
Stack: <0>e1c39e7a cecbdab0 c0582340 c0582340 00000000 00306269 00000000 00000001
       69326f6b 646e6c62 cc1da200 cecbdb08 00000000 db030002 30534aa1 00000000
       00000000 451963f3 000cf402 00000001 ffffff00 a14a5330 01000000 00000001
Call Trace:
 [<e1c39e7a>] kiblnd_startup+0xc4d/0xe93 [ko2iblnd]
 [<e1c04af9>] lnet_startup_lndnis+0x187/0x6cf [lnet]
 [<e1c05759>] LNetNIInit+0x108/0x1bf [lnet]
 [<e0be7376>] libcfs_ioctl+0x0/0x6da [libcfs]
 [<e1c1316a>] lnet_configure+0x22/0x44 [lnet]
 [<e0be7963>] libcfs_ioctl+0x5ed/0x6da [libcfs]
 [<e093027b>] reiserfs_async_progress_wait+0x21/0x6c [reiserfs]
 [<e092a0ec>] pathrelse+0x18/0x2b [reiserfs]
 [<c01308d7>] autoremove_wake_function+0x0/0x2d
 [<e0930dfc>] do_journal_end+0xb36/0xb5f [reiserfs]
 [<c015d3bd>] __find_get_block+0x17b/0x185
 [<c01733f9>] mntput_no_expire+0x12/0xaf
 [<c0168f4b>] link_path_walk+0xb3/0xbd
 [<c015d3ee>] __getblk+0x27/0x229
 [<c015dc34>] ll_rw_block+0x7f/0x8e
 [<e0934161>] xattr_lookup_poison+0x52/0x5f [reiserfs]
 [<c016f6be>] __d_lookup+0x96/0xd9
 [<c0166c3a>] do_lookup+0x3c/0x7a
 [<c016f993>] dput+0x1a/0x118
 [<c0168d3a>] __link_path_walk+0xd98/0xef6
 [<c0141616>] find_get_page+0x18/0x38
 [<c015d078>] __find_get_block_slow+0xfe/0x107
 [<e093027b>] reiserfs_async_progress_wait+0x21/0x6c [reiserfs]
 [<e092a0ec>] pathrelse+0x18/0x2b [reiserfs]
 [<c01308d7>] autoremove_wake_function+0x0/0x2d
 [<e0930dfc>] do_journal_end+0xb36/0xb5f [reiserfs]
 [<c01733f9>] mntput_no_expire+0x12/0xaf
 [<c0168f4b>] link_path_walk+0xb3/0xbd
 [<c0169276>] do_path_lookup+0x1df/0x242
 [<c015794e>] shmem_permission+0x0/0xa
 [<c0166dd3>] permission+0x97/0xa3
 [<c0167c20>] may_open+0x53/0x200
 [<e0be2ee1>] cfs_alloc+0x31/0x60 [libcfs]
 [<e0be6cb3>] libcfs_psdev_open+0x0/0x32f [libcfs]
 [<e0be6d8f>] libcfs_psdev_open+0xdc/0x32f [libcfs]
 [<e0be6cb3>] libcfs_psdev_open+0x0/0x32f [libcfs]
 [<e0be53a9>] libcfs_psdev_open+0x1d/0x23 [libcfs]
 [<c02039d7>] misc_open+0x119/0x1c9
 [<c0162f0e>] chrdev_open+0x12b/0x161
 [<c0162de3>] chrdev_open+0x0/0x161
 [<c0159f72>] __dentry_open+0xf5/0x1c4
 [<c015a0b1>] nameidata_to_filp+0x25/0x37
 [<c015a115>] do_filp_open+0x52/0x5a
 [<e0be7376>] libcfs_ioctl+0x0/0x6da [libcfs]
 [<e0be550c>] libcfs_ioctl+0x13a/0x151 [libcfs]
 [<c016b0c4>] do_ioctl+0x48/0x5e
 [<c016b326>] vfs_ioctl+0x24c/0x25e
 [<c016b389>] sys_ioctl+0x51/0x68
 [<c0103bcb>] sysenter_past_esp+0x54/0x79
Code:  Bad EIP value.
n32:~/lustre-1.5.95 #