Juergen Kabelitz
2007-May-10 14:21 UTC
[Lustre-discuss] problems with SLES 10 and openib-1.1 by installing lustre over infiniband
Hi all,

This is my first message to this list. We have a problem compiling Lustre against the SLES 10 kernel, as mentioned in the subject, with the o2ib driver and OFED 1.1.

First I installed the RPM kernel-lustre-smp-2.6.16-27_0.9_lustre.1.6.0.x86_64.rpm and rebooted; the system boots. Then I added the driver for an Areca RAID controller with a full build of the kernel (make bzImage; make modules; make modules_install, then a reboot).

The next step was to install openib-1.1, with this BUILD_ID:

    OFED-1.1
    openib-1.1 (REV=9905)
    # User space
    https://openib.org/svn/gen2/branches/1.1/src/userspace
    Git:
    ref: refs/heads/ofed_1_1
    commit a083ec1174cb4b5a5052ef5de9a8175df82e864a
    # MPI
    mpi_osu-0.9.7-mlx2.2.0.tgz
    openmpi-1.1.1-1.src.rpm
    mpitests-2.0-0.src.rpm

After the reboot the system works with the InfiniBand software. Then I went to the directory /usr/src/lustre-1.6.0 and ran configure:

    ./configure --with-linux=/usr/src/linux-2.6.16-27-0.9_lustre.1.6.0 --with-o2ib=/usr/local/ofed/src/openib-1.1/

followed by:

    make rpms

This compiles with a lot of warnings. The step

    rpm -iv /usr/src/packages/RPMS/x86_64/lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64.rpm

gives a lot of messages like this:

    ksym(recalc_sigpending) = fb6af58d is needed by lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64
    ksym(fget) = fba072b4 is needed by lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64

With the rpm2cpio trick the RPM package can be converted to a cpio archive, which can then be installed. With insmod you can load the modules libcfs and lnet. After a depmod the modules can be loaded into the kernel with modprobe.

But when you load the module ko2iblnd you get the following answer:

    mserv0002:/lib/modules # uname -a
    Linux mserv0002 2.6.16-27-0.9_lustre.1.6.0custom #1 SMP Thu May 10 13:12:30 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux
    mserv0002:/lib/modules # cd 2.6.16-27-0.9_lustre.1.6.0custom/kernel/net/
    mserv0002:..kernel/net # cd lustre/
    mserv0002:..net/lustre # l
    total 7744
    drwxr-xr-x  2 root root    4096 May 10 16:07 ./
    drwxr-xr-x 32 root root    4096 May 10 15:17 ../
    -rw-r--r--  1 root root 1080451 May 10 15:55 ko2iblnd.ko
    -rw-r--r--  1 root root 1281644 May 10 15:55 ksocklnd.ko
    -rw-r--r--  1 root root 2696006 May 10 15:55 libcfs.ko
    -rw-r--r--  1 root root 2840143 May 10 15:55 lnet.ko
    mserv0002:..net/lustre # insmod ./ko2iblnd.ko
    insmod: error inserting './ko2iblnd.ko': -1 Unknown symbol in module
    mserv0002:..net/lustre #

dmesg shows:

    ko2iblnd: disagrees about version of symbol ib_create_cq
    ko2iblnd: Unknown symbol ib_create_cq
    ko2iblnd: disagrees about version of symbol ib_dereg_mr
    ko2iblnd: Unknown symbol ib_dereg_mr
    ko2iblnd: disagrees about version of symbol ib_destroy_cq
    ko2iblnd: Unknown symbol ib_destroy_cq
    ko2iblnd: disagrees about version of symbol ib_get_dma_mr
    ko2iblnd: Unknown symbol ib_get_dma_mr
    ko2iblnd: disagrees about version of symbol ib_alloc_pd
    ko2iblnd: Unknown symbol ib_alloc_pd
    ko2iblnd: disagrees about version of symbol ib_modify_qp
    ko2iblnd: Unknown symbol ib_modify_qp
    ko2iblnd: disagrees about version of symbol ib_dealloc_pd
    ko2iblnd: Unknown symbol ib_dealloc_pd

The kernel knows this symbol:

    mserv0002:/proc # cat kallsyms | grep ib_create_cq
    ffffffff88188c3e U ib_create_cq [ib_ipoib]
    ffffffff88188c3e U ib_create_cq [ib_mad]
    ffffffff8818d530 r __kcrctab_ib_create_cq [ib_core]
    ffffffff8818d800 r __ksymtab_ib_create_cq [ib_core]
    ffffffff8818dc5a r __kstrtab_ib_create_cq [ib_core]
    ffffffff88188c3e T ib_create_cq [ib_core]
    00000000d08533c7 a __crc_ib_create_cq [ib_core]

In the /var/log/messages file you can read:

    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_create_cq
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_modify_qp
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
    May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd

Can anybody help me?

With regards
J. Kabelitz

sysGen GmbH
Support and Technology, Cluster Systems
Am Hallacker 48
28327 Bremen
Tel (0421) 40966 -28
Fax (0421) 40966 -66
mailto:jkabelitz@sysgen.de
www.sysgen.de
Managing Director: Gabriele Nikisch
Registered at Amtsgericht Walsrode, HRB 121943
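
One way to narrow down the "disagrees about version of symbol" messages above, sketched under assumptions: with symbol versioning (CONFIG_MODVERSIONS) enabled, the loader prints that message when the CRC recorded in the module differs from the CRC the running kernel exports for the same symbol. The helper below compares the two lists. The name crc_mismatches and the file names want.txt and have.txt are hypothetical, and the sketch assumes /proc/kallsyms lists __crc_ entries the way the ib_create_cq output above does.

```shell
# Usage (run the first two commands on the affected machine):
#   modprobe --dump-modversions ./ko2iblnd.ko > want.txt
#   cat /proc/kallsyms > have.txt
#   crc_mismatches want.txt have.txt
#
# crc_mismatches WANT HAVE
#   WANT: one "0x<crc> <symbol>" pair per line (CRCs the module expects)
#   HAVE: a saved copy of /proc/kallsyms (CRCs the kernel exports)
# Prints one line per symbol whose CRCs differ; prints nothing if all match.
crc_mismatches() {
    awk '
        # First file: record the CRC the module expects for each symbol.
        FILENAME == ARGV[1] {
            crc = tolower($1)
            sub(/^0x/, "", crc)
            want[$2] = crc
            next
        }
        # Second file: __crc_<symbol> entries carry the exported CRC in the
        # low 32 bits of the address column (its last 8 hex digits).
        $3 ~ /^__crc_/ {
            sym = substr($3, 7)
            have = tolower(substr($1, length($1) - 7))
            if ((sym in want) && want[sym] != have)
                printf "%s module=%s kernel=%s\n", sym, want[sym], have
        }
    ' "$1" "$2"
}
```

If the symbols from the log show up with differing CRCs, that would suggest ko2iblnd.ko was compiled against different InfiniBand symbol versions than the ib_core currently loaded. That is only one possible reading and has not been confirmed for this system.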