Juergen Kabelitz
2007-May-10 14:21 UTC
[Lustre-discuss] problems with SLES 10 and openib-1.1 by installing lustre over infiniband
Hi all,
This is my first message to this list.
We have a problem building Lustre on SLES 10 with the o2ib driver and OFED 1.1,
as mentioned in the subject.
First I installed the RPM kernel-lustre-smp-2.6.16-27_0.9_lustre.1.6.0.x86_64.rpm
and rebooted; the system came up without problems. Then I added the driver for an
Areca RAID controller with a full kernel build (make bzImage; make modules;
make modules_install, then a reboot). The next step was to install
openib-1.1 with this BUILD_ID:
OFED-1.1
openib-1.1 (REV=9905)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git:
ref: refs/heads/ofed_1_1
commit a083ec1174cb4b5a5052ef5de9a8175df82e864a
# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm
After the reboot the InfiniBand software works. Then I changed to the
directory /usr/src/lustre-1.6.0 and ran configure:
./configure --with-linux=/usr/src/linux-2.6.16-27-0.9_lustre.1.6.0
--with-o2ib=/usr/local/ofed/src/openib-1.1/
followed by make rpms.
This compiles with a lot of warnings.
The install step
rpm -iv
/usr/src/packages/RPMS/x86_64/lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64.rpm
gives a lot of messages like this:
ksym(recalc_sigpending) = fb6af58d is needed by lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64
ksym(fget) = fba072b4 is needed by lustre-modules-1.6.0-2.6.16_27_0.9_lustre.1.6.0custom_200705101552.x86_64
As a workaround, the RPM package can be converted to a cpio archive with
rpm2cpio and unpacked by hand. The modules libcfs and lnet can then be loaded
with insmod, and after a depmod they can also be loaded with modprobe.
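
The rpm2cpio step can be sketched roughly as below. This is only a minimal
sketch: the function name extract_rpm and the destination directory are
made up for illustration, and unpacking this way skips the RPM dependency
check (the ksym messages above), so it does not make the symbols match.

```shell
# extract_rpm RPM_FILE DEST_DIR   (hypothetical helper name)
# Unpack the payload of an RPM into DEST_DIR without using the RPM database.
extract_rpm() {
    rpm_file=$1
    dest=$2
    # Refuse to continue if the package file is missing or unreadable.
    [ -r "$rpm_file" ] || { echo "cannot read $rpm_file" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # rpm2cpio writes the package payload as a cpio archive to stdout;
    # cpio -i extracts it, -d creates directories, -m keeps modification times.
    rpm2cpio "$rpm_file" | (cd "$dest" && cpio -idm)
}

# Example call (paths are placeholders):
#   extract_rpm ./lustre-modules-<version>.x86_64.rpm /tmp/lustre-modules
#   depmod -a    # only after the .ko files have been copied below /lib/modules
```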
But loading the module ko2iblnd fails as follows:
mserv0002:/lib/modules # uname -a
Linux mserv0002 2.6.16-27-0.9_lustre.1.6.0custom #1 SMP Thu May 10 13:12:30 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux
mserv0002:/lib/modules # cd 2.6.16-27-0.9_lustre.1.6.0custom/kernel/net/
mserv0002:..kernel/net # cd lustre/
mserv0002:..net/lustre # l
total 7744
drwxr-xr-x 2 root root 4096 May 10 16:07 ./
drwxr-xr-x 32 root root 4096 May 10 15:17 ../
-rw-r--r-- 1 root root 1080451 May 10 15:55 ko2iblnd.ko
-rw-r--r-- 1 root root 1281644 May 10 15:55 ksocklnd.ko
-rw-r--r-- 1 root root 2696006 May 10 15:55 libcfs.ko
-rw-r--r-- 1 root root 2840143 May 10 15:55 lnet.ko
mserv0002:..net/lustre # insmod ./ko2iblnd.ko
insmod: error inserting './ko2iblnd.ko': -1 Unknown symbol in module
mserv0002:..net/lustre #
dmesg shows:
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
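
To get a short list of the affected symbols out of such output, a filter
like the following may help. It is a minimal sketch that assumes the message
text is exactly as shown above; the function name mismatched_symbols is
made up for illustration.

```shell
# mismatched_symbols   (hypothetical helper name)
# Read kernel log text on stdin and print each symbol that was reported with
# "disagrees about version of symbol", once, in sorted order.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol[[:space:]]*\([A-Za-z0-9_][A-Za-z0-9_]*\).*/\1/p' |
        sort -u
}

# Example call:
#   dmesg | mismatched_symbols
```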
The kernel does know this symbol:
mserv0002:/proc # cat kallsyms | grep ib_create_cq
ffffffff88188c3e U ib_create_cq [ib_ipoib]
ffffffff88188c3e U ib_create_cq [ib_mad]
ffffffff8818d530 r __kcrctab_ib_create_cq [ib_core]
ffffffff8818d800 r __ksymtab_ib_create_cq [ib_core]
ffffffff8818dc5a r __kstrtab_ib_create_cq [ib_core]
ffffffff88188c3e T ib_create_cq [ib_core]
00000000d08533c7 a __crc_ib_create_cq [ib_core]
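
The __crc_ line is the symbol version that the loaded ib_core exports.
"Disagrees about version" means the version recorded in ko2iblnd.ko is a
different one, so comparing the two values may show where the build went
wrong. The following is a minimal sketch under some assumptions: the
installed modprobe supports --dump-modversions and prints lines of the form
"0x<crc> <symbol>", and the function name crc_diff and the file names are
made up for illustration.

```shell
# crc_diff KERNEL_CRCS MODULE_CRCS   (hypothetical helper name)
# KERNEL_CRCS: kallsyms lines such as
#     00000000d08533c7 a __crc_ib_create_cq [ib_core]
# MODULE_CRCS: assumed "0x<crc> <symbol>" lines from modprobe --dump-modversions
# Prints one line per symbol whose two CRC values differ.
crc_diff() {
    awk '
        # First file: remember the CRC the running kernel exports.
        FILENAME == ARGV[1] {
            if ($3 ~ /^__crc_/) {
                name = substr($3, 7)                        # strip "__crc_"
                have[name] = tolower(substr($1, length($1) - 7))  # low 8 hex digits
            }
            next
        }
        # Second file: compare against the CRC the module was built with.
        {
            want = tolower($1)
            sub(/^0x/, "", want)
            if (($2 in have) && have[$2] != want)
                printf "%s kernel=%s module=%s\n", $2, have[$2], want
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
#   grep " __crc_" /proc/kallsyms > /tmp/kernel_crcs.txt
#   modprobe --dump-modversions ./ko2iblnd.ko > /tmp/module_crcs.txt
#   crc_diff /tmp/kernel_crcs.txt /tmp/module_crcs.txt
```

If this prints the ib_* symbols, ko2iblnd was probably compiled against
InfiniBand symbol versions other than those of the ib_core module that is
actually loaded.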
The same messages appear in /var/log/messages:
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_create_cq
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_modify_qp
May 10 17:00:19 mserv0002 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
May 10 17:00:19 mserv0002 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
Can anybody help me?
With regards
J. Kabelitz
sysGen GmbH
Support und Technik Clustersysteme
Am Hallacker 48
28327 Bremen
Tel (0421) 40966 -28
Fax (0421) 40966 -66
mailto:jkabelitz@sysgen.de
www.sysgen.de
Geschäftsführerin Gabriele Nikisch
Eingetragen beim Amtsgericht Walsrode HRB 121943