Ms. Megan Larko
2008-Aug-22 18:12 UTC
[Lustre-discuss] Input/Output error starting lustre 1.6.4.3smp
Happy Friday!

I have a box which I am configuring to be a new (better hw) MGS for our lustre system. (FYI, this is not the MGS/MDT I have been using to benchmark my new OSS/OSTs.) The OS is CentOS 5.2. The lustre kernel and related rpms are:

* e2fsprogs-1.40.4.cfs1-0redhat.x86_64.rpm
* kernel-lustre-smp-2.6.18-53.1.13.el5_lustre.1.6.4.3.x86_64.rpm
* lustre-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp_200804260904.x86_64.rpm
* lustre-debuginfo-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp_200804260904.x86_64.rpm
* lustre-iokit-1.2-200709210921.noarch.rpm
* lustre-ldiskfs-3.0.4-2.6.18_53.1.13.el5_lustre.1.6.4.3smp.x86_64.rpm
* lustre-modules-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp_200804260904.x86_64.rpm

I did have to rpm -e my CentOS default e2fsprogs-1.39 and e2fsprogs-libs-1.39 in order to install e2fsprogs-1.40.4.cfs1-0redhat. The e2fsprogs-1.40 install went without error after I erased the two earlier versions.

When I attempt to start lustre on this new MGS/MDT I receive the following errors:

[root@mds ~]# modprobe lustre
WARNING: Error inserting ptlrpc (/lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko): Input/output error
WARNING: Error inserting mdc (/lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/mdc.ko): Unknown symbol in module, or unknown parameter (see dmesg)
WARNING: Error inserting lov (/lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lov.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting lustre (/lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lustre.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Some info on the files:

New MDS:

[root@mds ~]# ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko
-rw-r--r-- 1 root root 9280207 Apr 26 09:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko
ls -lh /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/mdc.ko
-rw-r--r-- 1 root root 1.7M Apr 26 09:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/mdc.ko
ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lov.ko
-rw-r--r-- 1 root root 3357812 Apr 26 09:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lov.ko
ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lustre.ko
-rw-r--r-- 1 root root 5432229 Apr 26 09:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lustre.ko

Old MDS:

[root@mds1 ~]# ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko
-rw-r--r-- 1 root root 9280207 Apr 26 10:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko
ls -lh /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/mdc.ko
-rw-r--r-- 1 root root 1.7M Apr 26 10:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/mdc.ko
ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lov.ko
-rw-r--r-- 1 root root 3357812 Apr 26 10:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lov.ko
ls -l /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lustre.ko
-rw-r--r-- 1 root root 5432229 Apr 26 10:09 /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/lustre.ko

The files are the same size in the same location (they were installed from the same rpm). The dmesg shows many unresolved symbols on the new MDS.

I am wondering if the "Input/output error" is a hardware-related issue? I see no hardware error messages at all in the log files. Am I correct in assuming that the many unresolved symbol errors are cascading from the one file, ptlrpc.ko, which was not correctly accessed? Any ideas as to why I cannot successfully access the ptlrpc.ko file? There were no errors installing the lustre rpms listed above other than the e2fsprogs info.

Any and all suggestions appreciated. Enjoy your weekend.

megan
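Matching sizes do not prove matching contents, so one further check (not part of the original post) would be to compare checksums of the modules on both hosts. A minimal sketch, assuming a POSIX shell with cksum available; the function name module_sums is made up for illustration and is not a Lustre or rpm tool:

```shell
# module_sums DIR: print "<crc> <bytes> <name>" for every .ko file directly
# under DIR, in glob (sorted) order, so output from two hosts can be diffed.
# (module_sums is an illustrative name, not part of Lustre.)
module_sums() {
    dir=$1
    for f in "$dir"/*.ko; do
        [ -f "$f" ] || continue    # skip when the glob matched nothing
        # Reading from stdin makes cksum print only "<crc> <bytes>";
        # append the bare file name so host-specific paths do not matter.
        echo "$(cksum < "$f") ${f##*/}"
    done
}
```

Running module_sums /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre on each MDS and diffing the two outputs would show whether any module differs. On an rpm-based system, rpm -V lustre-modules is another way to verify installed files against the package database.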
Brian J. Murrell
2008-Aug-22 18:22 UTC
[Lustre-discuss] Input/Output error starting lustre 1.6.4.3smp
On Fri, 2008-08-22 at 14:12 -0400, Ms. Megan Larko wrote:
> Happy Friday!

Isn't it though?

> When I attempt to start lustre on this new MGS/MDT I receive the
> following errors:
> [root@mds ~]# modprobe lustre
> WARNING: Error inserting ptlrpc
> (/lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko):
> Input/output error

Hrm. This seems a bit strange.

> The dmesg shows many unresolved symbols on the new
> MDS.

It will for any of the modules which were depending on ptlrpc.ko (which failed to load).

> I am wondering if the "Input/output error" is a hardware-related
> issue?

Seems like it could be. You will have to look at your dmesg carefully for where the errors start. Whatever is at the top is most relevant. Or you can use dmesg -c to clear the ring buffer and try again.

> Any ideas as to why I cannot successfully access the
> ptlrpc.ko file?

Can you cat (or dd) it to /dev/null without error?

b.
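The read test suggested above could be wrapped up roughly as follows. This is only a sketch, assuming a POSIX shell and dd; check_readable is a hypothetical helper name, not an existing tool:

```shell
# check_readable FILE: read FILE end to end and discard the data.
# A failure here would point at the disk or filesystem rather than at
# Lustre. (check_readable is an illustrative name.)
check_readable() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "readable: $1"
    else
        echo "read error: $1" >&2
        return 1
    fi
}
```

For example, check_readable /lib/modules/2.6.18-53.1.13.el5_lustre.1.6.4.3smp/kernel/fs/lustre/ptlrpc.ko. If the file reads cleanly, the Input/output error is probably coming from the module's own initialisation rather than from storage, and running dmesg -c, then modprobe lustre, then dmesg again should show only the messages from that one attempt.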
Ms. Megan Larko
2008-Aug-22 18:45 UTC
[Lustre-discuss] Input/Output error starting lustre 1.6.4.3smp
More information:

The dmesg on new MGS shows the following error:

LustreError: 7165:0:(linux-tcpip.c:106:libcfs_ipif_query()) Can't get flags for interface ib0
LustreError: 7165:0:(o2iblnd.c:1552:kiblnd_startup()) Can't query IPoIB interface ib0: -19
LustreError: 105-4: Error -100 starting up LNI o2ib
LustreError: 7165:0:(events.c:654:ptlrpc_init_portals()) network initialisation failed

This is followed by the numerous unknown symbols for mdc, lov, and lustre.

Again, on identical hw I can 'modprobe lustre' and receive no errors in dmesg:

Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.3
Build Version: 1.6.4.3-19691231190000-PRISTINE-.tmp.lustre-build.4180.kernel.linux-2.6.18-53.1.13.el5_lustre.1.6.4.3.-2.6.18-53.1.13.el5_lustre.1.6.4.3smp
Lustre: Added LNI 172.18.0.15@o2ib [8/64]
Lustre: Lustre Client File System; info@clusterfs.com

...and good in /var/log/messages:

Aug 22 14:32:17 oss4 kernel: libcfs: no version for "struct_module" found: kernel tainted.
Aug 22 14:32:17 oss4 kernel: Lustre: OBD class driver, info@clusterfs.com
Aug 22 14:32:17 oss4 kernel: Lustre Version: 1.6.4.3
Aug 22 14:32:17 oss4 kernel: Build Version: 1.6.4.3-19691231190000-PRISTINE-.tmp.lustre-build.4180.kernel.linux-2.6.18-53.1.13.el5_lustre.1.6.4.3.-2.6.18-53.1.13.el5_lustre.1.6.4.3smp
Aug 22 14:32:18 oss4 kernel: Lustre: Added LNI 172.18.0.15@o2ib [8/64]
Aug 22 14:32:18 oss4 kernel: Lustre: Lustre Client File System; info@clusterfs.com

Just FYI, Thank you.

megan
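Since the earliest error in dmesg is usually the one that matters, a small filter can pull it out of a long log. A minimal sketch, assuming a POSIX shell and awk; first_lustre_error is a hypothetical helper name:

```shell
# first_lustre_error: read kernel log text on stdin and print only the
# first line that contains "LustreError", then stop reading.
# (first_lustre_error is an illustrative name, not a Lustre utility.)
first_lustre_error() {
    awk '/LustreError/ { print; exit }'
}
```

Used as dmesg | first_lustre_error, this would print the "Can't get flags for interface ib0" line from the log above, which comes before the network initialisation failure and the unknown-symbol messages.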
Brian J. Murrell
2008-Aug-22 19:09 UTC
[Lustre-discuss] Input/Output error starting lustre 1.6.4.3smp
On Fri, 2008-08-22 at 14:45 -0400, Ms. Megan Larko wrote:
> More information:
>
> The dmesg on new MGS shows the following error:
> LustreError: 7165:0:(linux-tcpip.c:106:libcfs_ipif_query()) Can't get
> flags for interface ib0
> LustreError: 7165:0:(o2iblnd.c:1552:kiblnd_startup()) Can't query
> IPoIB interface ib0: -19

#define ENODEV 19 /* No such device */

It seems lustre can't find your ib0 device. Do you have it plumbed with an (ipoib) address and up? You should make sure you have basic IP connectivity over IB (i.e. pinging) before you venture to getting lustre working.

b.
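The negative numbers in the LustreError lines are Linux errno values with the sign flipped. As a rough illustration, the two codes seen in this thread can be decoded with a small lookup; errno_name is a hypothetical helper, and the table covers only these two values (as defined in the Linux errno headers):

```shell
# errno_name CODE: translate the two kernel error codes seen in this
# thread. A leading "-" is stripped because the kernel logs them as
# negatives (e.g. -19). (errno_name is an illustrative name.)
errno_name() {
    case "${1#-}" in
        19)  echo "ENODEV: No such device" ;;
        100) echo "ENETDOWN: Network is down" ;;
        *)   echo "unknown errno: $1"; return 1 ;;
    esac
}
```

So errno_name -19 gives ENODEV for the ib0 query and errno_name -100 gives ENETDOWN for the o2ib LNI startup, which fits an interface that is missing or not configured. Before retrying modprobe lustre, ifconfig ib0 (or ip addr show ib0) should show the interface up with an address, and a ping to another IPoIB host should succeed.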