Hello,
I've got a rather strange setup over here: about 6 ext3 partitions, one
50 GB LVM partition and one 2 GB software RAID1 partition, running under a
heavily patched 2.4.10 kernel (various netfilter patches, freeswan, the current
LVM patch (1.0.1-rc4) and the current ext3 patch). Furthermore, I have all the
necessary filesystem tools, so don't tell me to upgrade :-)
The machine was under heavy load (load average 7.89) when the crash happened,
with updatedb, various makes and some tar processes running on the same
partition (the LVM one). I can reproduce this by putting the partition under
heavy load again (lots of tars and finds should do the trick). The oops trace
below didn't crash my system, only the find command run by updatedb, but I have
also had oopses that crashed my box completely (it even stopped answering ICMP
echo requests!). I haven't figured out yet how to capture those oopses to a
floppy or something similar, so if one of you knows how, please tell me and I
can provide more information.
Anyway, here is the ksymoops output for one of them:
************************************
ksymoops 2.4.3 on i686 2.4.10. Options used
-v /boot/vmlinuz (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.10/ (default)
-m /boot/System.map (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
/usr/bin/nm: /boot/vmlinuz: File format not recognized
Error (pclose_local): read_nm_symbols pclose failed 0x100
Warning (read_vmlinux): no kernel symbols in vmlinux, is /boot/vmlinuz a valid
vmlinux file?
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says
c023f9e0, System.map says c014c090. Ignoring ksyms_base entry
Unable to handle kernel NULL pointer dereference at virtual address 00000906
c0152ddb
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[ext3_find_entry+475/788]
EFLAGS: 00010287
eax: 00000905 ebx: 00000900 ecx: 00000005 edx: 00001900
esi: c38eeb60 edi: c38eeb60 ebp: 00000000 esp: c4e83eb0
ds: 0018 es: 0018 ss: 0018
Process find (pid: 27143, stackpage=c4e83000)
Stack: fffffff4 c38eeb60 c38eeb60 c48fed60 c4e83efc 00000005 c38eebbc cf8afc94
c0d1c980 00000000 00000000 00000000 00000001 00000000 00000000 00000000
cf7d4400 cf8afc94 00000000 cf8afc94 00000246 00000000 c0d1c980 c0d1c9e4
Call Trace: [d_alloc+27/348] [ext3_lookup+39/124] [real_lookup+83/196]
[path_walk+1201/1736] [getname+93/156]
Code: 0f b6 43 06 39 c1 75 4d 83 3b 00 74 48 8b 74 24 18 8b 4c 24
Using defaults from ksymoops -t elf32-i386 -a i386
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f b6 43 06 movzbl 0x6(%ebx),%eax
Code; 00000004 Before first symbol
4: 39 c1 cmp %eax,%ecx
Code; 00000006 Before first symbol
6: 75 4d jne 55 <_EIP+0x55> 00000054 Before first symbol
Code; 00000008 Before first symbol
8: 83 3b 00 cmpl $0x0,(%ebx)
Code; 0000000a Before first symbol
b: 74 48 je 55 <_EIP+0x55> 00000054 Before first symbol
Code; 0000000c Before first symbol
d: 8b 74 24 18 mov 0x18(%esp,1),%esi
Code; 00000010 Before first symbol
11: 8b 4c 24 00 mov 0x0(%esp,1),%ecx
3 warnings and 1 error issued. Results may not be reliable.
************************************
Yes, /boot/vmlinuz *is* a valid kernel file, and yes, I also tried
/usr/src/linux/vmlinux, with the same results.
The crashes started yesterday, the day I upgraded my system from 2.4.8 to
2.4.10. I was using LVM patch 1.0.1-rc1 then and have switched to 1.0.1-rc4 now.
I'm not really sure whose fault it is, but since ext3 functions show up in the
oops message, I thought I'd send it here first.
I hope you can use this information. If something's missing, please tell me;
I'm on the list.
--
Regards,
Wiktor Wodecki | http://johoho.eggheads.org
wodecki@gmx.de | IRC: Johoho@IrcNET
Stephen C. Tweedie
2001-Oct-05 13:32 UTC
Re: Kernel Ooops probably in conjunction with lvm
Hi,
On Fri, Oct 05, 2001 at 01:17:45PM +0200, Wiktor Wodecki wrote:
> I've got a rather strange setup over here with about 6 ext3 partitions, one
> 50gig lvm partition and one 2 gig software-raid1 partition running under a
> heavily patched 2.4.10 kernel (various netfilter patches, freeswan, current lvm
> patch (1.0.1-rc4 and current ext3 patch). Furthermore I have all of those
> neccessary filesystem tools, so don't tell me to upgrade :-)
Is this a large-memory (>=1GB) box?
It appears that we've got a buffer_head whose "b_data" is 0x900, which
indicates that the buffer is a highmem one. Highmem buffers should not be used
for filesystem metadata: if ext3 is being given such a buffer, it's a core VFS
fault (and the VFS changed substantially in this area in 2.4.10).
There have been fixes to related code since 2.4.10, but it's also entirely
possible that it's an LVM interaction which is causing the problem. Once
ext3-0.9.11 is out for 2.4.11-pre*, I'd suggest giving that a try and seeing
if you can reproduce this.
I'm 99% sure it's not an ext3 fault, though --- the footprint is clearly a
highmem buffer_head occurring where we don't expect ever to see one.
Cheers,
 Stephen