Hello,

I've got a rather strange setup over here with about 6 ext3 partitions, one 50 gig lvm partition and one 2 gig software-raid1 partition, running under a heavily patched 2.4.10 kernel (various netfilter patches, freeswan, the current lvm patch (1.0.1-rc4) and the current ext3 patch). Furthermore, I have all of the necessary filesystem tools, so don't tell me to upgrade :-)

The machine was under heavy load (7.89) when the crash happened, with an updatedb, various makes and some tar processes running on the same partition (lvm). I can reproduce this by putting the partition under heavy load again (lots of tars and finds should do the trick). The oops traced below didn't crash my system, only the updatedb find command, but I have also had oopses that crashed my box completely (it even stopped answering echo request packets!). I haven't figured out yet how to capture those oopses to a floppy or something, so if one of you knows, please tell me, so I can provide some more information.

Anyway, here is one ksymoops output:

************************************
ksymoops 2.4.3 on i686 2.4.10. Options used
     -v /boot/vmlinuz (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.10/ (default)
     -m /boot/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

/usr/bin/nm: /boot/vmlinuz: File format not recognized
Error (pclose_local): read_nm_symbols pclose failed 0x100
Warning (read_vmlinux): no kernel symbols in vmlinux, is /boot/vmlinuz a valid vmlinux file?
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c023f9e0, System.map says c014c090. Ignoring ksyms_base entry
Unable to handle kernel NULL pointer dereference at virtual address 00000906
c0152ddb
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[ext3_find_entry+475/788]
EFLAGS: 00010287
eax: 00000905   ebx: 00000900   ecx: 00000005   edx: 00001900
esi: c38eeb60   edi: c38eeb60   ebp: 00000000   esp: c4e83eb0
ds: 0018   es: 0018   ss: 0018
Process find (pid: 27143, stackpage=c4e83000)
Stack: fffffff4 c38eeb60 c38eeb60 c48fed60 c4e83efc 00000005 c38eebbc cf8afc94
       c0d1c980 00000000 00000000 00000000 00000001 00000000 00000000 00000000
       cf7d4400 cf8afc94 00000000 cf8afc94 00000246 00000000 c0d1c980 c0d1c9e4
Call Trace: [d_alloc+27/348] [ext3_lookup+39/124] [real_lookup+83/196] [path_walk+1201/1736] [getname+93/156]
Code: 0f b6 43 06 39 c1 75 4d 83 3b 00 74 48 8b 74 24 18 8b 4c 24
Using defaults from ksymoops -t elf32-i386 -a i386

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
   0:   0f b6 43 06     movzbl 0x6(%ebx),%eax
Code; 00000004 Before first symbol
   4:   39 c1           cmp    %eax,%ecx
Code; 00000006 Before first symbol
   6:   75 4d           jne    55 <_EIP+0x55> 00000054 Before first symbol
Code; 00000008 Before first symbol
   8:   83 3b 00        cmpl   $0x0,(%ebx)
Code; 0000000a Before first symbol
   b:   74 48           je     55 <_EIP+0x55> 00000054 Before first symbol
Code; 0000000c Before first symbol
   d:   8b 74 24 18     mov    0x18(%esp,1),%esi
Code; 00000010 Before first symbol
  11:   8b 4c 24 00     mov    0x0(%esp,1),%ecx

3 warnings and 1 error issued. Results may not be reliable.
************************************

Yes, /boot/vmlinuz *is* a valid kernel file, and yes, I also tried /usr/src/linux/vmlinux, with the same results.

The crashes started yesterday, the day I upgraded my system from 2.4.8 to 2.4.10. I was using lvm patch 1.0.1-rc1 then, and have switched to 1.0.1-rc4 now. I'm not really sure whose fault it is, but since there were some debug statements about ext3 in the oops message, I thought I'd send it here first.

I hope you can use this information. If something is missing, please tell me; I'm on the list.
--
Regards,
Wiktor Wodecki | http://johoho.eggheads.org
wodecki@gmx.de | IRC: Johoho@IrcNET
Stephen C. Tweedie
2001-Oct-05 13:32 UTC
Re: Kernel Ooops probably in conjunction with lvm
Hi,

On Fri, Oct 05, 2001 at 01:17:45PM +0200, Wiktor Wodecki wrote:

> I've got a rather strange setup over here with about 6 ext3 partitions, one 50 gig lvm partition and one 2 gig software-raid1 partition, running under a heavily patched 2.4.10 kernel (various netfilter patches, freeswan, the current lvm patch (1.0.1-rc4) and the current ext3 patch). Furthermore, I have all of the necessary filesystem tools, so don't tell me to upgrade :-)

Is this a large-memory (>=1GB) box?

It appears that we've got a buffer_head whose "b_data" is 0x900, which indicates that the buffer is a highmem one. Highmem buffers should not be used for filesystem metadata: if ext3 is being given such a buffer, it's a core VFS fault (and the VFS changed substantially in this area in 2.4.10).

There have been fixes to related code since 2.4.10, but it's also entirely possible that an LVM interaction is causing the problem. Once ext3-0.9.11 is out for 2.4.11-pre*, I'd suggest giving that a try and seeing if you can reproduce this. I'm 99% sure it's not an ext3 fault, though: the footprint is clearly a highmem buffer_head occurring where we don't expect ever to see one.

Cheers,
 Stephen
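
For anyone following along, the arithmetic behind the 0x900 observation can be checked with a small sketch. It takes three values straight from the oops (ebx, the faulting address, and the displacement in "movzbl 0x6(%ebx),%eax") and compares them against a directory entry layout. The Ext3DirEntry2 class, the constant names and the helper functions are illustrative, not kernel code. The field layout is an assumption based on the usual ext2/ext3 on-disk directory entry (inode, rec_len, name_len, file_type, then the name), and reading ebx as a pointer to such an entry is one interpretation of the trace rather than something the thread states.

```python
import ctypes


class Ext3DirEntry2(ctypes.Structure):
    """Assumed fixed header of an ext2/ext3 directory entry.

    The variable-length name that follows the header is omitted,
    since only the header offsets matter here.
    """
    _fields_ = [
        ("inode", ctypes.c_uint32),     # offset 0
        ("rec_len", ctypes.c_uint16),   # offset 4
        ("name_len", ctypes.c_uint8),   # offset 6
        ("file_type", ctypes.c_uint8),  # offset 7
    ]


# Values copied from the oops report.
EBX = 0x900          # "ebx: 00000900"
FAULT_ADDR = 0x906   # "virtual address 00000906"
DISPLACEMENT = 0x6   # from "movzbl 0x6(%ebx),%eax"


def effective_address(base, displacement):
    """Address touched by a base+displacement memory operand."""
    return base + displacement


def field_at(offset):
    """Name of the assumed header field starting at this offset, if any."""
    for name, _ctype in Ext3DirEntry2._fields_:
        if getattr(Ext3DirEntry2, name).offset == offset:
            return name
    return None


if __name__ == "__main__":
    addr = effective_address(EBX, DISPLACEMENT)
    print(f"ebx + 0x{DISPLACEMENT:x} = 0x{addr:x}, "
          f"matches fault address: {addr == FAULT_ADDR}")
    print(f"field at offset {DISPLACEMENT}: {field_at(DISPLACEMENT)}")
```

Under those assumptions, the faulting load is a read of name_len through a pointer of 0x900, and the following "cmp %eax,%ecx" would be the name-length comparison (ecx is 5 in the register dump). That fits a data pointer that is not a valid kernel address, as opposed to a corrupted directory block.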