After some dialog with Terry Lambert on -hackers, I've been advised to
post this here.
I have a 4.7-RELEASE-p10 box that is suffering regular kernel panics.
The machine is a Dell 2650 running primarily as a file/print server for
a number of computer labs totalling about 400 machines (it also
functions as a rembo image server and squid proxy). It mainly stores
applications, which are run off a samba share, and user home directories
(also accessed via samba). Most of the data is stored on a largish
filesystem (~200G) on a Powervault 220 attached via a PERC3/DC
controller (amr). The OS is on a pair of internal 18G drives
attached to the internal PERC3/Di controller (aac). The machine is
attached to the network with a Netgear GA620 fibre NIC (ti).
The panic is being triggered by the "find" run in
/etc/periodic/daily/100.clean-disks. Disabling this script has
circumvented the problem for the moment, although from what I can
gather the underlying cause is a kernel bug.
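For reference, the workaround amounts to something like the fragment below. This is a minimal sketch: it assumes the stock periodic(8) knob from /etc/defaults/periodic.conf controls this script and that the override lives in /etc/periodic.conf, so check the variable name against your own defaults file before relying on it.

```shell
# /etc/periodic.conf (local overrides for /etc/defaults/periodic.conf)
# Assumption: 100.clean-disks is controlled by this stock knob.
# Setting it to NO skips the nightly "find" that walks the filesystems.
daily_clean_disks_enable="NO"
```

Setting this back to "YES" after the upgrade would re-enable the run so that the panic can be reproduced and a fresh dump captured.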
The machine is scheduled to be updated to 4.8 in two weeks. If anyone
knows whether this issue has already been resolved, please let me know.
If it hasn't, or the status is unknown, I'd be quite happy to re-enable
the daily script that triggers the problem once the system has been
upgraded, and to provide the necessary crash dumps, etc., to help solve it.
Terry tells me it has been fixed in -current.
Here is the relevant system info. If I've forgotten anything, or there
is anything more anyone needs to help fix the problem, please let me
know.
leela# uname -a
FreeBSD leela.lab.bel.uq.edu.au 4.7-RELEASE-p10 FreeBSD 4.7-RELEASE-p10 #0: Mon Apr 7 10:34:08 EST 2003 root@leela.lab.bel.uq.edu.au:/usr/src/sys/compile/LEELA i386
leela#
leela# cat /var/run/dmesg.boot
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.7-RELEASE-p10 #0: Mon Apr 7 10:34:08 EST 2003
root@leela.lab.bel.uq.edu.au:/usr/src/sys/compile/LEELA
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium 4 (2392.26-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf27 Stepping = 7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,<b28>,ACC,<b31>>
real memory = 2147418112 (2097088K bytes)
avail memory = 2088574976 (2039624K bytes)
Changing APIC ID for IO APIC #0 from 0 to 4 on chip
Changing APIC ID for IO APIC #1 from 0 to 5 on chip
Changing APIC ID for IO APIC #2 from 0 to 6 on chip
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
cpu1 (AP): apic id: 2, version: 0x00050014, at 0xfee00000
io0 (APIC): apic id: 4, version: 0x000f0011, at 0xfec00000
io1 (APIC): apic id: 5, version: 0x000f0011, at 0xfec01000
io2 (APIC): apic id: 6, version: 0x000f0011, at 0xfec02000
Preloaded elf kernel "kernel" at 0xc030d000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 9 entries at 0xc00fc480
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 3 -> irq 2
IOAPIC #1 intpin 7 -> irq 7
IOAPIC #1 intpin 11 -> irq 10
pci0: <PCI bus> on pcib0
pci0: <unknown card> (vendor=0x1028, dev=0x000c) at 4.0 irq 2
pci0: <unknown card> (vendor=0x1028, dev=0x0008) at 4.1 irq 7
pci0: <unknown card> (vendor=0x1028, dev=0x000d) at 4.2 irq 10
pci0: <ATI Mach64-GR graphics accelerator> at 14.0
atapci0: <ServerWorks CSB5 ATA100 controller> port 0x8b0-0x8bf,0x8d8-0x8db,0x8d0-0x8d7,0x8c8-0x8cb,0x8c0-0x8c7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <OHCI USB controller> at 15.2 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0225)> at device 15.3 on pci0
isa0: <ISA bus> on isab0
pcib1: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 0 -> irq 11
pci1: <PCI bus> on pcib1
ti0: <Netgear GA620 1000baseSX Gigabit Ethernet> mem 0xfcf00000-0xfcf03fff irq 11 at device 6.0 on pci1
ti0: Ethernet address: 00:02:e3:00:0d:c6
pcib2: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
pcib8: <PCI to PCI bridge (vendor=8086 device=b154)> at device 6.0 on pci2
IOAPIC #1 intpin 9 -> irq 13
pci3: <PCI bus> on pcib8
pcib9: <PCI to PCI bridge (vendor=8086 device=b154)> at device 0.0 on pci3
IOAPIC #1 intpin 8 -> irq 16
pci4: <PCI bus> on pcib9
amr0: <AMI MegaRAID> mem 0xf0000000-0xf7ffffff irq 16 at device 0.0 on pci4
amr0: <PERC 3/DC> Firmware 1.74, BIOS 3.27, 128MB RAM
pci3: <unknown card> (vendor=0x1077, dev=0x1216) at 1.0 irq 13
pcib3: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 12 -> irq 17
IOAPIC #1 intpin 13 -> irq 18
pci5: <PCI bus> on pcib3
bge0: <Broadcom BCM5701 Gigabit Ethernet> mem 0xeff10000-0xeff1ffff irq 17 at device 6.0 on pci5
bge0: Ethernet address: 00:06:5b:f3:09:7d
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: <Broadcom BCM5701 Gigabit Ethernet> mem 0xeff00000-0xeff0ffff irq 18 at device 8.0 on pci5
bge1: Ethernet address: 00:06:5b:f3:09:7e
miibus1: <MII bus> on bge1
brgphy1: <BCM5701 10/100/1000baseTX PHY> on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib4: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 14 -> irq 19
pci6: <PCI bus> on pcib4
pcib10: <PCI to PCI bridge (vendor=8086 device=0309)> at device 8.0 on pci6
IOAPIC #1 intpin 15 -> irq 20
pci7: <PCI bus> on pcib10
pci7: <unknown card> (vendor=0x9005, dev=0x00c5) at 6.0 irq 19
pci7: <unknown card> (vendor=0x9005, dev=0x00c5) at 6.1 irq 20
aac0: <Dell PERC 3/Di> mem 0xe0000000-0xe7ffffff irq 19 at device 8.1 on pci6
aac0: i960RX 100MHz, 118MB cache memory, optional battery present
aac0: Kernel 2.7-1, Build 3170, S/N 9c38d3
pcib5: <Host to PCI bridge> on motherboard
pci8: <PCI bus> on pcib5
pcib6: <Host to PCI bridge> on motherboard
pci9: <PCI bus> on pcib6
pcib7: <Host to PCI bridge> on motherboard
pci10: <PCI bus> on pcib7
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xec000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
ata0-slave: ATAPI identify retries exceeded
SMP: AP CPU #1 Launched!
acd0: CDROM <TEAC CD-ROM CD-224E> at ata0-master PIO4
amrd0: <MegaRAID logical drive> on amr0
amrd0: 209634MB (429330432 sectors) RAID 5 (optimal)
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 17355MB (35544576 sectors)
Mounting root from ufs:/dev/aacd0s1a
WARNING: / was not properly dismounted
leela#
leela# gdb -k /kernel.debug /export/crash/vmcore.1
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf
SMP 2 cpus
IdlePTD at phsyical address 0x0032c000
initial pcb at physical address 0x002a29e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address = 0x18
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc01e1725
stack pointer = 0x10:0xfd05bc50
frame pointer = 0x10:0xfd05bc54
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 16458 (find)
interrupt mask = none <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0
syncing disks... 10
done
Uptime: 23h45m19s
amr0: flushing cache...done
dumping to dev #aacd/0x20001, offset 4194432
dump 2047 2046 2045 2044 2043 2042 2041 2040 2039 2038 2037 2036 2035
2034 2033 2032 2031 2030 2029 2028 2027 2026 [...]
39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 succeeded
aac0: shutting down controller...
---
#0 dumpsys () at ../../kern/kern_shutdown.c:487
487 if (dumping++) {
(kgdb) where
#0 dumpsys () at ../../kern/kern_shutdown.c:487
#1 0xc0163cf0 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2 0xc0164171 in panic (fmt=0xc0251c79 "%s") at ../../kern/kern_shutdown.c:595
#3 0xc0214e46 in trap_fatal (frame=0xfd05bc10, eva=24) at ../../i386/i386/trap.c:974
#4 0xc0214a99 in trap_pfault (frame=0xfd05bc10, usermode=0, eva=24) at ../../i386/i386/trap.c:867
#5 0xc02145df in trap (frame={tf_fs = 24, tf_es = -1071775728, tf_ds = -1070989296, tf_edi = 1, tf_esi = 0, tf_ebp = -49955756, tf_isp = -49955780, tf_ebx = 2, tf_edx = 0, tf_ecx = 1, tf_eax = 2, tf_trapno = 12, tf_err = 2, tf_eip = -1071769819, tf_cs = 8, tf_eflags = 66118, tf_esp = 2, tf_ss = -49955720}) at ../../i386/i386/trap.c:466
#6 0xc01e1725 in _vm_object_allocate (type=2, size=1, object=0x0) at ../../vm/vm_object.c:158
#7 0xc01e18c4 in vm_object_allocate (type=2, size=1) at ../../vm/vm_object.c:241
#8 0xc01e753d in vnode_pager_alloc (handle=0xff7fce00, size=512, prot=0, offset=0) at ../../vm/vnode_pager.c:145
#9 0xc018ffc9 in vop_stdcreatevobject (ap=0xfd05bd64) at ../../kern/vfs_default.c:526
#10 0xc018fc35 in vop_defaultop (ap=0xfd05bd64) at ../../kern/vfs_default.c:150
#11 0xc01d7ef1 in ufs_vnoperate (ap=0xfd05bd64) at ../../ufs/ufs/ufs_vnops.c:2422
#12 0xc01943c2 in vfs_object_create (vp=0xff7fce00, p=0xfcee6d00, cred=0xc74ba800) at vnode_if.h:1383
#13 0xc0190af1 in namei (ndp=0xfd05bec4) at ../../kern/vfs_lookup.c:171
#14 0xc0199c85 in vn_open (ndp=0xfd05bec4, fmode=5, cmode=2180) at ../../kern/vfs_vnops.c:138
#15 0xc0195c98 in open (p=0xfcee6d00, uap=0xfd05bf80) at ../../kern/vfs_syscalls.c:1028
#16 0xc0215195 in syscall2 (frame={tf_fs = 134545455, tf_es = 134545455, tf_ds = -1078001617, tf_edi = 134597120, tf_esi = -1077937628, tf_ebp = -1077937532, tf_isp = -49954860, tf_ebx = 672099916, tf_edx = 134597184, tf_ecx = 134557696, tf_eax = 5, tf_trapno = 0, tf_err = 2, tf_eip = 672006988, tf_cs = 31, tf_eflags = 663, tf_esp = -1077937960, tf_ss = 47}) at ../../i386/i386/trap.c:1175
#17 0xc0201e5b in Xint0x80_syscall ()
#18 0x280a1f60 in ?? ()
#19 0x280a1b46 in ?? ()
#20 0x80496ca in ?? ()
#21 0x804b6c8 in ?? ()
#22 0x8049377 in ?? ()
(kgdb) list *_vm_object_allocate+158
0xc01e17b6 is in _vm_object_allocate (../../vm/vm_object.c:187).
182 * Try to generate a number that will spread objects out in the
183 * hash table. We 'wipe' new objects across the hash in 128 page
184 * increments plus 1 more to offset it a little more by the time
185 * it wraps around.
186 */
187 object->hash_rand = object_hash_rand - 129;
188
189 object->generation++;
190
191 TAILQ_INSERT_TAIL(&vm_object_list, object, object_list);
(kgdb) list *vm_object_allocate+241
0xc01e1991 is in vm_object_deallocate (../../vm/vm_object.c:325).
320
321 if (object->ref_count == 0) {
322 panic("vm_object_deallocate: object deallocated too many times: %d", object->type);
323 } else if (object->ref_count > 2) {
324 object->ref_count--;
325 return;
326 }
327
328 /*
329 * Here on ref_count of one or two, which are special cases for
(kgdb) list *vnode_pager_alloc+145
0xc01e751d is in vnode_pager_alloc (../../vm/vnode_pager.c:141).
136 }
137
138 if (vp->v_usecount == 0)
139 panic("vnode_pager_alloc: no vnode reference");
140
141 if (object == NULL) {
142 /*
143 * And an object of the appropriate size
144 */
145 object = vm_object_allocate(OBJT_VNODE, OFF_TO_IDX(round_page(size)));
(kgdb)
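One caveat on the listings above: "list *func+N" treats N as a byte offset into the function, whereas the 158, 241 and 145 used here are the source line numbers from the backtrace, so the listings show nearby code rather than the faulting lines (the second one even lands in vm_object_deallocate). Below is a sketch of commands that should show the faulting line instead, using only the frame number and instruction pointer reported above. They have not been run against this dump, so no output is shown. Frame #6 reports object=0x0 and the fault address is 0x18, which looks consistent with a write through a NULL object pointer, but that is an inference from the trace rather than something confirmed.

```
(kgdb) frame 6
(kgdb) list
(kgdb) print object
(kgdb) info line *0xc01e1725
(kgdb) list *0xc01e1725
```

"frame 6" selects the _vm_object_allocate frame so that "list" and "print" operate in its context; the last two commands map the faulting instruction pointer straight back to a source line.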
Cheers,
--
+- Christopher Smith, Systems Administrator ------------------------------+
| Server & Security Group, Information Technology Services                |
| The University of Queensland, Brisbane, Australia, 4072                 |
+- Ph +61 7 3365 4046 | email csmith@its.uq.edu.au | Fax +61 7 3365 4065 -+