Hi there,
I am getting random, non-reproducible panics (roughly once a week on average)
on my Intel server running FreeBSD 5-STABLE, and I am running out of options
for tracking down the cause.
Error details:
The panic message is not consistent, but it is generally something like:
"panic: vm_fault: fault on nofault entry, addr: d6f68000"
I set up a debugging kernel with unattended reboot and crash dumps to swap,
but this does not work at all. Kernel options added to GENERIC:
"makeoptions DEBUG=-g"
"options KDB, GDB, DDB, KDB_UNATTENDED"
dumpdev is specified correctly in rc.conf (the swap partition is larger than
memory), and /var/crash exists with the correct permissions.
/usr/src was recently updated from CVS, there are no optimizations in
/etc/make.conf, and world and kernel were built in accordance with the
Handbook.
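As a sanity check on the "swap partition larger than memory" claim, here is a minimal Python sketch using the figures from the dmesg and disklabel output further down. It assumes the disklabel sizes are in 512-byte sectors, the function name is only illustrative, and it says nothing about any extra space a kernel dump might need beyond physical memory.

```python
# Figures copied from the dmesg and disklabel output in this post.
SECTOR_BYTES = 512             # assumption: disklabel sizes are in 512-byte sectors
SWAP_SECTORS = 1048576         # size of partition "b:" on ad0s1
REAL_MEMORY_BYTES = 469696512  # "real memory" line from dmesg


def dump_headroom_bytes(swap_sectors, memory_bytes, sector_bytes=SECTOR_BYTES):
    """Return by how many bytes the swap partition exceeds physical memory.

    A negative result would mean a full memory dump cannot fit.
    """
    return swap_sectors * sector_bytes - memory_bytes


headroom = dump_headroom_bytes(SWAP_SECTORS, REAL_MEMORY_BYTES)
print("swap bytes:", SWAP_SECTORS * SECTOR_BYTES)
print("headroom bytes:", headroom)
print("dump fits:", headroom > 0)
```

With these numbers the swap partition is larger than real memory, so partition size alone does not look like the reason the dump fails.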
Further system details:
uname -a:
"FreeBSD thesaloon.flipse.org 5.3-STABLE FreeBSD 5.3-STABLE #1: Wed Feb 9
01:36:58 CET 2005
root@thesaloon.flipse.org:/usr/src/sys/i386/compile/GENERIC-DEBUG i386"
dmesg cpu/mem:
CPU: Intel Pentium III (866.33-MHz 686-class CPU)
real memory = 469696512 (447 MB)
avail memory = 454152192 (433 MB)
ide hardware:
atapci0: <VIA 82C686A UDMA66 controller> port
0x9000-0x900f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <Promise PDC20269 UDMA133 controller> port
0xc000-0xc00f,0xbc00-0xbc03,0xb800-0xb807,0xb400-0xb403,0xb000-0xb007 mem
0xe9000000-0xe9003fff irq 11 at device 12.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
ad0: 194481MB <Maxtor 6B200P0/BAH41B10> [395136/16/63] at ata0-master
UDMA66
ad2: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata1-master
UDMA66
ad4: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata2-master
UDMA133
ad6: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata3-master
UDMA133
network hardware:
rl0: <RealTek 8139 10/100BaseTX> port 0xac00-0xacff mem
0xe9004000-0xe90040ff
irq 10 at device 10.0 on pci0
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:02:44:35:6f:2e
Mounting root from ufs:/dev/ad0s1a
kldstat:
Id Refs Address Size Name
1 3 0xc0400000 3b05c8 kernel
2 1 0xc189c000 dd000 vinum.ko
3 1 0xc1a0a000 7000 nullfs.ko
vinum dumpconfig:
Drive vinumdrive0: Device /dev/ad0s1e
Created on TestHost.TestDomain at Mon Nov 15 20:24:19 2004
Config last updated Mon Feb 21 23:51:52 2005
Size: 163921572864 bytes (156327 MB)
volume vinum0 state up
plex name vinum0.p0 state up org raid5 479s vol vinum0
sd name vinum0.p0.s0 drive vinumdrive0 len 320158810s driveoffset 265s state
up plex vinum0.p0 plexoffset 0s
sd name vinum0.p0.s1 drive vinumdrive1 len 320158810s driveoffset 265s state
up plex vinum0.p0 plexoffset 479s
sd name vinum0.p0.s2 drive vinumdrive2 len 320158810s driveoffset 265s state
up plex vinum0.p0 plexoffset 958s
sd name vinum0.p0.s3 drive vinumdrive3 len 320158810s driveoffset 265s state
up plex vinum0.p0 plexoffset 1437s
Drive /dev/ad0s1e: 152 GB (163921572864 bytes)
Drive vinumdrive1: Device /dev/ad2s1e
(..)
Size: 163921572864 bytes (156327 MB)
volume vinum0 state up
(..)
Drive /dev/ad2s1e: 152 GB (163921572864 bytes)
Drive vinumdrive2: Device /dev/ad4s1e
(..)
Size: 163921572864 bytes (156327 MB)
volume vinum0 state up
(..)
Drive /dev/ad4s1e: 152 GB (163921572864 bytes)
Drive vinumdrive3: Device /dev/ad6s1e
(..)
Size: 163921572864 bytes (156327 MB)
volume vinum0 state up
(..)
Drive /dev/ad6s1e: 152 GB (163921572864 bytes)
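The vinum figures above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the "s" suffix means 512-byte sectors and that a RAID-5 plex offers the capacity of all subdisks but one, and the function name is made up for illustration.

```python
# Figures copied from the vinum dumpconfig output in this post.
SECTOR_BYTES = 512            # assumption: the "s" suffix means 512-byte sectors
STRIPE_SECTORS = 479          # "org raid5 479s"
SUBDISK_SECTORS = 320158810   # "len 320158810s"
DRIVE_OFFSET_SECTORS = 265    # "driveoffset 265s"
DRIVE_BYTES = 163921572864    # "Size: 163921572864 bytes"
SUBDISK_COUNT = 4             # vinum0.p0.s0 .. vinum0.p0.s3


def check_vinum_geometry():
    """Return basic consistency facts about one drive and the RAID-5 plex."""
    drive_sectors = DRIVE_BYTES // SECTOR_BYTES
    return {
        "drive_sectors": drive_sectors,
        # The subdisk must end at or before the end of the drive.
        "subdisk_fits": DRIVE_OFFSET_SECTORS + SUBDISK_SECTORS <= drive_sectors,
        # The subdisk length should be a whole number of stripes.
        "whole_stripes": SUBDISK_SECTORS % STRIPE_SECTORS == 0,
        # Assumption: RAID-5 usable space is (n - 1) subdisks.
        "usable_bytes": (SUBDISK_COUNT - 1) * SUBDISK_SECTORS * SECTOR_BYTES,
    }


for key, value in check_vinum_geometry().items():
    print(key, value)
```

Under those assumptions the subdisk fits on the drive and its length is an exact multiple of the stripe size, so nothing in the layout itself stands out.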
disklabel /dev/ad0s1:
# /dev/ad0s1:
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 77075519 0 4.2BSD 2048 16384 28552
b: 1048576 77075519 swap
c: 398283417 0 unused 0 0 # "raw" part, don't edit
e: 320159322 78124095 vinum
The other disklabels (ad2s1, ad4s1, ad6s1) have only partition e:, labelled
"vinum" and of the same size. The checkparity option of vinum reports no
errors, and fsck /dev/vinum/vinum0 reports no errors either.
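The ad0s1 label can also be checked for gaps or overlaps between partitions. A minimal sketch, with sizes and offsets copied from the disklabel output above and a hypothetical helper name:

```python
# (name, size, offset) in sectors, copied from the ad0s1 disklabel in this post.
PARTITIONS = [
    ("a", 77075519, 0),
    ("b", 1048576, 77075519),
    ("e", 320159322, 78124095),
]
SLICE_SECTORS = 398283417  # size of the "c:" (raw) partition


def find_layout_problems(partitions, slice_sectors):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    expected_offset = 0
    for name, size, offset in sorted(partitions, key=lambda p: p[2]):
        if offset < expected_offset:
            problems.append(f"{name}: overlaps the previous partition")
        elif offset > expected_offset:
            problems.append(f"{name}: gap of {offset - expected_offset} sectors before it")
        expected_offset = max(expected_offset, offset + size)
    if expected_offset > slice_sectors:
        problems.append("partitions run past the end of the slice")
    elif expected_offset < slice_sectors:
        problems.append(f"{slice_sectors - expected_offset} unused sectors at the end")
    return problems


print(find_layout_problems(PARTITIONS, SLICE_SECTORS))
```

An empty list here means a:, b: and e: are contiguous and exactly fill the slice, so the swap and vinum partitions do not overlap.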
I have carried out extensive hardware tests, and most of the hardware
(including most of the memory, the CPU and the main board) has already been
replaced, but this made no difference to stability. I also believe this issue
was already present when running 4-STABLE. I tried the new GEOM framework
with gvinum, but crashes were more frequent and the panic dumps specifically
mentioned vinum, so I went back to non-GEOM vinum (which also performs
significantly better).
I am sure I am forgetting some required details, so please ask. Does anybody
have a clue whether this is a kernel bug, a hardware and/or driver bug, a
vinum bug or a configuration issue?
Thanks for looking into this!
Daniel