And it doesn't dump its core to the dump swap space, either, so I can't
run savecore after reboot to get debugging info. I have the swap space
commented out in fstab so that it isn't enabled at boot and I can
harvest the core manually, but savecore just reports "savecore: no dumps
found." (Nor does it happen automatically.)
We recently thought we'd give 5.3 a go in production, and it has been
too unstable. When it crashes, it doesn't reboot; it just hangs there
until someone drives in and pushes the button. Who knows, maybe Linux
would be more stable at this point. Sigh.
The hardware it is running on is a Tyan S2875 with dual amd64/246
processors, and 2 GB Registered DDR RAM (Corsair). We're also running
vinum for all of the filesystems, mirroring them all, including the root
filesystem. The vinum is using two SATA WD Raptors. I have one older
IDE drive plugged in to capture the kernel dumps.
We've tried many different memory configurations to see if we can tune
it so that FreeBSD can handle it (DRAM ECC vs master ECC, bank & node
interleaving turned off/on, slowing the memory down, DRAM Scrub Redirect
off/on, etc.), to no avail.
It's usually pagedaemon that croaks, but for some reason it has also
crashed in the keyboard IRQ process and the serial I/O IRQ process. I
guess that since it's usually the pager that dies, that's why I can't
get kernel dumps. Here are some (manually copied) panics from the
console.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x88
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff80389aea
stack pointer = 0x10:0xffffffffb2051a60
frame pointer = 0x10:0xffffff006b12d000
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 10h18m49s
...
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x88
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff8038a10a
frame pointer = 0x10:0xffffffffb2051ab0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 15h59m55s
...
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 36 (swi5: clock sio)
trap number = 12
panic: page fault
cpuid = 1
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x48
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff803a40d3
stack pointer = 0x10:0xffffffffb1d63650
frame pointer = 0x10:0xffffff007b7f3a40
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 30
trap number = 12
panic: page fault
cpuid = 1
spin lock sched lock held by 0xffffff007b8177b0 for > 5 seconds
...
What can I do to debug this more if I can't harvest the kernel dumps to
report a bug? Is there anything the FreeBSD team can do? Do I need to
resort to Linux for dual amd64 support for now? <cringe>
Thanks,
../troy