I'm running a 5.3 system with a kernel built from the latest RELENG_5 sources as of 15 days ago. It has just experienced another sporadic reboot, the latest in a long sequence of sporadic reboots which have occurred since the system was originally installed. It was up some 30 days without trouble before I colocated it, and after that it was up approximately another 30 days before it began exhibiting problems. Since then it has crashed approximately every 15-25 days.

I originally configured a dumpdev, only to discover that the size of swap must exceed physical memory by 1MB in order for it to work, which it does not, unfortunately. So dumpdev won't work to collect crash data.

What I really need is some way to determine the cause of the reboots, and panic messages aren't being logged. I've heard of patches to the kernel which would provide ways to do this, such as network console support (my colo provider doesn't provide serial consoles).

Does anyone have any suggestions as to how I could obtain a log of the kernel panics and any other pertinent information I would need to debug the problem? Are there any patches to write debugging data to the swap partition when the system panics in lieu of a complete kernel core, such as the panic message or, even better, a KTR dump?

As it is, I have absolutely no information to go on in debugging this problem.

Tony Arcieri

Tony Arcieri wrote:
> I originally configured a dumpdev only to discover that the size of swap must
> exceed physical memory by 1MB in order for it to work, which it does not,
> unfortunately. So dumpdev won't work to collect crash data.

You might set hw.physmem to a smaller value in loader.conf so that it fits within the amount of swap space which is available:

    set hw.physmem=<value>    MAXMEM

        (i386 only) Limits the amount of physical memory space
        available to the system to <value> bytes. <value> may have
        a k, M or G suffix to indicate kilobytes, megabytes and
        gigabytes respectively. Note that the current i386
        architecture limits this value to 4GB.

> What I'm really in need of is some way to determine the cause of reboots,
> and panic messages aren't being logged. I've heard of patches to the
> kernel which would provide ways to do this such as network console support
> (my colo provider doesn't provide serial consoles).

last and dmesg don't give you anything? That's unfortunate. You might try leaving an ssh session logged in, running tail -f on /var/log/messages, and see whether you can get anything from that.
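
For what it's worth, here is a rough sketch of how one might pick the hw.physmem value. The helper name physmem_limit_mb is made up for illustration, and the 1MB margin is just the figure quoted above, so treat both as assumptions rather than anything the loader documents. The hostname in the ssh line is a placeholder as well.

```shell
# Hypothetical helper (not part of FreeBSD): given the swap size in
# 1K blocks, print the largest hw.physmem value, in megabytes, that
# still leaves swap larger than memory by the given margin.
physmem_limit_mb() {
    swap_kb=$1           # swap size in 1K blocks, e.g. as shown by swapinfo
    margin_mb=${2:-1}    # assumed margin: the 1MB figure from this thread
    echo "$(( swap_kb / 1024 - margin_mb ))M"
}

# Example: 1GB of swap (1048576 1K blocks)
physmem_limit_mb 1048576

# The result would then go into /boot/loader.conf in name="value" form:
#   hw.physmem="1023M"
#
# Keeping a local copy of the remote log (placeholder hostname):
#   ssh user@colo.example.org 'tail -f /var/log/messages' | tee -a messages.log
```

Limiting memory this way costs you usable RAM until the next reboot without the setting, so it is probably best treated as a temporary measure while you wait for the next crash.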