I recently brought up an old 700MHz P3 machine running FreeBSD 4.9 and added a 3ware 7506-8 IDE RAID card with four 120GB drives in a RAID 5 array. It randomly reboots about once a day, and even though I'm running a debug kernel, it does not leave any crash information (which I assume just means that the kernel did not panic and dump core). After the first few reboots, I upgraded to a new 400W power supply. The machine is not heavily loaded; its primary function is NFS/Samba sharing. Any ideas for figuring out what is wrong with this machine? Thanks. -Doug
In my experience, this would be bad hardware. If you have the machine set up to save crash dumps (you have something like dumpdev="/dev/twed0s1b" in rc.conf, right?) and it never saves one, I would lean towards hardware as the cause. If you can hook up a serial console to log any panics, and the panic is always in a different location or you never see a panic message at all, that would further point to hardware. ---Mike
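For anyone following along, the setup Mike describes could look roughly like the sketch below on a 4.x system. The dump device is the one from his message and should be a swap partition large enough to hold the dump. The loader.conf line for a serial console is an assumption about a typical setup, not something given in this thread, so check it against the Handbook for your release.

```sh
# /etc/rc.conf
# Write kernel crash dumps to this swap partition (device name taken from
# Mike's message; substitute your own swap device).
dumpdev="/dev/twed0s1b"

# /boot/loader.conf
# Assumption: send the console to the first serial port so that any panic
# message can be captured on another machine over a null-modem cable.
console="comconsole"
```

If the box then reboots with no dump saved and nothing in the serial log, that is consistent with a hardware reset rather than a kernel panic.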
On Mon, 1 Mar 2004 08:59:16 -0800 Doug Silver <dsilver@urchin.com> wrote:
> I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
> card 7506-8 with 4 120Gb drives in a raid 5 array.
Maybe you have WD drives? There is a known issue with these drives timing out in RAID arrays. Check: http://support.wdc.com/download/index.asp#raid3ware
-- Tomasz Szymczak
> In my experience, this would be bad hardware. If you have the machine setup
That was my experience with 5.1-CURRENT and a SM P6DGH, where I had to move the Ethernet card (Intel Pro100+) to an interrupt that was not shared with the Radeon AIW and SCSI controllers, which is where the BIOS had put it. Similar crashes were occurring with this hardware under MS Windows, both on my system and on another one reported on the SM newsgroup, so this was not a FreeBSD-only problem. Mike Squires
Doug Silver writes:
>I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
>card 7506-8 with 4 120Gb drives in a raid 5 array. It will randomly reboot,
>about once-per-day and even though I'm running a debug kernel, it does not
>leave any crash information (which I assume just means that the kernel did
>not panic and dump core). After the first few times, I upgraded to a new
>400W power supply. The machine is not heavily loaded and its primary
>function is for NFS/samba sharing.
This is an old thread (I'm playing catch-up with this list), but I just had a similar problem. We had a new 2U, all-SCSI, P4 2.66GHz machine with a gigabyte of RAM and an Adaptec 2100S RAID controller (using the asr driver) doing RAID 5 across four drives, plus one spare. This was our new mail server, and it got about 150k to 200k connections a day. We run MIMEDefang with ClamAV, and a lot of our users use SpamAssassin. So it was used pretty thoroughly, though it rarely hit a load greater than 0.6, and swap was nearly unused.
The first four days it ran, it was fine. Then it spontaneously rebooted one morning. Two days later, it did the same thing. Within a week, it would barely stay up a full 24 hours (we were going through a lot of troubleshooting during this time, BTW, not just standing around picking our noses). We ended up taking the whole thing down and reinstating our old, barely sufficient system while we tested the box.
I could go through a list of everything we tested, but I won't bother racking my memory unless someone really wants to hear it. We ended up replacing nearly every piece of hardware but the case (NIC, M/B, RAID card, RAM), and nothing worked. I was always pretty sure it was hardware-related, as we could never capture a panic, and by the time it got really bad (the day we replaced the M/B), I could watch it reboot almost as soon as it finished booting.
In the end, the culprit was exactly what I had suspected from the beginning but was assured it couldn't be: the riser card in the 2U case. We don't have anything but circumstantial evidence pointing to that, but I'm pretty sure. If we took the riser card out of the case and plugged everything directly into the M/B (which required leaving the top off the case, of course), we could bludgeon the system with SMTP connections while running a disk I/O benchmark and FTPing large amounts of data in variously sized files back and forth. If we put the card back in, it would reboot in about three hours. We switched to a 4U case, upgraded the system the Friday before last, and haven't had a problem yet (fingers crossed, knocking on wood, etc.). So I guess all this was just to say "beware the riser card." -Todd