thr3ads.net - freebsd stable - Unexplained reboots with 4.9 [Mar 2004]


Doug Silver

2004-Mar-01 08:59 UTC

Unexplained reboots with 4.9

I recently brought up an old 700MHz P3 machine running 4.9 and added a 3ware
7506-8 IDE RAID card with four 120GB drives in a RAID 5 array.  It reboots
randomly, about once per day, and even though I'm running a debug kernel, it
leaves no crash information (which I assume just means that the kernel did
not panic and dump core).  After the first few reboots, I upgraded to a new
400W power supply.  The machine is not heavily loaded; its primary function
is NFS/Samba sharing.

Any ideas for trying to figure out what is wrong with this machine?  

Thanks.

-Doug

Mike Tancsa

2004-Mar-01 11:20 UTC


In my experience, this would be bad hardware. If you have the machine set up
to save crash dumps (you have something like dumpdev="/dev/twed0s1b" in
rc.conf, right?) and it never saves one, I would lean towards hardware as the
cause.  If you can hook up a serial console to log any panics that reach the
screen, and the panic is always in a different location or you never see a
panic message at all, that would further point to hardware.
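
A quick way to double-check the rc.conf setting mentioned above is to parse the file and see what dumpdev resolves to. The following is only a rough sketch: the helper names are hypothetical, the parser handles plain var="value" lines and nothing fancier, and the dumpdir line in the sample is an assumption about where savecore would write, not something stated in this thread.

```python
import re

# Hypothetical helper: collect simple var="value" assignments from rc.conf text.
def parse_rc_conf(text):
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        m = re.match(r'^([A-Za-z_][A-Za-z0-9_]*)=(["\']?)(.*)\2$', line)
        if m:
            settings[m.group(1)] = m.group(3)
    return settings

def dump_device(text):
    """Return the configured dump device, or None if unset or set to NO."""
    value = parse_rc_conf(text).get("dumpdev", "NO")
    return None if value.upper() == "NO" else value

# Sample contents; dumpdev is the example from the message above,
# dumpdir is an assumed crash directory (not taken from the thread).
SAMPLE = '''
dumpdev="/dev/twed0s1b"
dumpdir="/var/crash"    # assumption: where savecore would put the dump
'''

print(dump_device(SAMPLE))
```

On a real machine one would pass in the contents of /etc/rc.conf instead of SAMPLE. If this returns None, the absence of crash dumps says nothing either way about hardware.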

         ---Mike

At 11:59 AM 01/03/2004, Doug Silver wrote:
>I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
>card 7506-8 with 4 120Gb drives in a raid 5 array.  It will randomly reboot,
>about once-per-day and even though I'm running a debug kernel, it does not
>leave any crash information (which I assume just means that the kernel did
>not panic and dump core).  After the first few times, I upgraded to a new
>400W power supply.  The machine is not heavily loaded and its primary
>function is for NFS/samba sharing.
>
>Any ideas for trying to figure out what is wrong with this machine?
>
>Thanks.
>
>-Doug

Tomasz Szymczak

2004-Mar-01 11:58 UTC


On Mon, 1 Mar 2004 08:59:16 -0800
Doug Silver <dsilver@urchin.com> wrote:
> I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
> card 7506-8 with 4 120Gb drives in a raid 5 array.

Maybe you have WD drives? There is a known issue with these drives timing out
in RAID arrays. Check:

http://support.wdc.com/download/index.asp#raid3ware

-- 
Tomasz Szymczak
"The mind makes everything change, in order to simulate the passage of time."

Michael L. Squires

2004-Mar-01 18:21 UTC


> In my experience, this would be bad hardware. If you have the machine setup
That was my experience with 5.1-CURRENT and a SM P6DGH where I had to change
the interrupt used by the Ethernet card (Intel Pro100+) away from being
shared with the Radeon AIW and SCSI controllers where the BIOS put it.
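
For anyone wanting to check for this kind of interrupt sharing, one option is to group the device probe lines from the boot messages by IRQ number. This is a minimal sketch under several assumptions: the function name is hypothetical, the regular expression only matches lines of the form "name0: ... irq N ...", and the sample lines are made up for illustration rather than captured from any machine in this thread.

```python
import re
from collections import defaultdict

# Assumed line shape: "<driver><unit>: <description> ... irq <N> ..."
IRQ_LINE = re.compile(r'^([a-z]+\d+): .*\birq (\d+)\b')

def shared_irqs(dmesg_text):
    """Map each IRQ used by more than one device to the list of those devices."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = IRQ_LINE.match(line)
        if m:
            dev, irq = m.group(1), int(m.group(2))
            if dev not in by_irq[irq]:
                by_irq[irq].append(dev)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

# Made-up probe lines (illustrative only, not real boot output).
SAMPLE = """\
fxp0: <Ethernet adapter> port 0xd000-0xd03f irq 11 at device 13.0 on pci0
ahc0: <SCSI adapter> port 0xd400-0xd4ff irq 11 at device 14.0 on pci0
twe0: <Storage controller> port 0xd800-0xd80f irq 10 at device 15.0 on pci0
"""

print(shared_irqs(SAMPLE))
```

Sharing an interrupt is not a fault in itself, so a non-empty result is only a hint about which cards to try moving to another slot or reassigning in the BIOS.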

Similar crashes were occurring with this hardware under MS Windows, both
on my system and another one reported on the SM newsgroup, so this was
not a FreeBSD-only problem.

Mike Squires

Todd Meister

2004-Mar-09 12:59 UTC


Doug Silver writes:
>I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
>card 7506-8 with 4 120Gb drives in a raid 5 array.  It will randomly reboot,
>about once-per-day and even though I'm running a debug kernel, it does not
>leave any crash information (which I assume just means that the kernel did
>not panic and dump core).  After the first few times, I upgraded to a new
>400W power supply.  The machine is not heavily loaded and its primary
>function is for NFS/samba sharing.

This is an old thread (I'm playing catch-up with this list), but I just had 
a similar problem.  We had a new, 2U, P4 2.66GHz machine, all-SCSI, with an 
Adaptec 2100s RAID device (using the asr driver) doing RAID 5 with four 
drives, plus one spare, and a gigabyte of RAM.  This was our new mail server, 
and got about 150k to 200k connections a day.  We run MIMEDefang with 
ClamAV, and a lot of our users use SpamAssassin.  So it was used pretty 
thoroughly, though it rarely hit a load greater than 0.6, and swap was nearly 
unused.

The first four days it ran, it was fine.  Then it spontaneously rebooted 
one morning.  Two days later, it did the same thing.  Within a week, it 
would barely stay up a full 24 hours (we were going through a lot of 
troubleshooting during this time, BTW, not just standing around, picking 
our noses).  We ended up taking the whole thing down and reinstating our 
old, barely-sufficient system, while we tested the box.

I could go through a list of everything we tested, but won't bother racking 
my memory, unless someone really wants to hear it.  We ended up replacing 
nearly every piece of hardware but the case - NIC, M/B, RAID card, RAM - 
but nothing worked.  I was always pretty sure it was hardware-related, as 
we could never capture a panic, and by the time it got really bad (the 
day we replaced the M/B), I could watch it reboot almost as soon as it 
finished booting.

In the end, the culprit was exactly what I had suspected from the beginning, 
but had been assured it couldn't be - the riser card in the 2U case.  We don't 
have anything but circumstantial evidence pointing to that, but I'm pretty 
sure.  If we took the riser card out of the case and plugged everything 
directly into the M/B (which required leaving the top off the case, of 
course), we could bludgeon the system with SMTP connections while running a 
disk I/O benchmarker and FTPing large amounts of data in variously-sized 
files back and forth.  If we put the card back in, it'd reboot in about 
three hours.

We switched to a 4U case, upgraded the system the Friday before last, and 
haven't had a problem, yet (fingers crossed, knocking on wood, etc.).

So I guess all this was just to say "beware the riser card."

-Todd
