On Mon, Oct 13, 2003 at 04:19:58PM +0200, Hani Mouneimne wrote:
> Hey all,
>
> I was wondering if you could help with this issue.
>
> Every time I run a make/compile on my FreeBSD 4.8-p10 system it has a
> complete fit and reboots. It usually dumps core and sometimes gives no
> messages at all in the logfiles.
> Here is the latest output of a makeworld I am doing
> ="sh /usr/src/tools/install.sh"
>
> PATH=/usr/obj/usr/src/i386/usr/sbin:/usr/obj/usr/src/i386/usr/bin:/usr/obj/usr/src/i386/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
> make -f Makefile.inc1 par-depend
> *** Signal 11
> *** Signal 11
> Killed

I assume you've read the FAQ entry on Sig11:

    http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#SIGNAL11

Signal 11, especially if it occurs in an unpredictable place during
compiles or other heavyweight operations, is a clear sign of hardware
problems, but I think you know that from what you say next.
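
If you want to confirm that the failures really are unpredictable, one
approach (a sketch, not something taken from the FAQ) is to run the
same build several times and compare where each run stops. The
function name repeat_build and the log files under /tmp are
hypothetical choices for illustration only:

```sh
#!/bin/sh
# Hypothetical helper: run a command N times, keep each run's output,
# and report how many runs failed. Intermittent failures at different
# points suggest hardware; identical failures suggest a real bug.
repeat_build() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Save each run's output separately so the failure points
        # can be compared afterwards (log location is arbitrary).
        if ! "$@" > "/tmp/repeat_build.$i.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}
```

For example, "cd /usr/src && repeat_build 5 make buildworld", then
"tail -n 3 /tmp/repeat_build.*.log" to see where each failed run
stopped. If the failures land in different places each time, that
points at hardware; a build that dies on the same file every time is
more likely a genuine software problem.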

> This is just one of many crashes of similar scale; segfaulting is also
> common.
> I have changed the entire server hardware including the hard drive and it is
> still doing this. It was fine with FreeBSD p0 so I am wondering if it could
> be some code issue.

Tricky. Are you sure you've swapped out *all* of the hardware? SEGVs
are typically due to memory or CPUs going bad, but there are several
other considerations.

- Memory can be marginal: tests like running memtest86 won't
  necessarily pick up all failure cases, although when they do
  find a problem they are generally right.

  If the memory timing isn't quite in spec, or if there's a
  problem that only occurs when the memory stick heats up due to
  high activity, then you may not pick it up except under load.

- SEGVs can also occur due to bad memory in such devices as RAID
  or graphics controllers, or even in the CPU cache.

- Overheating will generally cause stressed components to fail in
  this sort of way. Such failures will definitely be correlated
  with high system activity. CPUs generally do have thermal
  cutouts that just halt the machine, but thermal problems in
  other components can crash the system as you've seen.
  Northbridge and Southbridge chipsets on the motherboard can be
  an Achilles' heel in this respect.

  Check that all of the fans are working correctly, that all of
  the ventilation holes/dust filters are clear, and that there
  is sufficient room around the machine to permit free flow of
  air. If you've added extra components inside the system, is the
  cooling airflow still adequate?

- PSUs are also capable of causing such symptoms, especially if
  they aren't actually quite powerful enough to drive all your
  hardware. If the system voltages aren't properly stable then
  all sorts of undefined behaviour can occur. Modern 1GHz+ boxes
  generally need a 300W PSU, and the PSU tends to be both one of
  the least reliable parts of the system and one of the items
  where box manufacturers will be most aggressive on price when
  sourcing components.

- Even the machine *case* can cause this sort of problem. I've
  seen a machine where all of the electronics, PSU, fans etc. were
  swapped out, but the machine still keeled over when the case was
  screwed back together. It turned out that the case itself was a
  bit distorted, and screwing the case on bent the motherboard in
  a way that was clearly not good for it, especially when it
  warmed up a bit as well. Changing out the case produced a
  working system...

Cheers,

Matthew

--
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK