Hi,

I have been having a fairly serious problem with an HP Netserver LH3 that seems to hang for no apparent reason. This happens when the machine is under load and also when it is simply idle.

When this "hang" occurs I can no longer SSH to the box. I can ping it and port scan it, and it appears to be "alive", but it won't load web pages, process mail, etc. When I physically access the box there are NO error messages, warnings or anything else on the monitor, and when I attempt to log on, I enter any username (or root), hit Enter, and it just sits there forever. I try Ctrl+Alt+F2 and have the same problem on another tty. When I power off, power back on, and log on, I am unable to find any evidence of what is causing this problem in any system logs.

I am wondering if it is because I uncommented the 2 lines needed for SMP support, rebuilt FreeBSD 4.8-STABLE, and booted that kernel. I did this because I would like to use both CPUs. I suppose I could go back to the non-SMP kernel, run it, and see if SMP is in fact the cause, but that would not really get me any closer to using both CPUs, which is of course my ultimate goal.

I am about out of ideas here. What should I do first?

Thanks!

Rick Up

P.S. Some hopefully useful info can be found here:

dmesg @ http://12.246.251.12/dmesg.txt
SMP kernel config file @ http://12.246.251.12/SMP.txt
/var/log/messages @ http://12.246.251.12/messages.txt

I am running qmail + vpopmail + procmail under a very light load (1000 messages per 24 hours right now) and apache with almost no load at all, just me reading documentation etc.
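The symptom described in this message (the host answers ping and port scans, yet no service responds) can be told apart from a plain outage from another machine. The following is only a minimal sketch in Python, not something used in the thread: the function name `probe_banner`, its return labels, and the timeout values are hypothetical, and it assumes the probed service (such as sshd) sends a banner as soon as a client connects.

```python
import socket


def probe_banner(host, port=22, timeout=5.0):
    """Classify a TCP service as 'down', 'hung', or 'ok'.

    'down': the connection could not be established, or the peer
            closed it without sending anything.
    'hung': the connection was accepted but no data arrived in time,
            which matches a kernel that still completes TCP handshakes
            while the service itself is stuck.
    'ok':   the service sent at least one byte (e.g. the SSH banner).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "down"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)
        except socket.timeout:
            return "hung"
        except OSError:
            return "down"
    return "ok" if data else "down"
```

Run periodically from a second host, a result of "hung" would at least give an approximate time at which the box stopped responding, which can then be compared against the last entries in the system logs.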
Sean wrote:

>Go into the BIOS, change IRQ routing from "Smart" to "Static" (or whatever
>the opposite of Smart is); I had problems with LH3s losing the interrupts
>for the onboard LSI RAID card, and locking up in the driver until I did
>that. The Intel cards we added also lost interrupts occasionally, though
>they were more graceful about it.

I went into the BIOS and selected:

Configuration -> PCI Slot Devices -> PCI IRQ Locking -> Routing Algorithm [Smart]

OK, I changed Routing Algorithm [Smart] to [Fixed] and got a scary warning about data loss etc., but I hit Yes, saved as prompted, and rebooted.

>We've now got several LH3s, an LH4 and an LH6000 all running under
>load.

OK, hopefully I will have this LH3 "fixed" as well : )

Thanks for the help.

P.S. I have been looking at mailing list archives where the following was mentioned... Is this going to be a problem? Could this be causing my current problem?

< snip from dmesg @ http://12.246.251.12/dmesg.txt >

APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
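For comparing several machines, the MP table warning quoted from dmesg can be pulled out of saved boot logs automatically. This is a rough sketch in Python only; the function name `find_mp_table_warnings` is hypothetical, and the pattern assumes the message has exactly the wording shown in the dmesg snippet. It says nothing about whether the warning is harmful, only whether it is present.

```python
import re

# Matches the warning line as it appears in the dmesg output quoted in this thread.
BROKEN_MP = re.compile(r"APIC_IO: Broken MP table detected: (?P<detail>.+)")


def find_mp_table_warnings(dmesg_text):
    """Return the detail text of each 'Broken MP table detected' line."""
    return [m.group("detail").strip() for m in BROKEN_MP.finditer(dmesg_text)]
```

Feeding it the contents of a saved dmesg file from each server would show which machines report the same warning, which might help to see whether the warning appears only on the machines that hang.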
Eric Parusel wrote:

>Hello,
>
>I just noticed this thread. Months ago I tried one of our LH3's with
>SMP, and had the same behaviour..
>
>Did the change you made in the BIOS fix all the problems you were having?

YES! So far so good. : ) It has not locked up once since I made that one and only change to the BIOS.

I plan to post all the details to http://prioris.mini.pw.edu.pl/~gregory/FreeBSD by the end of next week (as soon as I can).