Hi all,
for some time I have been experiencing weird hangs on one of my servers.
When it happens, I can still ping it, but I cannot make any connections
or type anything on existing ssh connections. The serial console is also
dead; however, I can enter the kernel debugger and call cpu_reset() to
reboot. Upon reboot all is fine again.
I have now discovered that my dmesg output contains a few of these:
ahc0: PCI error Interrupt at seqaddr = 0x9
ahc0: Data Parity Error Detected during address or write data phase
and
ahc0: PCI error Interrupt at seqaddr = 0x8
ahc0: Data Parity Error Detected during address or write data phase
Thing is - there's nothing connected to the Adaptec. It is enabled, but
not used.
Can someone in the know tell me exactly what these errors mean, and what
they might indicate? Of course this MIGHT be the cause of my problems,
but I don't know that for sure, and I'd like to know if there are any
other plausible explanations for these errors...
I will obviously disable the onboard Adaptec at my earliest
convenience. This is an ASUS P2B-DS board, dual P3 with onboard U2W SCSI.
Anyone?
Thanks,
/Eirik