Guy Helmer
2005-Apr-25 12:42 UTC
FreeBSD 5.3p6-5.4RC3, Supermicro X6DHR-8G, Dual 3.6GHz Xeons, Adaptec aic7902 SCSI interface doesn't work in UP kernel
I have SuperMicro X6DHR-8G machines with dual 3.6GHz Xeons and a Seagate 73G drive attached to the on-board aic7902 SCSI channel. A uniprocessor FreeBSD 5.3p6 or 5.4RC3 kernel booted on this machine stops at the SCSI bus probe for a minute or so, and then occasionally spews messages "SIMOE0[0xc]: (ENOVERRUN|ENIOERR)" followed by more info and finally "Issued Channel A Bus Reset, <n> SCBs aborted". It never seems to get past this point.

I've tried both BIOS revisions 1.2 and 1.2a with no effect on the errors. Disabling ACPI seemed to have no effect on the errors. Disabling the APIC allowed it to boot successfully. Booting an SMP kernel allows it to boot successfully (this is the obvious workaround, but the problem was first discovered when trying to boot from a GENERIC UP kernel on CD-ROM; booting 5.4-RC3 Disc 1 hangs the same way unless I boot in "Safe" mode).

dmesg (from a successful boot), mptable, and "acpidump -t -d" output are available if anyone cares to look:

http://www.palisadesys.com/~ghelmer/X6DHR-8G-dmesg.txt
http://www.palisadesys.com/~ghelmer/X6DHR-8G-mptable.txt
http://www.palisadesys.com/~ghelmer/X6DHR-8G.asl

If there is anything else I can provide that will help diagnose the problem, please let me know.

Guy

-- 
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.
Ade Lovett
2005-Apr-25 20:06 UTC
FreeBSD 5.3p6-5.4RC3, Supermicro X6DHR-8G, Dual 3.6GHz Xeons, Adaptec aic7902 SCSI interface doesn't work in UP kernel
Guy Helmer wrote:

Reducing the problem to its relevant parts:

> SuperMicro [...] Seagate [...]
> on-board aic7902

and, from your dmesg output:

da0: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers [...]

Supermicro boards and Seagates generally hate each other. Supermicro blames Seagate, Seagate blames Supermicro. Even under normal operation, you'll see spurious SCSI errors, both at bootup and under load, exacerbated if you put more than one Seagate drive on the same chain, or run them at their native U320 speeds.

To make matters worse, your ST373207LC model is, even by Seagate standards, a piece of unmitigated crap. At an absolute minimum, have the existing drive swapped out for an ST373453LC model, making VERY certain that the firmware is rev 0006 -- prior revisions WILL corrupt your data, set fire to your cat, and otherwise ruin your entire life -- and that's before they've actually spun up.

A second option is to change the drive out for one from a vendor that actually cares -- ok, maybe only *just* cares, but cares nonetheless. Hitachi drives work fine, and certainly seem to be in the same ballpark for overall reliability.

Likewise, swapping out the motherboard for a non-Supermicro unit may be an option, though be wary of anything with onboard Broadcom gigabit ethernet if you plan on doing continuous high network I/O. Seagate drives *appear* to have considerably fewer problems when connected to non-Adaptec hardware in general, and to hardware other than the onboard Supermicro variant in particular.

If you're stuck with the hardware you have (modulo this particular 73GB drive model, as mentioned above), pick up another SCSI controller and use that, not forgetting to disable the onboard controllers. At an absolute pinch, head into the Adaptec BIOS and lock down the drive to U160 speeds -- that *may* help in a few edge cases.
Someone (I forget who, sorry) recently suggested that a considerable portion of the Supermicro vs Seagate mess was down to a weird interaction between the BIOS and the occasional SMM interrupt -- a hackaround to run at boot was even suggested -- check the archives for details, though unfortunately it did not appear to fix the issues I was seeing on boxes here. You may have better luck.

-aDe