Howdy. I apologize for the vagueness of this report, but I am not immediately
sure how to proceed with this or what details will be relevant.
I look after a community server which has been running just fine for the last
6 months or so with no disk issues at all. The box has an all-in-one Shuttle
motherboard which uses the VIA 8233 ATA133 disk controller according to what
I see from dmesg. The main drive is a 30 gig Maxtor and I very recently
added an 80 gig Seagate to the second IDE bus. There is also a CDROM sharing
the primary IDE bus with the main drive.
Up until quite recently the machine has had a single drive in it and been
performing flawlessly, no disk problems or anything of that nature (nothing
untoward in messages, no spontaneous reboots, etc.). I have performed a few
CVSups and make worlds over the last month to deal with security issues and
such and about two weeks ago I added the second Seagate drive to the system.
Shortly after adding the new drive (but also around the time of a CVSup), the
machine suddenly locked up. This was after about a day and a half of stable
operation, with some serious disk I/O (busy mailing lists, a cpdup from one
drive to the other, etc.). The lockup happened at a time when no obviously
heavy disk I/O was happening, just general usage.
I initially assumed that it was a loose cable, so I opened the box up and
re-seated the cables. After this the machine worked fine again for maybe 12
hours, but then hit the same type of errors, which start along the lines of:
Sep 26 05:59:35 www /kernel: ad0: READ command timeout tag=0 serv=0 - resetting
Sep 26 05:59:35 www /kernel: ata0: resetting devices .. ad0: DMA limited to UDMA33, non-ATA66 cable or device
After a few of these the machine stops being able to access its drives and
eventually either reboots or hangs.
After searching for a bit I saw that disabling DMA access altogether might
help, so I now do that at startup. That gives me the following initial
complaints upon boot, but no further obvious problems other than sub-standard
performance (which I do need to fix):
Sep 27 13:02:19 www /kernel: ad0: READ command timeout tag=0 serv=0 - resetting
Sep 27 13:02:20 www /kernel: ata0: resetting devices .. done
Sep 27 13:02:51 www /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Sep 27 13:02:51 www /kernel: ata0: resetting devices .. done
I am currently running the FreeBSD 4.9-PRERELEASE that I CVSup'd earlier in
the day on the 27th.
I have not had any disk issues that I am aware of since going to PIO mode,
and the machine worked great up until around the addition of the second
drive / OS update. Are there any known issues that could be affecting DMA
access with recent kernels, or does anyone have any other suggestions on
what might be going wrong here?
The fact that the machine runs fine without DMA, ran great up until recently,
and can perform intensive disk I/O for a period with no problems makes me
think that this isn't a hardware problem, but I am open to any suggestions
that might help to nail down the culprit.
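
One thing that might help narrow it down is tallying the timeouts per drive
and per operation, to see whether only ad0 is ever affected. Below is a rough
Python sketch of that idea. It assumes the kernel messages look exactly like
the excerpts quoted earlier (one message per line, unwrapped), and the name
count_timeouts is just something made up for illustration:

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 26 05:59:35 www /kernel: ad0: READ command timeout tag=0 serv=0 - resetting
# The pattern is an assumption based only on the excerpts in this report.
TIMEOUT_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+/kernel:\s+"
    r"(?P<dev>ad\d+):\s+(?P<op>READ|WRITE) command timeout"
)


def count_timeouts(lines):
    """Count ATA command timeouts per (device, operation) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts
```

Feeding it the open log file (for example /var/log/messages, which is an
assumption about where syslog writes on this box) and printing the resulting
counter should show at a glance which drive and which kind of command is
timing out.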
If you need more specifics to offer suggestions please let me know.
Cheers!
--
---> (culture) http://industrial.org : (label) http://deterrent.net
---> (community) http://ampfea.org : (hire me) http://codegrunt.com
---> (send EEEI news to) infosuck@industrial.org
---> Whomever dies with the most URLs wins!!!!!!!!!!!!!