On Wed, 2005-07-20 at 23:54 -0500, Steve wrote:
> I've found tons of emails, news messages, listserv messages, and even
> some bug reports of this seemingly common error.
>
> So, I had been running 5.2 on a server, and, updated to 5.3. Got the
> READ_DMA and WRITE_DMA error and retries. So, figuring it might be a bad
> update, took a new drive. put it in, loaded 5.4 for grins, and, same
> issue, lots of these errors, eventually destroying the FS. Played around
> with various settings, no avail. So, took it back, got different box,
> everything new. Same problem, new install of 5.4
>
> So, took it back, got another with another MB (different model), but,
> same maker (ASUS). Didn't have endless time to spend on production
> machine. Sure enough, same problem. It's an ASUS A7V880. Controller is
> SATA VT8237. Played around with tons of settings, eventually, after
> reading various messages out there, discovered one that resolved the
> problem. Had to set hw.ata.ata_dma="0". Of course, there is the obvious
> downside to that! Speed!
>
> But it stinks to have "decent" hardware, yet, have to cripple the
> machine. The place I got the equipment at runs ASUS only and has
> thousands of them running under other OSes. Wished I had stayed with the
> old FreeBSD version and old hardware now. I have not seen anyone that
> has ever said the problem was being (or had been) solved though. I see
> the bug reports, I take it no one has actually pinpointed the problem
> though. BUT, I do hope it is understood that this is fairly widespread,
> for me, the likelihood of 3 pcs, 2 different MB models, and, *complete*
> new hardware for each of the 3 pcs kind of rules out hardware being
> broken, might be badly designed, but, certainly not defective hardware.
>
> I do hope someone can eventually figure this out, seems to be extremely
> common, and, definitely a problem for a stable release named 5.4.
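
For anyone finding this thread later: the workaround quoted above is a
boot-time loader tunable. A minimal sketch, assuming it goes in
/boot/loader.conf (the file location and the comments are my assumption,
not something Steve spelled out):

```
# /boot/loader.conf (assumed location for the tunable)
# Disable DMA for ATA disks; they fall back to PIO, which is much slower.
hw.ata.ata_dma="0"
```

The file is read by the loader at boot, so the change only takes effect
after a restart.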

I was one of the people who suffered from and reported this "seemingly
common error." None of the systems that encountered problems had
particularly obscure or cutting-edge hardware (e.g., an Intel PIIX4 ATA
controller on the motherboard). One common thread in my case is that
all of them ran some kind of software RAID (gvinum or gmirror), though
not all of my software-RAIDed machines exhibited the DMA problems,
leading me to think it was perhaps a hardware/load/disk combination
problem. Quite obviously, not all PIIX4 controller users were having
this happen, so the "it doesn't happen to me" factor might have
contributed to the general notion that this was probably "operator
error" or something like that, and to the reports being dismissed.

Anyway, as well as 5-STABLE, I also run a 6-CURRENT system that suffered
the problem. Happily, after the ATA Mk.III merge, the situation
improved a LOT. I occasionally still get the error reported, but it is
not fatal, unlike before (where the drive would be detached, breaking my
geom_mirror, necessitating a lengthy background rebuild). So, I
consider the ATA Mk.III rewrite to have "fixed" the problem I had. It
may be, then, that those upgrading to the upcoming 6.0-RELEASE (when it
appears) might find their ATA DMA problems solved, too.

As for 5.x, I track -STABLE, and have noticed slight improvements
regarding the DMA TIMEOUT problem. If you only run -RELEASE, you might
miss these ongoing improvements that crop up from time to time.

Cheers,
Paul.
--
e-mail: paul@gromit.dlib.vt.edu
"Without music to decorate it, time is just a bunch of boring production
deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa