thr3ads.net - freebsd stable - ATA Woes. [Jul 2005]


Tony Byrne

2005-Jul-19 09:37 UTC

ATA Woes.

Folks,

I'm seeing something very unusual on one of our FreeBSD 5.4 Stable
boxes which I'm having a hard time getting to the bottom of.

You may recall that a few weeks ago I posted regarding a server that
was having trouble with WRITE_DMA and READ_DMA timeouts on its SATA
disk. We finally decided to migrate to a new disk, so we purchased a
brand new Western Digital 250GB SATA drive and transferred the data
across, before removing the old drive.

We got about two days of trouble-free access to this new disk before
it too started throwing READ_DMA problems.  This time they were error
40<UNCORRECTABLE>.  Running smartctl on the disk showed a number of
errors and there were specific files on the disk that could not be
read.  We moved the disk to a desktop box to confirm the problem and
noted that fsck couldn't fix the errors on the drive.

Assuming a dud drive, we purchased a replacement and this time we
spurned SATA in favour of a PATA drive (Western Digital 200GB). We
installed the drive yesterday using a brand new UDMA cable. Imagine my
surprise when I came in this morning to find that this new drive was
also now suffering from UNCORRECTABLE READ_DMA failures and smartctl
confirmed that the drive wasn't happy. What are the odds of getting
two dud disks from two separate batches of drives from a reputable
brand?

The server itself is a 1U high rack mount installed in an AC'd machine
room. It is powered from a UPS. There is space around the drive and a
pair of fans draw air over the drive casing, so the casings are cool
to the touch. The motherboard is an Intel S875PWP3 equipped with an
Intel ICH5 chipset.

Is there any known problem with using WD SATA / PATA disks under
FreeBSD 5.4 Stable on the above mainboard? Is it possible that a FreeBSD
bug is causing problems with these drives, including the problems
reported by smartctl?

Regards,

Tony.

-- 
Tony Byrne

Tony Byrne

2005-Jul-19 12:21 UTC


ATA Woes.

Hello Tony,

Tuesday, July 19, 2005, 10:37:40 AM, you wrote:

TB> Folks,

TB> I'm seeing something very unusual on one of our FreeBSD 5.4 Stable
TB> boxes which I'm having a hard time getting to the bottom of.

Further information from my server logs:

Jul 19 13:01:48 roo kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=288810495
Jul 19 13:01:59 roo kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=288810495
Jul 19 13:02:05 roo kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=288810495
Jul 19 13:02:16 roo kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=288810495
Jul 19 13:04:36 roo last message repeated 4 times
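The fields in these kernel messages can be pulled apart mechanically. The following Python sketch extracts the command, status, error and LBA from each FAILURE message (tolerating messages that a mail client has wrapped onto two lines) and names the status and error bits. The bit table covers only the bits that actually appear in the log above; the bit values are inferred from status=51 decoding to READY,DSC,ERROR (0x40 | 0x10 | 0x01), and the function and variable names are hypothetical. Syslog's "last message repeated" lines are ignored rather than expanded.

```python
import re
from collections import Counter

# Only the bits seen in the log above. Values inferred from status=51
# printing as READY,DSC,ERROR (0x40 | 0x10 | 0x01) and from error=40 / error=1.
STATUS_BITS = {0x40: "READY", 0x10: "DSC", 0x01: "ERROR"}
ERROR_BITS = {0x40: "UNCORRECTABLE", 0x01: "ILLEGAL_LENGTH"}

# \s+ between fields tolerates messages wrapped onto two lines by a mail client.
FAILURE_RE = re.compile(
    r"ad\d+: FAILURE - (?P<cmd>\w+)\s+"
    r"status=(?P<status>[0-9a-fA-F]+)<[^>]*>\s+"
    r"error=(?P<error>[0-9a-fA-F]+)<[^>]*>\s+"
    r"LBA=(?P<lba>\d+)"
)


def decode(value, names):
    """Return the names of the known bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True) if value & bit]


def parse_failures(log_text):
    """Yield (command, status bits, error bits, LBA) for each FAILURE message.

    Status and error are assumed to be hex, which is consistent with
    status=51 decoding to READY,DSC,ERROR. "last message repeated N times"
    lines are not matched, so repeats are not counted.
    """
    for m in FAILURE_RE.finditer(log_text):
        yield (
            m.group("cmd"),
            decode(int(m.group("status"), 16), STATUS_BITS),
            decode(int(m.group("error"), 16), ERROR_BITS),
            int(m.group("lba")),
        )


SAMPLE = """\
Jul 19 13:01:48 roo kernel: ad0: FAILURE - READ_DMA
status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=288810495
Jul 19 13:01:59 roo kernel: ad0: FAILURE - READ_DMA
status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=288810495
"""

records = list(parse_failures(SAMPLE))
for rec in records:
    print(rec)
print(Counter(lba for *_, lba in records))  # Counter({288810495: 2})
```

Counting failures per LBA this way makes it easy to see whether errors cluster on one sector, as here, or are spread across the disk.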

With this disk it appears to be the same LBA each time. How can I
translate that LBA offset into something indicating the file affected?
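One way to start on that translation is plain arithmetic. The LBA in an ad0 message is presumably counted from the start of the whole disk, so the slice and partition offsets have to be subtracted before the number means anything to the filesystem. The following is a minimal sketch that assumes 512-byte sectors and a bsdlabel partition offset measured relative to the slice. The offsets and fragment size in the usage lines are illustrative, not values read from this server, and the function names are hypothetical. It also does not check the partition's end. Mapping the resulting fragment number to the inode that owns it is a further step that is not shown.

```python
SECTOR_SIZE = 512  # bytes per sector; assumed, typical for ATA drives of this era


def lba_to_partition_sector(lba, slice_start, partition_offset):
    """Convert an absolute LBA from an ad0 error message into a sector
    number relative to the start of a BSD partition.

    slice_start: first sector of the MBR slice (e.g. from fdisk output).
    partition_offset: partition offset within the slice (e.g. from bsdlabel),
    assumed here to be relative to the slice, not to the whole disk.
    """
    rel = lba - slice_start - partition_offset
    if rel < 0:
        raise ValueError("LBA lies before the start of this partition")
    return rel


def sector_to_fragment(rel_sector, frag_size):
    """Convert a partition-relative sector into a filesystem fragment
    number, given the fragment size in bytes (an assumed input)."""
    return rel_sector * SECTOR_SIZE // frag_size


# Hypothetical layout: slice starting at sector 63, partition at offset 0,
# 2048-byte fragments.
rel = lba_to_partition_sector(288810495, slice_start=63, partition_offset=0)
frag = sector_to_fragment(rel, frag_size=2048)
print(rel, frag)  # 288810432 72202608
```

With the real slice start, partition offset and fragment size for ad0 substituted, the same two steps would give the partition-relative location of the failing sector.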

I installed the *other* disk into a Windows box and ran the Western
Digital Drive Tools SMART test on it. It found some sectors needing
reallocation and successfully performed the reallocation. The tests
(both short and long) now pass, but the drive's SMART Status remains
at 'fail'. When I examine the attributes, the Raw Read Error Rate is
flagged.
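For context, smartmontools classifies each attribute by comparing its normalized value and its worst recorded value against the vendor threshold. The drive's overall verdict usually reflects whether a pre-failure attribute has crossed its threshold. The following Python sketch models that rule. It is an approximation that assumes a threshold of 0 never fails, that value <= threshold means failing now, and that worst <= threshold means it failed in the past. The real overall status is reported by the drive firmware itself. The class, the function names and the attribute values in the usage lines are hypothetical and are not taken from either of these drives.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of a SMART attribute table (example values are hypothetical)."""
    attr_id: int
    name: str
    value: int      # current normalized value (higher is better)
    worst: int      # lowest normalized value recorded
    threshold: int  # vendor-defined failure threshold
    prefail: bool   # True for pre-failure attributes, False for old-age


def when_failed(attr):
    """Classify one attribute under the assumed threshold rule."""
    if attr.threshold == 0:
        return "-"  # assumed: a zero threshold never trips
    if attr.value <= attr.threshold:
        return "FAILING_NOW"
    if attr.worst <= attr.threshold:
        return "In_the_past"
    return "-"


def overall_health(attrs):
    """Approximate the overall status: FAILED if any pre-failure attribute
    is failing now. Also returns the names of the failing attributes."""
    failing = [a.name for a in attrs if a.prefail and when_failed(a) == "FAILING_NOW"]
    return ("FAILED" if failing else "PASSED"), failing


attrs = [
    SmartAttribute(1, "Raw_Read_Error_Rate", value=40, worst=40, threshold=51, prefail=True),
    SmartAttribute(5, "Reallocated_Sector_Ct", value=200, worst=200, threshold=140, prefail=True),
]
print(overall_health(attrs))  # ('FAILED', ['Raw_Read_Error_Rate'])
```

Under this model, reallocating sectors leaves the Raw Read Error Rate row unchanged, so the overall status can remain at 'fail' even after the short and long tests pass.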

I'm totally confused. I don't know enough about SMART to know whether
I'm looking at real failing drives or some bug exposed by the
interaction between drive firmware, hd controller and FreeBSD.

Regards,

Tony.

-- 
Tony Byrne
