thr3ads.net - zfs discuss - [zfs-discuss] Gen-ATA read sector errors [Jul 2011]


Koopmann, Jan-Peter

2011-Jul-28 11:55 UTC

[zfs-discuss] Gen-ATA read sector errors

Hi,

My system is running oi148 on a Supermicro X8SIL-F board. I have two pools
(2-disk mirror, 4-disk RAIDZ) with RAID-class SATA drives (Hitachi HUA72205 and
SAMSUNG HE103UJ). The system runs as expected; however, every few days
(sometimes weeks) it comes to a halt due to these errors:

Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
Dec  3 13:51:20 nasjpk  Error for command 'read sector'  Error Level: Fatal
Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Requested Block 5503936, Error Block: 5503936
Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Sense Key: uncorrectable data error
Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Vendor 'Gen-ATA ' error code: 0x7

It is not limited to this one disk; it happens on all disks. Sometimes several
are listed before the system "crashes", sometimes just one. I cannot pinpoint
a single defective disk (and have already replaced the disks). I suspect that
this is a problem with the SATA controller or the driver. Can someone give me a
hint as to whether that assumption sounds plausible? I am planning to get a new
"cheap" 6-8 port SATA2 or SATA3 controller and move the drives over to it. If
the problem is driver/controller related, it should disappear. Is it possible
to simply reconnect the drives and all will be well, or will I have to
reinstall due to different SATA "layouts" on the disks or the like?

Kind regards,
   JP

Jens Elkner

2011-Jul-28 14:10 UTC


On Thu, Jul 28, 2011 at 01:55:27PM +0200, Koopmann, Jan-Peter wrote:
> Hi,
>
> my system is running oi148 on a super micro X8SIL-F board. I have two
> pools (2 disc mirror, 4 disc RAIDZ) with RAID level SATA drives. (Hitachi
> HUA72205 and SAMSUNG HE103UJ).  The system runs as expected however every
> few days (sometimes weeks) the system comes to a halt due to these errors:
>
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
> Dec  3 13:51:20 nasjpk  Error for command 'read sector'  Error Level: Fatal
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Requested Block 5503936, Error Block: 5503936
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Sense Key: uncorrectable data error
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Vendor 'Gen-ATA ' error code: 0x7
>
> It is not related to this one disk. It happens on all disks. Sometimes
> several are listed before the system "crashes", sometimes just one. I cannot
I tend to agree that the IDE driver seems to have a problem. On our machines
(HP Z400 with a 0B4Ch-D motherboard and an 82801JI (ICH10 family) controller,
using an rpool 2-way mirror of WDC WD5000AAKS HDDs) we also sometimes see one
drive get disabled due to "too many errors". zpool clear revives the pool
(i.e. the HDD gets resilvered very quickly without any problem) until it occurs
again (i.e. after some days, weeks, or months). Unfortunately we couldn't find
a procedure to reproduce the problem (e.g. like for the Marvell controller in
the early days).

Regards,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768

Richard Elling

2011-Jul-29 19:23 UTC


On Jul 28, 2011, at 4:55 AM, Koopmann, Jan-Peter wrote:
> Hi,
>
> my system is running oi148 on a super micro X8SIL-F board. I have two pools
> (2 disc mirror, 4 disc RAIDZ) with RAID level SATA drives. (Hitachi HUA72205
> and SAMSUNG HE103UJ).  The system runs as expected however every few days
> (sometimes weeks) the system comes to a halt due to these errors:
>
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
> Dec  3 13:51:20 nasjpk  Error for command 'read sector'  Error Level: Fatal
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Requested Block 5503936, Error Block: 5503936
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Sense Key: uncorrectable data error
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Vendor 'Gen-ATA ' error code: 0x7
Several things:

1. You are using SATA in IDE-compatibility mode. Usually this is a BIOS setting,
and for most BIOSes IDE-compatibility mode is the default. Changing to AHCI
is an improvement that includes better error monitoring.

2. In this case, the disk is returning an unrecoverable read error. This is the
most common error for modern HDDs.

3. When #2 happens, consumer-grade disks can get stuck retrying forever.
Enterprise-class drives have limited retry. For the retry-forever disks, the OS
is responsible for ultimately timing out the I/O attempt. For many Solaris
releases, the default retry/timeout cycle lasts 3 to 5 minutes. Because of #1,
the disk cannot service more than one outstanding I/O, so all I/O to the disk
is blocked, impacting the rest of the pool.
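
To tell a single failing disk from a problem that follows the controller, it can help to count how often each device shows up in the gda warnings. The following is only a minimal sketch: it assumes the log lines are unwrapped, with device paths written as in /var/adm/messages, and the second warning in the sample is made up for illustration. The function name count_errors_by_disk is hypothetical.

```python
import re
from collections import Counter

# Matches the device path and disk label in a gda warning line, e.g.
# "... gda: [ID ...] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):"
WARNING_RE = re.compile(
    r"gda: .*WARNING:\s+(?P<path>/\S+)\s+\((?P<name>[^)]+)\)"
)


def count_errors_by_disk(lines):
    """Count gda warnings per (disk label, device path) pair."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("name"), match.group("path"))] += 1
    return counts


# First and second lines follow the log quoted in this thread;
# the third (Disk0) line is a hypothetical example, not real output.
sample = [
    "Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: "
    "/pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):",
    "Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     "
    "Requested Block 5503936, Error Block: 5503936",
    "Dec  5 02:10:44 nasjpk gda: [ID 107833 kern.warning] WARNING: "
    "/pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk0):",
]

counts = count_errors_by_disk(sample)
for (name, path), n in sorted(counts.items()):
    print(f"{name:8s} {n:4d}  {path}")
```

If the counts are spread roughly evenly over all disks behind one controller, that points away from any single drive; if one disk dominates, the drive itself is the more likely suspect.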
>
> It is not related to this one disk. It happens on all disks. Sometimes
> several are listed before the system "crashes", sometimes just one. I
> cannot pinpoint it to a single defect disk though (and already have replaced
> the disks). I suspect that this is an error with the SATA controller or the
> driver. Can someone give me a hint on whether or not that assumption sounds
> feasible? I am planning on getting a new "cheap" 6-8 way SATA2 or SATA3
> controller and switch over the drives to that controller. If it is
> driver/controller related the problem should disappear. Is it possible to
> simply reconnect the drives and all is going to be well or will I have to
> reinstall due to different SATA "layouts" on the disks or alike?
The ease of migration depends on your HBA and whether it writes metadata
that is not compatible with other HBAs. For simple HBAs, it is quite common for
disks to be migrated to other machines and the pool imported.
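
For a data pool, the usual sequence around such a move is to export the pool, recable the disks, and import the pool again. The sketch below only assembles and prints those zpool commands. The pool name "tank" and the helper names are assumptions for illustration, and the root pool is deliberately left out because it cannot be exported while the system is booted from it.

```python
import subprocess


def migration_commands(pool):
    """Return the zpool commands to run before and after moving the disks."""
    before = [["zpool", "export", pool]]
    after = [
        ["zpool", "import"],        # with no arguments: list importable pools
        ["zpool", "import", pool],  # import the pool by name
        ["zpool", "status", pool],  # confirm all devices are ONLINE
    ]
    return before, after


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# "tank" is a hypothetical pool name; substitute your own.
before, after = migration_commands("tank")
run(before)  # run these, then power down and move the disks
run(after)   # run these once the disks sit on the new controller
```

ZFS identifies pool members by the labels written on the disks rather than by controller port, which is why the import normally finds them under their new device names.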

HTH,
 -- richard


