thr3ads.net - zfs discuss - [zfs-discuss] ZFS corruption [Feb 2009]

Leonid Roodnitsky

2009-Feb-09 22:39 UTC

[zfs-discuss] ZFS corruption

Dear All,

I am getting DEGRADED from zpool status -v: 3 of the 14 disks are reported as
degraded with "too many errors". This is Build 99 running on an
x4240 with an STK SAS RAID controller; the AAC driver version is 2.2.5. I am not
even sure where to start. Any advice is very much appreciated. I am trying to
convince management that ZFS is the way to go, and now I am getting this problem.
The RAID controller does not report any problems with the drives. This is a RAIDZ
(RAID5) zpool. Thank you everybody.

Regards,
Leonid
-- 
This message posted from opensolaris.org

Richard Elling

2009-Feb-09 23:34 UTC

Leonid Roodnitsky wrote:
> Dear All,
>
> I am getting DEGRADED from zpool status -v: 3 of the 14 disks are reported as
> degraded with "too many errors". This is Build 99 running on an
> x4240 with an STK SAS RAID controller; the AAC driver version is 2.2.5. I am not
> even sure where to start. Any advice is very much appreciated. I am trying to
> convince management that ZFS is the way to go, and now I am getting this problem.
> The RAID controller does not report any problems with the drives. This is a RAIDZ
> (RAID5) zpool. Thank you everybody.
>
The zpool man page says:
     The health of the top-level vdev, such as mirror or raidz
     device, is potentially impacted by the state of its associated
     vdevs, or component devices. A top-level vdev or component
     device is in one of the following states:

     DEGRADED    One or more top-level vdevs is in the degraded
                 state because one or more component devices are
                 offline. Sufficient replicas exist to continue
                 functioning.

                 One or more component devices is in the degraded
                 or faulted state, but sufficient replicas exist
                 to continue functioning. The underlying conditions
                 are as follows:

                     o    The number of checksum errors exceeds
                          acceptable levels and the device is
                          degraded as an indication that something
                          may be wrong. ZFS continues to use the
                          device as necessary.

                     o    The number of I/O errors exceeds
                          acceptable levels. The device could not
                          be marked as faulted because there are
                          insufficient replicas to continue
                          functioning.

You should take this into consideration as you decide whether
to replace disks or not.
 -- richard
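As a rough illustration of the two conditions in the man page excerpt, the sketch below scans the config table printed by `zpool status` and, for each row in the DEGRADED state, notes whether checksum errors (CKSUM column) or I/O errors (READ/WRITE columns) look like the trigger. It assumes the usual NAME/STATE/READ/WRITE/CKSUM layout with plain integer counters; the pool and device names in the sample are hypothetical, not taken from Leonid's system, and the helper names are made up for this example. It is a reading aid, not a replacement for the FMA diagnosis.

```python
from dataclasses import dataclass

# Hypothetical sample: pool and device names are made up for illustration.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  DEGRADED     0     0    27  too many errors
            c1t2d0  ONLINE       0     0     0

errors: No known data errors
"""

HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


@dataclass
class VdevRow:
    name: str
    state: str
    read: int
    write: int
    cksum: int


def parse_config(status_text):
    """Return the rows of the NAME/STATE/READ/WRITE/CKSUM table."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            if rows:
                break  # the blank line after the table ends it
            continue
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue  # abbreviated counters such as "1.2K" are not handled
        read, write, cksum = (int(f) for f in fields[2:5])
        rows.append(VdevRow(fields[0], fields[1], read, write, cksum))
    return rows


def explain_degraded(rows):
    """Match each DEGRADED row to the man-page condition it most resembles."""
    reasons = {}
    for row in rows:
        if row.state != "DEGRADED":
            continue
        if row.cksum > 0:
            # Checksum errors are reported first even if I/O errors also exist.
            reasons[row.name] = "checksum errors"
        elif row.read + row.write > 0:
            reasons[row.name] = "I/O errors"
        else:
            reasons[row.name] = "no errors of its own; likely degraded by a child"
    return reasons


if __name__ == "__main__":
    for name, reason in explain_degraded(parse_config(SAMPLE_STATUS)).items():
        print(f"{name}: {reason}")
```

In this sample only the leaf disk carries the checksum count; the raidz1 and pool rows are DEGRADED purely because of that child, which is the first case in the man page text.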

Leonid Roodnitsky

2009-Feb-10 20:55 UTC

Dear All,

Is there any way to figure out which piece is at fault? The Sun SAS RAID
(Adaptec/Intel) controller reports that the drives are good, but ZFS is not
happy: it reports checksum errors. Is there any way to figure out which component
introduced the errors?

Leonid

Cindy.Swearingen at Sun.COM

2009-Feb-10 21:42 UTC

Leonid,

You could use the fmdump -eV command to look for problems with these
disks. This command might generate a lot of output, but it should make
clear whether the root cause is a problem accessing these devices.

I would also check /var/adm/messages for any driver-related messages.

Cindy

Leonid Roodnitsky wrote:
> Dear All,
>
> Is there any way to figure out which piece is at fault? Sun SAS RAID
> (Adaptec/Intel) controller is reporting that drives are good, but ZFS is not
> happy about checksum errors. Is there any way to figure out which component
> introduced the error?
>
> Leonid
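Because `fmdump -eV` output can be long, a small tally of ereport classes per device may make it easier to see whether errors cluster on the three degraded disks. The sketch below is an assumption-laden helper, not part of Solaris: it assumes the `key = value` layout seen in the ereport posted later in this thread, and it assumes SCSI ereports identify the device via the detector's `device-path` while ZFS ereports (such as `ereport.fs.zfs.checksum`) carry a `vdev_path` field. The sample is an abridged excerpt of that posted ereport with the archive's "at" obfuscation turned back into "@".

```python
import re
from collections import Counter

# "key = value" lines in `fmdump -eV` output; keys may contain '-' or '_'.
KV_LINE = re.compile(r"^\s*([\w-]+) = (.*?)\s*$")

# Fields assumed to identify the device: SCSI ereports carry the detector's
# device-path, ZFS ereports typically carry vdev_path.
DEVICE_KEYS = ("device-path", "vdev_path")

# Abridged excerpt of the ereport posted in this thread ('@' restored).
SAMPLE_FMDUMP = """\
Feb 06 2009 23:14:07.704531935 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.uderr
        ena = 0x2487a4cf2e00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci10de,375@f/pci108e,286@0/disk@1,0
                devid = id1,sd@TSun_____STK_RAID_INT____6DB80B08
        (end detector)

        driver-assessment = fail
        op-code = 0x1a
"""


def tally_ereports(fmdump_text):
    """Count ereports per (class, device) pair in `fmdump -eV` text."""
    counts = Counter()
    ereport_class = None
    device = None
    for line in fmdump_text.splitlines():
        match = KV_LINE.match(line)
        if not match:
            continue  # header lines, "nvlist version: 0", blank lines
        key, value = match.groups()
        if key == "class":
            # A new top-level class line starts a new ereport record.
            if ereport_class is not None:
                counts[(ereport_class, device or "unknown")] += 1
            ereport_class, device = value, None
        elif key in DEVICE_KEYS and device is None:
            device = value
    if ereport_class is not None:
        counts[(ereport_class, device or "unknown")] += 1
    return counts


if __name__ == "__main__":
    for (cls, dev), n in tally_ereports(SAMPLE_FMDUMP).most_common():
        print(f"{n:6d}  {cls}  {dev}")
```

On a live system the text could come from `subprocess.run(["fmdump", "-eV"], capture_output=True, text=True).stdout`; comparing the device paths with the three degraded disks is the interesting part.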

Roodnitsky, Leonid

2009-Feb-10 22:00 UTC

Could this be relevant? Notice the sd_cache_control mismatch message. Thank
you everybody for any ideas or help. I really appreciate it.

Feb 06 2009 23:14:07.704531935 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.uderr
        ena = 0x2487a4cf2e00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci10de,375@f/pci108e,286@0/disk@1,0
                devid = id1,sd@TSun_____STK_RAID_INT____6DB80B08
        (end detector)

        driver-assessment = fail
        op-code = 0x1a
        cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        stat-code = 0x0
        un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
        un-decode-value =
        __ttl = 0x1
        __tod = 0x498d189f 0x29fe4ddf


Leonid
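To read the raw fields in this ereport, a small decoder may help. Per the SCSI Primary Commands specification, op-code 0x1a is MODE SENSE(6): byte 2 of the CDB holds the page-control bits (7-6) and the page code (5-0), byte 4 the allocation length, and page code 0x08 is the Caching mode page, which fits the sd_cache_control message. The `__tod` pair appears to be seconds and nanoseconds since the Unix epoch; the nanosecond half, 0x29fe4ddf, equals the .704531935 in the header line. The function names below are hypothetical, and this is a sketch rather than a tool shipped with Solaris.

```python
from datetime import datetime, timezone

MODE_SENSE_6 = 0x1A       # SCSI MODE SENSE(6) operation code
CACHING_MODE_PAGE = 0x08  # page code of the Caching mode page
PAGE_CONTROL = {0: "current", 1: "changeable", 2: "default", 3: "saved"}


def decode_mode_sense6(cdb):
    """Split a 6-byte MODE SENSE(6) CDB into its fields."""
    if len(cdb) != 6 or cdb[0] != MODE_SENSE_6:
        raise ValueError("not a MODE SENSE(6) CDB")
    return {
        "dbd": bool(cdb[1] & 0x08),                 # disable block descriptors
        "page_control": PAGE_CONTROL[cdb[2] >> 6],  # bits 7-6 of byte 2
        "page_code": cdb[2] & 0x3F,                 # bits 5-0 of byte 2
        "subpage_code": cdb[3],
        "allocation_length": cdb[4],
    }


def decode_tod(seconds_hex, nanos_hex):
    """Turn an ereport __tod pair (assumed seconds, nanoseconds) into UTC."""
    when = datetime.fromtimestamp(int(seconds_hex, 16), tz=timezone.utc)
    return when, int(nanos_hex, 16)


if __name__ == "__main__":
    # Values copied from the ereport above.
    cdb = [int(b, 16) for b in "0x1a 0x0 0x8 0x0 0x18 0x0".split()]
    fields = decode_mode_sense6(cdb)
    print(fields, fields["page_code"] == CACHING_MODE_PAGE)
    print(decode_tod("0x498d189f", "0x29fe4ddf"))
```

Decoding the pair gives 2009-02-07 05:14:07 UTC, consistent with the header's Feb 06 23:14:07 if fmdump printed local time in a UTC-6 zone. The CDB shows the sd driver asking for the current Caching page with a 24-byte allocation length; reading the trailing 0 in "mismatch 0" as the page code the RAID volume actually returned is an assumption.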
