Will Murnane
2009-Apr-19 05:52 UTC
[zfs-discuss] Degraded log device in "zpool status" output
I have a pool, "huge", composed of one six-disk raidz2 vdev and a log
device. I failed to plug in one disk when I took the machine down to
plug in the log device, and booted all the way before I realized this,
so the raidz2 vdev was rightly listed as degraded. Then I brought the
machine down, plugged the disk in, and brought it back up. I ran
"zpool scrub huge" to make sure that the missing disk was completely
synced. After a few minutes, "zpool status huge" showed this:

$ zpool status huge
  pool: huge
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h8m, 1.19% done, 11h15m to go
config:

        NAME        STATE     READ WRITE CKSUM
        huge        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c4t4d0  DEGRADED     0     0    15  too many errors
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
        logs        DEGRADED     0     0     0
          c7d1      ONLINE       0     0     0

errors: No known data errors

I understand that not all of the blocks may have been synced onto
c4t4d0 (the missing disk), so some checksum errors are normal there.
But the log disk reports no errors, and its sole component reports
none either, yet the log device is marked as degraded. To see what
would happen, I executed this:

$ pfexec zpool clear huge c4t4d0
$ zpool status huge
  pool: huge
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h12m, 1.87% done, 10h32m to go
config:

        NAME        STATE     READ WRITE CKSUM
        huge        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     2
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
        logs        ONLINE       0     0     0
          c7d1      ONLINE       0     0     0

errors: No known data errors

So clearing the errors from one device has an effect on the status of
another device? Is this expected behavior, or is something wrong with
my log device? I'm running snv_111.

Will
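For reference, a minimal post-scrub cleanup sequence, using only the standard zpool
subcommands already shown above (when "zpool clear" is given no device argument it
resets the error counters on every vdev in the pool rather than a single disk):

$ zpool status huge          # wait until the scrub: line reports "scrub completed"
$ pfexec zpool clear huge    # no device argument: clear error counters pool-wide
$ zpool status huge          # all vdevs should now report ONLINE with zeroed counters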
Neil Perrin
2009-Apr-19 05:58 UTC
[zfs-discuss] Degraded log device in "zpool status" output
Will,

This is bug:

  6710376 log device can show incorrect status when other parts of pool are degraded

This is just an error in the reporting. There was nothing actually wrong
with the log device. It is picking up the degraded status from the rest
of the pool. The bug was fixed only yesterday and checked into snv_114.

Neil.

On 04/18/09 23:52, Will Murnane wrote:
> So clearing the errors from one device has an effect on the status of
> another device? Is this expected behavior, or is something wrong with
> my log device? I'm running snv_111.
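For anyone hitting this on builds before snv_114, a minimal sketch for confirming that
the slog itself is healthy despite the misreported state, using only standard zpool
subcommands (nothing here is specific to the fix):

$ zpool status -v huge       # per-device counters; c7d1 should stay at 0 READ/WRITE/CKSUM
$ zpool iostat -v huge 5     # per-vdev I/O every 5 seconds; the logs row shows whether
                             # c7d1 is still servicing synchronous writes normally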