running b37 on amd64. after removing power from a disk configured as
a mirror, 10 minutes have passed and ZFS has still not offlined it.

# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t0d0  ONLINE      14 6.05K     0
            c4t1d0  ONLINE       0     0     0

errors: No known data errors

# grep 'Hardware_Error' /var/adm/messages | wc -l
    7632

only after I manually ran "zpool detach tank c4t0d0" did the SCSI
errors stop. I would have expected it to be offlined automatically,
which is exactly what happened when I did this same test with an SVM
mirror.

is this a bug?

grant.
On Tue, May 16, 2006 at 07:02:37PM +1000, grant beattie wrote:

> running b37 on amd64. after removing power from a disk configured as
> a mirror, 10 minutes have passed and ZFS has still not offlined it.

I should have mentioned, the disks are connected to an Adaptec 2120S
card (aac). not that I think it should make any difference.

grant.
What has happened is that your device has started reporting errors, but
is still available on the system, i.e. ZFS is still able to ldi_open()
the underlying device. This seems like a strange failure mode for the
device (you may want to investigate how that's possible), but ZFS is
functioning as designed. You can verify this by doing 'dtrace -n
vdev_reopen:entry', which should show ZFS attempting to reopen the
device once a minute or so. We currently only detect device failure
when the device "goes away".

A future enhancement is to do predictive analysis based on error rates.
This will leverage the full power of FMA diagnosis, allowing us to
perform SERD analysis and incorporate past history as a mechanism for
predicting future failure. This will also incorporate the SMART
predictive failure bit when available. We haven't started work on this
yet, but we have a plan for doing so.

- Eric

On Tue, May 16, 2006 at 07:02:37PM +1000, grant beattie wrote:

> running b37 on amd64. after removing power from a disk configured as
> a mirror, 10 minutes have passed and ZFS has still not offlined it.
> [...]
> is this a bug?

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
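To watch the cadence Eric describes, his one-liner can be expanded a
little. This is a sketch only: it assumes the fbt provider can
instrument vdev_reopen() and that the function's first argument is the
vdev_t (with its vdev_path field), as in the OpenSolaris sources; the
output format is illustrative.

    # dtrace -qn 'fbt::vdev_reopen:entry
    {
            /* print a timestamped line for every reopen attempt */
            printf("%Y reopen attempt on %s\n", walltimestamp,
                args[0]->vdev_path != NULL ?
                stringof(args[0]->vdev_path) : "<no path>");
    }'

While the flaky disk is still attached, the lines should appear roughly
a minute apart.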
On Tue, May 16, 2006 at 10:13:46AM -0700, Eric Schrock wrote:

> What has happened is that your device has started reporting errors,
> but is still available on the system, i.e. ZFS is still able to
> ldi_open() the underlying device. This seems like a strange failure
> mode for the device (you may want to investigate how that's possible),
> but ZFS is functioning as designed. You can verify this by doing
> 'dtrace -n vdev_reopen:entry', which should show ZFS attempting to
> reopen the device once a minute or so. We currently only detect device
> failure when the device "goes away".

hi Eric,

you're right, the aac card appears to offline the disk but the LUN is
still available (though it's an empty device). I'll capture some more
info when I try this again tomorrow.

what I find interesting is that the SCSI errors were continuous for 10
minutes before I detached it, ZFS wasn't backing off at all. it was
flooding the VGA console quicker than the console could print it all
:) from what you said above, once per minute would have been more
desirable.

I wonder why, given that ZFS knew there was a problem with this disk,
it wasn't marked FAULTED and the pool DEGRADED? I don't know enough
about the internals to know why SVM happily offlined the device after
a short burst of errors - that's certainly more friendly and expected.
is there any way I can get the same failure mode with ZFS?

> A future enhancement is to do predictive analysis based on error
> rates. This will leverage the full power of FMA diagnosis, allowing us
> to perform SERD analysis and incorporate past history as a mechanism
> for predicting future failure. This will also incorporate the SMART
> predictive failure bit when available. We haven't started work on this
> yet, but we have a plan for doing so.

that would be cool, too :)

grant.
On Wed, May 17, 2006 at 03:22:34AM +1000, grant beattie wrote:

> what I find interesting is that the SCSI errors were continuous for 10
> minutes before I detached it, ZFS wasn't backing off at all. it was
> flooding the VGA console quicker than the console could print it all
> :) from what you said above, once per minute would have been more
> desirable.

The "once per minute" is related to the frequency at which ZFS tries to
reopen the device. Regardless, ZFS will try to issue I/O to the device
whenever asked. If you believe the device is completely broken, the
correct procedure (as documented in the ZFS Administration Guide) is to
'zpool offline' the device until you are able to repair it.

> I wonder why, given that ZFS knew there was a problem with this disk,
> it wasn't marked FAULTED and the pool DEGRADED?

This is the future enhancement that I described earlier. We need more
sophisticated analysis than simply 'N errors = FAULTED', and that's
what FMA provides. It will allow us to interact with larger fault
management (such as correlating SCSI errors, identifying controller
failure, and more). ZFS is intentionally dumb. Each subsystem is
responsible for reporting errors, but coordinated fault diagnosis has
to happen at a higher level.

> I don't know enough about the internals to know why SVM happily
> offlined the device after a short burst of errors - that's certainly
> more friendly and expected. is there any way I can get the same
> failure mode with ZFS?

Not currently.

- Eric

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock
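Spelled out, the procedure Eric points to is short. The device name is
taken from grant's pool; the status output and the physical repair step
are elided.

    # zpool offline tank c4t0d0    # stop issuing I/O to the suspect disk
    # zpool status -x tank         # pool reports DEGRADED, c4t0d0 OFFLINE
      (replace or repair the disk)
    # zpool online tank c4t0d0     # reattach; ZFS resilvers the mirror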
On Tue, 2006-05-16 at 10:32 -0700, Eric Schrock wrote:

> [...]
> This is the future enhancement that I described earlier. We need more
> sophisticated analysis than simply 'N errors = FAULTED', and that's
> what FMA provides. It will allow us to interact with larger fault
> management (such as correlating SCSI errors, identifying controller
> failure, and more). ZFS is intentionally dumb. Each subsystem is
> responsible for reporting errors, but coordinated fault diagnosis has
> to happen at a higher level.

[reason #8752 why pulling disk drives doesn't simulate real failures]

There are also a number of cases where successful or
unsuccessful-but-retryable error codes carry the recommendation to
replace the drive. There really isn't a clean way to write such
diagnosis engines into the various file systems, LVMs, or databases
which might use disk drives. Putting that intelligence into an FMA DE
and tying that into file systems or LVMs is the best way to do this.

 -- richard
Since it's not exactly clear what you did with SVM, I am assuming the
following:

You had a file system on top of the mirror and there was some I/O
occurring to the mirror. The *only* time SVM puts a device into
maintenance is when we receive an EIO from the underlying device. So if
a write occurred to the mirror, the write to the powered-off side
failed (returned an EIO) and SVM kept going. Since all buffers sent to
sd/ssd are marked with B_FAILFAST, the driver timeouts are low and the
device is put into maintenance.

If I understand Eric correctly, ZFS attempts to see if the device is
really gone. However I am not quite sure what Eric means by:

> We currently only detect device failure when the device "goes away".

Perhaps the issue here is that ldi_open is successful when it shouldn't
be, and is therefore confusing ZFS.

Another way to check is to perform the same test without any I/O
occurring to the file system, then run metastat -i (as root). This is
similar to scrub for the volumes.

-Sanjay

Richard Elling wrote:

> [reason #8752 why pulling disk drives doesn't simulate real failures]
> There are also a number of cases where successful or
> unsuccessful-but-retryable error codes carry the recommendation to
> replace the drive. [...] Putting that intelligence into an FMA DE and
> tying that into file systems or LVMs is the best way to do this.
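Sanjay's metastat -i check would look something like this on a mirror
with one dead side. The metadevice names and output are illustrative,
not from grant's system.

    # metastat -i                  # probe the devices behind each metadevice
    d10: Mirror
        Submirror 0: d11
          State: Okay
        Submirror 1: d12
          State: Needs maintenance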
On Thu, May 18, 2006 at 11:40:53PM -0600, Sanjay Nadkarni wrote:

> Since it's not exactly clear what you did with SVM, I am assuming the
> following:
>
> You had a file system on top of the mirror and there was some I/O
> occurring to the mirror. The *only* time SVM puts a device into
> maintenance is when we receive an EIO from the underlying device. So
> if a write occurred to the mirror, the write to the powered-off side
> failed (returned an EIO) and SVM kept going. Since all buffers sent to
> sd/ssd are marked with B_FAILFAST, the driver timeouts are low and the
> device is put into maintenance.

the test was the same in both the SVM and the ZFS case: constant reads
from the mirror device, and unplugging the power. the read throughput
during this test with ZFS drops to around 20% until the device is
manually removed from the pool, after which it returns to normal.

> If I understand Eric correctly, ZFS attempts to see if the device is
> really gone. However I am not quite sure what Eric means by:
>
> > We currently only detect device failure when the device "goes away".
>
> Perhaps the issue here is that ldi_open is successful when it
> shouldn't be, and is therefore confusing ZFS.

yes, that seems to be the case. it appears to be caused by the way the
aac card deals with the disk going away - it offlines the disk, and the
LUN is still presented, but it now has zero length.

also, after a disk is offlined by the card, there does not seem to be a
way to tell the card to rescan the bus, so it requires a reboot (though
there is nothing that ZFS can do which would fix that). I believe it
can be done with the "aaccli" program provided by Adaptec, but that
doesn't work with the Solaris-provided aac driver.

> Another way to check is to perform the same test without any I/O
> occurring to the file system, then run metastat -i (as root). This is
> similar to scrub for the volumes.

with no IO activity on the mirror, metastat -i does not detect that
anything is wrong. with IO activity, SVM offlines the metadevice when
it gets a fatal error from the device.

grant.
On Thu, 2006-05-18 at 23:40 -0600, Sanjay Nadkarni wrote:

> You had a file system on top of the mirror and there was some I/O
> occurring to the mirror. The *only* time SVM puts a device into
> maintenance is when we receive an EIO from the underlying device. So
> if a write occurred to the mirror, the write to the powered-off side
> failed (returned an EIO) and SVM kept going. Since all buffers sent to
> sd/ssd are marked with B_FAILFAST, the driver timeouts are low and the
> device is put into maintenance.

Sanjay,

#1 on the Pareto chart of disk error messages is the nonrecoverable
read. Does SVM put the mirror in maintenance mode due to an EIO caused
by a nonrecoverable read?

 -- richard