Karl Pielorz
2008-Sep-08 08:50 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
Hi All,

I run ZFS (a version 6 pool) under FreeBSD. Whilst I realise this changes a
*whole heap* of things, I'm more interested in whether I did anything wrong
when I had a recent drive failure...

One of a mirrored pair of drives on the system started failing, badly
(confirmed by 'hard' read & write errors logged to the console). ZFS also
started showing errors, and the machine started hanging, waiting for I/Os to
complete (which is how I noticed it).

How many errors does a drive have to throw before it's considered "failed"
by ZFS? Mine had got to about 30-40 [not a huge amount], but was making the
system unusable, so I manually attached another hot-spare drive to the
'good' device left in that mirrored pair.

However, ZFS was still trying to read data off the failing drive - this
pushed the re-silver time up to 755 hours, whilst the number of errors in
the next forty minutes or so got to around 300. Not wanting my data
unprotected for 755-odd hours (and fearing the number was just going up and
up) I did:

  zpool detach vol ad4

('ad4' was the failing drive.)

This hung all I/O on the pool :( - I waited 5 hours, and then decided to
reboot.

After the reboot the pool came back OK (with 'ad4' removed), and the
re-silver continued and completed in half an hour.

Thinking about it - perhaps I should have detached ad4 (the failing drive)
before attaching another device? My thinking at the time was that I didn't
know how badly failed the drive was, and obviously removing what might have
been 200Gb of 'perfectly' accessible data from a mirrored pair, prior to
re-silvering to a replacement, didn't sit right.

I'm hoping ZFS shouldn't have hung when I later decided to fix the
situation and remove ad4?

-Kp
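P.S. For reference, the rough sequence was along these lines - 'vol' and
'ad4' are the real pool and failing drive, while 'ad2' and 'ad6' are just
placeholder names here for the surviving mirror member and the spare:

  # check which device is throwing errors and the state of the mirror
  zpool status -v vol

  # attach the spare to the surviving half of the mirrored pair
  # ('ad2' = surviving member, 'ad6' = spare - placeholder names)
  zpool attach vol ad2 ad6

  # re-silver starts onto the spare; later, remove the failing drive
  zpool detach vol ad4

It was that last detach which hung everything.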
Richard Elling
2008-Sep-08 14:30 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
Karl Pielorz wrote:
> Hi All,
>
> I run ZFS (a version 6 pool) under FreeBSD. Whilst I realise this changes a
> *whole heap* of things, I'm more interested in whether I did anything wrong
> when I had a recent drive failure...
>
> One of a mirrored pair of drives on the system started failing, badly
> (confirmed by 'hard' read & write errors logged to the console). ZFS also
> started showing errors, and the machine started hanging, waiting for I/Os
> to complete (which is how I noticed it).
>
> How many errors does a drive have to throw before it's considered "failed"
> by ZFS? Mine had got to about 30-40 [not a huge amount], but was making the
> system unusable, so I manually attached another hot-spare drive to the
> 'good' device left in that mirrored pair.
>
> However, ZFS was still trying to read data off the failing drive - this
> pushed the re-silver time up to 755 hours, whilst the number of errors in
> the next forty minutes or so got to around 300. Not wanting my data
> unprotected for 755-odd hours (and fearing the number was just going up
> and up) I did:
>
>   zpool detach vol ad4
>
> ('ad4' was the failing drive.)
>
> This hung all I/O on the pool :( - I waited 5 hours, and then decided to
> reboot.

This seems like a reasonable process to follow; I would have done much the
same.

> After the reboot the pool came back OK (with 'ad4' removed), and the
> re-silver continued and completed in half an hour.

There are failure modes that disks can get into which seem to be solved by
a power-on reset. I had one of these just last week :-(. We would normally
expect a soft reset to clear the cobwebs, but that was not my experience.

> Thinking about it - perhaps I should have detached ad4 (the failing drive)
> before attaching another device? My thinking at the time was that I didn't
> know how badly failed the drive was, and obviously removing what might
> have been 200Gb of 'perfectly' accessible data from a mirrored pair, prior
> to re-silvering to a replacement, didn't sit right.
>
> I'm hoping ZFS shouldn't have hung when I later decided to fix the
> situation and remove ad4?

[caveat: I've not examined the FreeBSD ZFS port; the following presumes the
FreeBSD port is similar to the Solaris port]

ZFS does not have its own timeouts for this sort of problem. It relies on
the underlying device drivers to manage their timeouts. So there was not
much you could do at the ZFS level other than detach the disk.

-- richard
Miles Nordin
2008-Sep-08 17:34 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
>>>>> "kp" == Karl Pielorz <kpielorz_lst at tdx.co.uk> writes:kp> Thinking about it - perhaps I should have detached ad4 (the kp> failing drive) before attaching another device? no, I think ZFS should be fixed. 1. the procedure you used is how hot spares are used, so anyone who says it''s wrong for any reason is using hindsight bias. 2. Being able to pull data off a failing-but-not-fully-gone drive is something a good storage subsystem should be able to do. I might not expect it of LVM2 or of crappy raid-on-a-card, but I would definitely expect it from Netapp/EMC/Hitachi. 3. Also sometimes one is confused about which drive is failing because of crappy controllers and controller drivers, so by-the-book recovery procedures shouldn''t have to involve ad-hoc detaching. though my experience with software raid other than ZFS is the same---the whole job is about having the Fu to know what to unplug to make the rickety system stable again. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080908/8d192b16/attachment.bin>
Bob Friesenhahn
2008-Sep-08 17:54 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
On Mon, 8 Sep 2008, Miles Nordin wrote:

> no, I think ZFS should be fixed.
>
> 1. the procedure you used is how hot spares are used, so anyone who
>    says it's wrong for any reason is using hindsight bias.
>
> 2. Being able to pull data off a failing-but-not-fully-gone drive is
>    something a good storage subsystem should be able to do. I might
>    not expect it of LVM2 or of crappy raid-on-a-card, but I would
>    definitely expect it from Netapp/EMC/Hitachi.

Please describe (in detail) how ZFS can be improved to be able to retrieve
data from a failing drive (which might take minutes to return a read error
due to "consumer" drive firmware) in a reasonable amount of time. I look
forward to your response.

Thanks,
Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Tuomas Leikola
2008-Sep-08 18:40 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
On Mon, Sep 8, 2008 at 8:54 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

>> 2. Being able to pull data off a failing-but-not-fully-gone drive is
>>    something a good storage subsystem should be able to do. I might
>>    not expect it of LVM2 or of crappy raid-on-a-card, but I would
>>    definitely expect it from Netapp/EMC/Hitachi.
>
> Please describe (in detail) how ZFS can be improved to be able to
> retrieve data from a failing drive (which might take minutes to return
> a read error due to "consumer" drive firmware) in a reasonable amount
> of time. I look forward to your response.

During resilver? Use a heuristic to decide that one drive is more suspect
than the others, and try to issue only "leaf" data requests to that drive.
When its queue is full, the other drive can happily churn away. If the
suspect drive still hangs when the resilver would otherwise complete, issue
the same reads to the other disk(s) to get the data, and forget about it.

This way you won't stall the resilver unnecessarily (once you notice one
drive being slowish), and you still have the suspect drive around if there
are bad blocks on the other drives (unlike most software raid systems). Not
perfect, but better than an ignorant round-robin read balance.

- Tuomas
Karl Pielorz
2008-Sep-08 19:46 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
--On 08 September 2008 07:30 -0700 Richard Elling
<Richard.Elling at Sun.COM> wrote:

> This seems like a reasonable process to follow; I would have done
> much the same.

> [caveat: I've not examined the FreeBSD ZFS port; the following
> presumes the FreeBSD port is similar to the Solaris port]
> ZFS does not have its own timeouts for this sort of problem.
> It relies on the underlying device drivers to manage their
> timeouts. So there was not much you could do at the ZFS level
> other than detach the disk.

Ok, I'm glad I'm finally getting the hang of ZFS, and 'did the right
thing(tm)'.

Is there any tunable in ZFS that will tell it "if you get more than x/y/z
read, write or checksum errors, detach the drive as 'failed'"? Maybe on a
per-drive basis?

It'd probably need some way for the admin to override it (i.e. force it to
be ignored), for those times where you either have to, or for a drive you
know will at least stand a chance of reading the rest of the surface 'past'
the errors.

This would probably be set quite low for 'consumer' grade drives, and
moderately higher for 'enterprise' drives that don't "go out to lunch" for
extended periods while seeing if they can recover a block. You could even
default it to 'infinity' if that's what the current level is.

It'd certainly have saved me a lot of time if, once the number of errors on
the drive had passed a relatively low figure, it had just ditched the
drive...

One other random thought occurred to me when this happened - if I detach a
drive, does ZFS have to update some metadata on *all* the drives in that
pool (including the one I've detached) for it to know it's been detached?
(if that makes sense).

That might explain why the 'detach' I issued just hung (if it had to update
metadata on the drive I was removing, it probably got caught in the wash of
failing I/O timing out on that device).

-Karl
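P.S. The closest thing I can see to doing that by hand, rather than via a
tunable, is taking the suspect drive offline instead of detaching it -
something like the following, using my pool/device names, though I didn't
try this route at the time:

  # stop ZFS issuing I/O to the suspect drive, but keep it as a pool member
  zpool offline vol ad4

  # bring it back later if it turns out to be readable after all
  zpool online vol ad4

That stops reads and writes going to the drive without discarding its copy
of the data - but it's still a manual step, not the error-count threshold
I'm after.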
Richard Elling
2008-Sep-08 23:37 UTC
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
Karl Pielorz wrote:
>
> --On 08 September 2008 07:30 -0700 Richard Elling
> <Richard.Elling at Sun.COM> wrote:
>
>> This seems like a reasonable process to follow; I would have done
>> much the same.
>
>> [caveat: I've not examined the FreeBSD ZFS port; the following
>> presumes the FreeBSD port is similar to the Solaris port]
>> ZFS does not have its own timeouts for this sort of problem.
>> It relies on the underlying device drivers to manage their
>> timeouts. So there was not much you could do at the ZFS level
>> other than detach the disk.
>
> Ok, I'm glad I'm finally getting the hang of ZFS, and 'did the right
> thing(tm)'.
>
> Is there any tunable in ZFS that will tell it "if you get more than
> x/y/z read, write or checksum errors, detach the drive as 'failed'"?
> Maybe on a per-drive basis?

This is the function of one or more diagnosis engines in Solaris. Not all
errors are visible to ZFS, so it makes sense to diagnose the error where it
is visible -- usually at the device driver level.

> It'd probably need some way for the admin to override it (i.e. force it
> to be ignored), for those times where you either have to, or for a drive
> you know will at least stand a chance of reading the rest of the surface
> 'past' the errors.
>
> This would probably be set quite low for 'consumer' grade drives, and
> moderately higher for 'enterprise' drives that don't "go out to lunch"
> for extended periods while seeing if they can recover a block. You could
> even default it to 'infinity' if that's what the current level is.
>
> It'd certainly have saved me a lot of time if, once the number of errors
> on the drive had passed a relatively low figure, it had just ditched the
> drive...

In Solaris, this is implemented through the FMA diagnosis engines, which
communicate with interested parties such as ZFS. At present the variables
aren't really tunable, per se, but you can see the values in the source.
For example, the ZFS diagnosis engine is:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fm/modules/common/zfs-diagnosis/zfs_de.c

> One other random thought occurred to me when this happened - if I detach
> a drive, does ZFS have to update some metadata on *all* the drives in
> that pool (including the one I've detached) for it to know it's been
> detached? (if that makes sense).

Yes.

> That might explain why the 'detach' I issued just hung (if it had to
> update metadata on the drive I was removing, it probably got caught in
> the wash of failing I/O timing out on that device).

Yes, I believe this is consistent with what you saw.

-- richard
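P.S. On Solaris, what those diagnosis engines are seeing and deciding can be
inspected with the standard FMA tools - roughly along these lines (the exact
output will vary from system to system):

  # dump the error telemetry (ereports) the diagnosis engines have been fed
  fmdump -e

  # show any resources FMA has diagnosed as faulty (e.g. a failing disk)
  fmadm faulty

  # per-module statistics, including the zfs-diagnosis engine
  fmstat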
The lop-sided mirror stuff I'm going to RFE today would probably do it too:
http://www.opensolaris.org/jive/thread.jspa?threadID=70811&tstart=0

If ZFS realised that a drive was returning results much slower than normal,
it could try reading off the other drive instead. That would allow the
resilver to run from the good drive.

The zpool detach failing is a problem though. I would hope that under
Solaris FMA would have spotted the problem and faulted that drive, but I
still feel that ZFS should be double-checking stuff like this so that we
don't get these situations where the whole pool hangs.

--
This message posted from opensolaris.org