I'm running ZFS on a test server against a bunch of drives in an Apple XRAID (configured in JBOD mode). It works pretty well, except that when I yank one of the drives, ZFS hangs -- presumably it's waiting for a response from the XRAID.

Is there any way to set the device-failure timeout with ZFS?

Thanks,
-Luke
Luke Scharf wrote:
> I'm running ZFS on a test server against a bunch of drives in an Apple
> XRAID (configured in JBOD mode). It works pretty well, except that
> when I yank one of the drives, ZFS hangs -- presumably it's waiting
> for a response from the XRAID.
>
> Is there any way to set the device-failure timeout with ZFS?

In general, ZFS doesn't manage device timeouts; the lower-layer drivers do. Timeout management depends on which OS, OS version, and HBA you use. A fairly extreme example is Solaris using parallel SCSI and the sd driver, which defaults to a 60-second timeout and 5 retries. In more recent Solaris NV builds, FMA has been enhanced with an io-retire module that can make better decisions about whether a device is behaving well.
 -- richard

> Thanks,
> -Luke

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
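For anyone wanting to tune at the driver layer Richard describes, on Solaris the sd driver's timeout and retry behavior can be adjusted via /etc/system. The tunable names below (sd_io_time, sd_retry_count) vary by Solaris release and driver, so treat this as an illustrative fragment and verify the names against your platform's documentation before rebooting with it:

```
* /etc/system fragment -- illustrative only; tunable names and defaults
* vary by Solaris release and HBA driver.
set sd:sd_io_time = 20        * per-command timeout in seconds (default 60)
set sd:sd_retry_count = 3     * retries before the command fails (default 5)
```

Changes to /etc/system take effect only after a reboot, and shortening these values trades faster failure detection for a higher chance of giving up on a merely slow device.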
Richard Elling wrote:
> In general, ZFS doesn't manage device timeouts; the lower-layer
> drivers do. Timeout management depends on which OS, OS version, and
> HBA you use. A fairly extreme example is Solaris using parallel SCSI
> and the sd driver, which defaults to a 60-second timeout and 5
> retries. In more recent Solaris NV builds, FMA has been enhanced with
> an io-retire module that can make better decisions about whether a
> device is behaving well.

What, ZFS isn't the whole kernel? ;-)

I can Google/RTFM from here. Thanks!
-Luke
To my mind it's a big limitation of ZFS that it relies on the driver timeouts. The driver has no knowledge of what kind of configuration the disks are in, and generally any kind of data loss is bad, so it's not surprising that long timeouts are the norm as the driver does its very best to avoid data loss. ZFS, however, knows full well whether a device is in a protected pool (RAID-Z or mirrored), and really has no reason to hang operations on the entire pool when one device is not responding.

I've seen this with iSCSI devices, and I've seen plenty of reports of other people experiencing ZFS hangs -- including hangs of the admin tools, which makes error reporting and monitoring difficult too. When dealing with redundant devices, ZFS needs either its own timeouts or a more intelligent way of handling this kind of scenario.

This message posted from opensolaris.org
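As a sketch of the idea above -- a filesystem-level timeout that exploits redundancy instead of hanging the pool -- here is a minimal simulation. The device names, delays, and the mirrored_read helper are all hypothetical illustrations of the concept, not ZFS internals:

```python
import concurrent.futures
import time

def read_block(device, delay, data):
    """Simulate a read from one side of a mirror; 'delay' models a hung
    or slow device (a real implementation would issue an actual I/O)."""
    time.sleep(delay)
    return data

def mirrored_read(timeout=0.2):
    """Try the primary side of a hypothetical mirror; if it doesn't
    answer within 'timeout', satisfy the read from the other side
    instead of blocking all operations on the pool."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        # "disk0" is hung: it won't respond for 2 seconds.
        primary = pool.submit(read_block, "disk0", 2.0, b"A")
        try:
            return primary.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # Redundancy means the same data is available elsewhere,
            # so fall back to the healthy side rather than waiting.
            mirror = pool.submit(read_block, "disk1", 0.01, b"A")
            return mirror.result()

print(mirrored_read())  # prints b'A'
```

The point of the sketch is only that a layer which knows about redundancy can bound its own wait and fail over, whereas a disk driver, seeing a single device, can only retry.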