I understand that ZFS provides fault data, via the zpool status command, that would (probably?) allow a knowledgeable[0] ZFS admin to determine manually when a ZFS hardware element (today that's usually called a "disk drive") might need maintenance or might need to be replaced. And there are sufficient features/facilities in the current (b36+) release of ZFS to allow that "faulty" element to be replaced.

But what I'm curious about is the ZFS team's plan to support automatic sparing and automatic substitution/replacement of faulty ZFS elements with pre-allocated ZFS "spares".

Many hardware RAID controllers support the concept of 'spares'. Some allow the spare to be powered down - so that the spare does not fail just as the drive it is intended to replace also fails! Whoops - the wonders of MTBF and the statistical likelihood that identical drives, with identical runtimes and MTBF specs, operated in an identical runtime environment, are likely to fail within a short timeframe of one another.

[0] Unfortunately, the "supply" of knowledgeable and talented admins in various technical disciplines seems (to me) to be deteriorating, rather than increasing, over time. Why is that? Some possibilities:

- Perhaps due to the trend to outsource various computer admin functions to less experienced, lower-cost labor pools.

- Perhaps because technical management does not understand or appreciate the required skillsets and is simply not prepared to pay for them.

- Perhaps because of the theory that modern hardware is so reliable that human involvement is simply unnecessary, and that the supplier of said hardware has already "engineered out" the expected failure modes.

- Perhaps because the current trend is that most companies are simply not willing to pay the ongoing costs to keep their computer personnel trained and equipped with the technical skillsets that would afford the company a technical advantage.

- Perhaps because management has been conned into spending their entire IT/computer budget "unwisely" on technology which promised to solve all their reliability/uptime issues. Yep - they blew the budget on the "silver bullet", as skillfully presented by adept "name-brand" company sales droids. Can you spell "SAP"?!

Summary: What is the ZFS sparing policy vision, and how can it best be implemented? Should it be part of ZFS, or should it be an independent software suite that is ZFS-aware but customizable by the individual ZFS user? If it's customizable - how much customization can one allow before it becomes totally ineffective through mis-configuration by less-than-stellar ZFS admins?

Technical curiosity is a disease - and I'm afflicted with it! :)

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
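For context, the manual fault-and-replace workflow described above looks roughly like the following. This is a sketch only; the pool name "tank" and the device names are hypothetical.

```shell
# Report overall pool health; -x limits output to pools with problems.
zpool status -x

# Detailed status for a hypothetical pool "tank": shows per-device state
# (ONLINE/DEGRADED/FAULTED) and read/write/checksum error counters that
# an admin would inspect to decide whether a drive needs replacing.
zpool status -v tank

# Manually replace the suspect device with a new one; ZFS then
# resilvers the data onto the replacement automatically.
# c1t2d0 = failing device, c1t3d0 = replacement (hypothetical names).
zpool replace tank c1t2d0 c1t3d0
```

The open question in the post is precisely how much of this loop (diagnose, decide, replace) should happen without the admin in the middle.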
Eric Schrock
2006-Mar-30 04:43 UTC
[zfs-discuss] disk drive sparing - questions not answers
I'd wait about 24 hours. I've finished up the prototype and the team is reviewing the proposed interfaces. I'll forward the proposal to a larger audience once it's passed some internal scrutiny.

- Eric

On Wed, Mar 29, 2006 at 08:20:27PM -0600, Al Hopper wrote:
> [...]

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
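For reference, the hot-spare support being prototyped here eventually surfaced in ZFS along these lines. A hedged sketch, not the exact proposal under review in this thread; pool and device names are hypothetical.

```shell
# Create a pool with a designated hot spare; if a device faults, the
# fault-management machinery can activate the spare automatically.
zpool create tank mirror c1t0d0 c1t1d0 spare c2t0d0

# Add another spare to an existing pool.
zpool add tank spare c2t1d0

# Optional: with autoreplace=on, a new disk inserted in the same
# physical location as a failed one is brought into service
# automatically, without an explicit "zpool replace".
zpool set autoreplace=on tank
```

This lands on the "part of ZFS" side of the question posed above: the sparing policy is built into the pool itself, with only coarse knobs (the spare list and pool properties) exposed to the admin.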