zfsmonk
2008-Jun-14 15:09 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Mentioned on http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide is the following:

"ZFS works well with storage based protected LUNs (RAID-5 or mirrored LUNs from intelligent storage arrays). However, ZFS cannot heal corrupted blocks that are detected by ZFS checksums."

Based on that, if we already have RAID-5 LUNs being served from intelligent storage arrays, is there any benefit to creating the zpool as a mirror if ZFS can't heal any corrupted blocks? Or would we just be wasting disk space?
Tomas Ögren
2008-Jun-14 15:25 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On 14 June, 2008 - zfsmonk sent me these 0,7K bytes:

> Mentioned on
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> is the following: "ZFS works well with storage based protected LUNs
> (RAID-5 or mirrored LUNs from intelligent storage arrays). However,
> ZFS cannot heal corrupted blocks that are detected by ZFS checksums."
>
> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in a
> mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?

Let's say you have a RAID box called A. If you use that as ZFS storage and ZFS detects bit errors in it, there's not much ZFS can do other than say "your storage sucks".

If you have another RAID box called B and you mirror A and B through ZFS, then when A comes along and flips some bits again, ZFS checks B, sees that it's still correct, and fixes A.

A might be intelligent storage that can cope with a disk dying, but if A delivers bit errors up to ZFS, then ZFS can't fix them. If A is actually dumb storage and you leave the RAID part to ZFS, then ZFS can fix them.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
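A minimal sketch of the layout Tomas describes, mirroring one RAID-5 LUN from each array at the ZFS level so ZFS has a second copy to heal from; the device names are only placeholders for whatever LUNs your arrays present:

  # one RAID-5 LUN from array A, one from array B, mirrored by ZFS
  zpool create tank mirror c4t0d0 c5t0d0

  # confirm the mirror layout and health
  zpool status tank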
Bob Friesenhahn
2008-Jun-14 16:11 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, zfsmonk wrote:

> Mentioned on
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> is the following: "ZFS works well with storage based protected LUNs
> (RAID-5 or mirrored LUNs from intelligent storage arrays). However,
> ZFS cannot heal corrupted blocks that are detected by ZFS checksums."

This basically means that the checksum itself is not sufficient to accomplish correction. However, if ZFS-level RAID is used, the correct block can be obtained from a redundant copy.

> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in
> a mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?

This is a matter of opinion. If ZFS does not have access to redundancy then it cannot correct any problems that it encounters, and could even panic the system, or the entire pool could be lost. However, if the storage array and all associated drivers, adaptors, memory, and links are working correctly, then this risk may be acceptable (to you).

ZFS experts at Sun say that even the best storage arrays may not detect and correct some problems, and that complex systems can produce errors even though all of their components seem to be working correctly. This is in spite of Sun also making a living by selling such products. The storage array is only able to correct errors it detects due to the hardware reporting an unrecoverable error condition or by double-checking using data on a different drive. Since storage arrays want to be fast, they are likely to engage additional validity checks/correction only after a problem has already been reported (or during a scrub/resilver) rather than as a matter of course.

A problem which may occur is that your storage array may say that the data is good while ZFS says that there is bad data. Under these conditions there might not be a reasonable way to correct the problem other than to lose the data. If the ZFS pool requires the failed data in order to operate, then the entire pool could be lost.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Brian Wilson
2008-Jun-14 17:11 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
> On Sat, 14 Jun 2008, zfsmonk wrote:
> [...]
> This is a matter of opinion. If ZFS does not have access to
> redundancy then it cannot correct any problems that it encounters,
> and could even panic the system, or the entire pool could be lost.
> [...]
> A problem which may occur is that your storage array may say that the
> data is good while ZFS says that there is bad data. Under these
> conditions there might not be a reasonable way to correct the problem
> other than to lose the data. If the ZFS pool requires the failed data
> in order to operate, then the entire pool could be lost.

Couple of questions on this topic -

What's the percentage of data in a zpool that, if it gets one of these bit-corruption errors, will actually cause the zpool to fail? Is it a higher or lower percentage than what it would take to fatally and irrevocably corrupt UFS or VxFS to the point where a restore is required?

Given that today's storage arrays catch a good percentage of errors and correct them (for the intelligent arrays I have in mind, anyway), are we talking about the nasty, silent corruption I've been reading about that occurs in huge datasets, where the RAID thinks the data is good but it's actually garbage? From what I remember reading, that's a low occurrence rate and only became noticeable because we're dealing in such large amounts of data these days. Am I wrong here?

So, looking at making operational decisions in the short term, I have to ask specifically: is it more or less likely that a zpool will die and have to be restored than UFS or VxFS filesystems on a VxVM volume?

My opinions and questions are my own, and do not necessarily represent those of my employer. (or my coworkers, or anyone else)

cheers,
Brian
Brian Wilson
2008-Jun-14 17:21 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
----- Original Message -----
From: Brian Wilson <bfwilson at doit.wisc.edu>
Date: Saturday, June 14, 2008 12:12 pm
Subject: Re: [zfs-discuss] zpool with RAID-5 from intelligent storage arrays
To: Bob Friesenhahn <bfriesen at simple.dallas.tx.us>
Cc: zfs-discuss at opensolaris.org

> So, looking at making operational decisions in the short term, I have
> to ask specifically: is it more or less likely that a zpool will die
> and have to be restored than UFS or VxFS filesystems on a VxVM volume?

To put it specifically - I currently have a volume (a bunch of them, really) on one intelligent array, on UFS or VxFS on VxVM volumes. If I use ZFS, I am not intending at this point to mirror it to another array (which may or may not exist) and double my use of expensive disk. So that puts me at the risk described here, where the zpool could go poof.

What are the odds, in that configuration of zpool (no mirroring, just using the intelligent disk as concatenated LUNs in the zpool), that if we have this silent corruption, the whole zpool dies? If anyone knows, what are the comparative odds of the VxVM volume, UFS or VxFS filesystem similarly dying in the same scenario?

Thanks!
Brian

> My opinions and questions are my own, and do not necessarily represent
> those of my employer. (or my coworkers, or anyone else)
>
> cheers,
> Brian
Bob Friesenhahn
2008-Jun-14 19:19 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, Brian Wilson wrote:

> What are the odds, in that configuration of zpool (no mirroring,
> just using the intelligent disk as concatenated LUNs in the zpool),
> that if we have this silent corruption, the whole zpool dies? If
> anyone knows, what are the comparative odds of the VxVM volume, UFS or
> VxFS filesystem similarly dying in the same scenario?

I don't know the answer to that. Probably nobody knows the answer, since there is no formal research project to analyze it and no automatic collection agent to report the data. You can scan the list archives to find the ZFS horror stories. Most of the "whole pool died" horror stories are not due to data loss on a properly maintained RAID array.

ZFS does not come with fsck. That is both good and bad. With fsck you can simply say 'yes' to the obscure questions and (if you are lucky) after a few hours (or a day) there will be something left, and perhaps that critical file is still to be found in the lost+found directory, if you can figure out which one it is among all the files which were previously deleted and are now resurrected.

With ZFS you can scrub the pool at the system level. This allows you to discover many issues early, before they become nightmares.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
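The scrub workflow Bob describes is a one-liner plus a status check; the pool name below is just a placeholder:

  # read and verify every block in the pool, in the background
  zpool scrub tank

  # check scrub progress and results; "-x" summarizes only unhealthy pools
  zpool status tank
  zpool status -x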
dick hoogendijk
2008-Jun-14 19:32 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008 14:19:05 -0500 (CDT)
Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> With ZFS you can scrub the pool at the system level. This allows you
> to discover many issues early, before they become nightmares.

# zpool status
  scrub: none requested

My question is really: do I wait till a scrub is requested, or am I supposed to scrub on a regular basis myself?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
++ http://nagual.nl/ + SunOS sxce snv90 ++
Bob Friesenhahn
2008-Jun-14 19:51 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, dick hoogendijk wrote:

>> With ZFS you can scrub the pool at the system level. This allows you
>> to discover many issues early, before they become nightmares.
>
> # zpool status
>   scrub: none requested
>
> My question is really: do I wait till a scrub is requested, or am I
> supposed to scrub on a regular basis myself?

I think that "none requested" likely means that the administrator has never issued a request to scrub the pool.

How often to scrub depends on how much you care about your data, how invasive the scrub is to other activities (I/O bandwidth consumption, snapshots, acoustic noise, electricity consumption), and how long the scrub takes. My pool is set to be scrubbed every night via a cron job:

# Scrub the pool for errors
20 4 * * * /usr/sbin/zpool scrub MyPool

Scrub helps find and correct residual problems before they cause serious trouble, such as when the data is actually used, or during resilvering. The statistical chances of problems during disk resilvering are surely significantly reduced if scrub is executed often, since scrub and resilvering both access the same data. This is based on the "Does the sun come up every day?" principle.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
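For systems with more than one pool, a variant of that cron entry can loop over whatever pools exist; this is only a sketch, assuming the stock zpool(1M) in /usr/sbin:

  #!/bin/sh
  # scrub every imported pool; run nightly from cron
  for pool in `/usr/sbin/zpool list -H -o name`; do
      /usr/sbin/zpool scrub "$pool"
  done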
Al Hopper
2008-Jun-15 00:09 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 12:11 PM, Brian Wilson <bfwilson at doit.wisc.edu> wrote:

> Given that today's storage arrays catch a good percentage of errors
> and correct them (for the intelligent arrays I have in mind, anyway),
> are we talking about the nasty, silent corruption I've been reading
> about that occurs in huge datasets, where the RAID thinks the data is
> good but it's actually garbage? From what I remember reading, that's
> a low occurrence rate and only became noticeable because we're dealing
> in such large amounts of data these days. Am I wrong here?

Yes - you're "wrong" - but not because you're unintelligent or saying something "wrong", but because you can be let down by a bad FC (Fibre Channel) port on a switch (random noise), or by a bad optical component in the optical path between the host system that is writing the data and the final destination (read "expensive FC hardware SAN box"), or a bad optical connection, or a "flaky" data comm link. Or ... a firmware bug (in your high-dollar SAN box) after the last firmware upgrade you performed on it.

There are already well documented cases where an OP mailed the ZFS list and said "my SAN box has been working correctly for X years, and when I used ZFS to store data on it, ZFS 'said' that the data is 'bad'. ZFS is 'broken' (technical term (TM)) and not ready for prime time." In *all* cases, it turned out that ZFS was *not* broken and that there was a problem somewhere in the data path, or with the SAN hardware/firmware.

Also - look at the legacy posts and see where an OpenSolaris developer discovered that the errors being reported by ZFS were caused by a flaky/noisy power supply in his desktop box - despite the fact that the particular desktop was very popular with other (OpenSolaris) kernel developers and was widely regarded as "fool-proof".

It's probably true to state that ZFS is the first filesystem that allowed those high-dollar hardware SAN vendors to actually verify that their complex hardware/firmware chain was behaving as designed, end-to-end. Where "end-to-end" is defined as: the data that the host system writes is actually the data that can be retrieved N years after it has been written.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Brian Hechinger
2008-Jun-15 04:10 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 02:19:05PM -0500, Bob Friesenhahn wrote:

> I don't know the answer to that. Probably nobody knows the answer,
> since there is no formal research project to analyze it and no
> automatic collection agent to report the data. You can scan the list
> archives to find the ZFS horror stories. Most of the "whole pool
> died" horror stories are not due to data loss on a properly maintained
> RAID array.

ZFS uses ditto blocks for metadata. I think it would be really hard for silent corruption to render a ZFS pool unusable (even on a single disk with no RAID or mirroring) unless luck just wasn't on your side and both copies of your metadata got corrupted.

That being said, you can increase the number of ditto copies that are made (I think 2 is the default for metadata and 1 is the default for data) and improve your chances of survival on a single-disk system.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
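The knob Brian is referring to is the per-dataset copies property; a minimal sketch, with the pool and dataset names as placeholders:

  # keep two copies of user data (metadata already gets extra ditto copies)
  zfs set copies=2 tank/important

  # note: only data written after the property is set gets the extra copy
  zfs get copies tank/important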
Brian Hechinger
2008-Jun-15 04:12 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 02:51:31PM -0500, Bob Friesenhahn wrote:

> I think that "none requested" likely means that the administrator has
> never issued a request to scrub the pool.

Or the system. That status line shows the last scrub/resilver to have taken place. "None requested" means that no scrub/resilver has happened.

> How often to scrub depends on how much you care about your data, how
> invasive the scrub is to other activities (I/O bandwidth consumption,
> snapshots, acoustic noise, electricity consumption), and how long the
> scrub takes. My pool is set to be scrubbed every night via a cron job:

And like all other things of this nature, the more often you do it, the less invasive it will be, as there is less to do. That being said, I still wouldn't recommend hourly scrubs. ;)

> The statistical chances of problems during disk resilvering are surely
> significantly reduced if scrub is executed often, since scrub and
> resilvering both access the same data. This is based on the "Does the
> sun come up every day?" principle.

Also I would think that this would, in the worst-case scenario, reduce the amount of time to resilver, but I could be wrong.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
Bob Friesenhahn
2008-Jun-15 05:16 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sun, 15 Jun 2008, Brian Hechinger wrote:

>> how long the scrub takes. My pool is set to be scrubbed every night
>> via a cron job:
>
> And like all other things of this nature, the more often you do it, the
> less invasive it will be, as there is less to do. That being said, I still
> wouldn't recommend hourly scrubs. ;)

Unless things are quite broken, why would there be less to do? It seems that the amount of work to do depends on the amount of data stored (and the data transfer rate), since the scrub's task is to read all of the data and make sure that it is consistent.

If my math is right, scrub on my drive array proceeds at about 15.2 GB/minute (259 MB/second). It would be interesting to see what typical scrub rates are for various scenarios.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Erik Trimble
2008-Jun-16 08:45 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
One thing I should mention on this is that I've had _very_ bad experiences with using single-LUN ZFS pools over FC. That is, using an external SAN box to create a single LUN, exporting that LUN to an FC-connected host, then creating a pool as follows:

zpool create tank <LUN_ID>

It works fine, up until something bad happens to the array or the FC connection (like, say, losing power to the whole system) and the host computer cannot talk to the LUN. This will corrupt the zpool permanently, and there is no way to fix the pool (and, without some magic in /etc/system, it will leave the host in a permanent kernel panic loop). This is a known bug, and the fix isn't looking to be available anytime soon.

This problem doesn't seem to manifest itself if the zpool has redundant members, even if they are on the same array (and thus the host loses contact with both LUNs at the same time).

So, for FC or iSCSI targets, I would HIGHLY recommend that ZFS _ALWAYS_ be configured in a redundant setup.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
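Erik doesn't spell out the "/etc/system magic"; the tunables usually cited for getting past that kind of panic loop are the ones sketched below, but treat this as an assumption about the workaround rather than a documented fix, verify it against your own Solaris release, and use it only as a last-resort recovery aid:

  # /etc/system (assumed recovery tunables; only to get past a panic loop
  # long enough to salvage data, not for normal operation)
  set zfs:zfs_recover=1
  set aok=1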
Vincent Fox
2008-Jun-16 17:53 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
I'm not sure why people obsess over this issue so much. Disk is cheap.

We have a fair number of 3510 and 2540 arrays on our SAN. They make RAID-5 LUNs available to various servers. On the servers we take RAID-5 LUNs from different arrays and ZFS-mirror them. So if any array goes away we are still operational. VERY ROBUST!

If you are trying to be cheap, then you could:

1) Use copies=2 to make sure data is duplicated
2) Advertise individual disks as LUNs and build RAIDZ2 on them.

The advantage of an intelligent array is that I have low-level control of matching a hot spare in array #1 to the LUN in array #1. ZFS does not have this fine-grained hot-spare capability yet, so I just don't use ZFS sparing. Also the array has SAN connectivity and caching and dual controllers that just don't exist in the JBOD world.

I am hosting mailboxes for > 50K people; we cannot afford lengthy downtimes.
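A minimal sketch of Vincent's cheaper second option, with placeholder device names standing in for the individually exported disks:

  # double-parity RAID-Z2 across six disks exported as individual LUNs
  zpool create mail raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0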
Bob Friesenhahn
2008-Jun-16 18:15 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Mon, 16 Jun 2008, Vincent Fox wrote:

> Also the array has SAN connectivity and caching and dual controllers
> that just don't exist in the JBOD world.

As a clarification, you can convince your StorageTek 2540 to appear as JBOD on the SAN. Then you obtain the SAN connectivity and caching and dual controllers; one does not exclude the other. The sparing and user interface might be nicer using CAM and RAID-5.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Robert Milkowski
2008-Jun-16 22:33 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Hello Erik,

Monday, June 16, 2008, 9:45:13 AM, you wrote:

ET> One thing I should mention on this is that I've had _very_ bad
ET> experiences with using single-LUN ZFS pools over FC.
ET> [...]
ET> So, for FC or iSCSI targets, I would HIGHLY recommend that ZFS _ALWAYS_
ET> be configured in a redundant setup.

Have you got more details, or at least bug IDs? Is it only (I doubt it) FC related?

--
Best regards,
Robert                            mailto:milek at task.gda.pl
                                  http://milek.blogspot.com
Jeff Bonwick
2008-Jul-01 01:28 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice. Because the data is mirrored at the ZFS level, you get all the benefits of self-healing. Moreover, you can survive a great variety of hardware failures: three or more disks can die (one in the first array, two or more in the second), failure of a cable, or failure of an entire array.

Jeff

On Sat, Jun 14, 2008 at 08:09:49AM -0700, zfsmonk wrote:

> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in a
> mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?
Erik Trimble
2008-Jul-01 02:42 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Jeff Bonwick wrote:

> Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice.
> Because the data is mirrored at the ZFS level, you get all the benefits
> of self-healing. Moreover, you can survive a great variety of hardware
> failures: three or more disks can die (one in the first array, two or
> more in the second), failure of a cable, or failure of an entire array.

As Jeff mentioned, use two HW RAID-5 LUNs in a zpool for a mirror (or even 3+ LUNs for a RAID-Z of RAID-5 :-)

The quote from the Best Practices Guide is applicable to single-LUN zpools (and applies to any single-vdev zpool). Indeed, there are some nasty problems with using single-LUN zpools, so DON'T DO IT.

ZFS is happiest (and you will be too) when you allow some redundancy inside ZFS, and not just at the hardware level.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
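A sketch of the "RAID-Z of RAID-5" layout Erik mentions, with placeholder device names for three LUNs presented by three different arrays:

  # single-parity raidz across three hardware RAID-5 LUNs, one per array
  zpool create tank raidz c4t0d0 c5t0d0 c6t0d0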
Mike Gerdts
2008-Jul-01 02:44 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Mon, Jun 16, 2008 at 5:33 PM, Robert Milkowski <milek at task.gda.pl> wrote:

> Have you got more details, or at least bug IDs?
> Is it only (I doubt it) FC related?

I ran into something that looks like

6594621 dangling dbufs (dn=ffffff056a5ad0a8, dbuf=ffffff0520303300) during stress

with LDoms 1.0. It seems as though data that ZFS in a guest LDom thought was committed was not really committed. Not FC related, but it is quite frustrating to deal with a panic loop in a file system (zpool) not required to boot the system to single-user mode. That one has since been fixed.

More recently I reported:

6709336 panic in mzap_open(): avl_find() succeeded inside avl_add()

If the file that triggered this panic were in a place that is read at boot, it would be a panic loop. I asked on the list[1] if anyone was interested in a dump to dig into it more, with no takers. Earlier today I noticed that Jeff Bonwick said that not getting dumps was criminal[2], so a special cc goes out to him. :)

1. http://mail.opensolaris.org/pipermail/zfs-discuss/2008-May/047869.html
2. http://mail.opensolaris.org/pipermail/caiman-discuss/2008-June/004405.html

I've run into many other problems with I/O errors when doing a stat() of a file. Repeated tries fail, but a reboot seems to clear it. A zpool scrub reports no errors, and the pool consists of a single mirror vdev. I haven't filed a bug on this yet.

--
Mike Gerdts
http://mgerdts.blogspot.com/