I'm in the process of setting up a NAS for my company. It's going to be based on OpenSolaris and ZFS, running on a Dell R710 with two SAS 5/E HBAs. Each HBA will be connected to a 24-bay Supermicro JBOD chassis. Each chassis will have 12 drives to start out with, giving us room for expansion as needed.

Ideally, I'd like to have a mirror of a raidz2 setup, but from the documentation I've read, it looks like I can't do that, and that a stripe of mirrors is the only way to accomplish this.

I'm interested in hearing the opinions of others about the best way to set this up. Thanks!
On Aug 21, 2009, at 5:46 PM, Ron Mexico <no-reply at opensolaris.org> wrote:

> I'm in the process of setting up a NAS for my company. It's going to
> be based on OpenSolaris and ZFS, running on a Dell R710 with two
> SAS 5/E HBAs. Each HBA will be connected to a 24-bay Supermicro JBOD
> chassis. Each chassis will have 12 drives to start out with, giving
> us room for expansion as needed.
>
> Ideally, I'd like to have a mirror of a raidz2 setup, but from the
> documentation I've read, it looks like I can't do that, and that a
> stripe of mirrors is the only way to accomplish this.

Why? It uses as many drives as a RAID10, but you lose one more drive of usable space than RAID10 and you get less than half the performance.

You might be thinking of a RAID50, which would be multiple raidz vdevs in a zpool, or striped RAID5s.

If not, then stick with multiple mirror vdevs in a zpool (RAID10).

-Ross
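(For reference, the "RAID50" layout Ross describes is just multiple raidz vdevs handed to one zpool create -- a minimal sketch, with hypothetical c1/c2 device names standing in for disks on the two HBAs:

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
                    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

ZFS stripes writes across all top-level vdevs automatically, so no separate "stripe" step is needed. A RAID10-style pool of mirror vdevs is sketched later in the thread.)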
On Fri, Aug 21, 2009 at 5:26 PM, Ross Walker <rswwalker at gmail.com> wrote:

> On Aug 21, 2009, at 5:46 PM, Ron Mexico <no-reply at opensolaris.org> wrote:
>
>> Ideally, I'd like to have a mirror of a raidz2 setup, but from the
>> documentation I've read, it looks like I can't do that, and that a
>> stripe of mirrors is the only way to accomplish this.
>
> Why?

Because some people are paranoid.

> It uses as many drives as a RAID10, but you lose one more drive of
> usable space than RAID10 and you get less than half the performance.

And far more protection.

> You might be thinking of a RAID50, which would be multiple raidz vdevs
> in a zpool, or striped RAID5s.
>
> If not, then stick with multiple mirror vdevs in a zpool (RAID10).
>
> -Ross

RAID10 won't provide as much protection. With raidz2+1 you can lose any 4 drives, and up to 14 if it's the right 14. With RAID10, if you lose the wrong two drives, you're done.

--Tim
As you can add multiple vdevs to a pool, my suggestion would be to do several smaller raidz1 or raidz2 vdevs in the pool.

With your setup (assuming 2 HBAs @ 24 drives each), your proposed layout would have yielded about 20 drives of usable storage (assuming raidz2 with 2 spares on each HBA, and then mirrored). Maximum number of drive failures survived (ideal scenario): 5 (assuming the spare hasn't caught up yet), 9 (assuming the spare had caught up and more drives failed).

Suggested setup (at least as far as I'm concerned -- and I am kinda new at ZFS, but not new to storage systems):

5 x raidz2 w/ 9 disks = 35 drives usable (9 disks ea x 5 raidz2 = 45 total drives - (5 raidz2 x 2 parity drives ea)). This leaves you with 3 drives that you can assign as spares (assuming 48 drives total). Maximum number of drive failures survived (ideal scenario): 11 (assuming the spare hasn't caught up yet), 14 (assuming the spare had caught up and more drives failed).

Keep in mind, the parity information will take up additional space as well, but it seems you were looking for maximum redundancy (and this setup would give you that).

Sorry, I just saw you were talking about 12 drives in each chassis. A similar thing applies: I would do one 9-drive raidz2 in each chassis, add 2 spares in total, and then add drives 9 at a time (and 1 more spare at some point).

Note: Keep in mind, I'm still kinda new to ZFS, so I may be completely wrong... (if I am, somebody, please correct me)

P-Chan
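(A sketch of P-Chan's 12-drives-per-chassis variant, with illustrative device names -- one 9-disk raidz2 per chassis plus pool-wide spares, growing nine drives at a time:

  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c2t8d0 \
      spare c1t9d0 c2t9d0

  # later, expand with another 9-disk raidz2 vdev
  zpool add tank raidz2 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 \
                        c1t15d0 c1t16d0 c1t17d0 c1t18d0

Existing raidz vdevs cannot be widened, so growth happens a whole vdev at a time.)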
On Aug 21, 2009, at 6:34 PM, Tim Cook <tim at cook.ms> wrote:

> On Fri, Aug 21, 2009 at 5:26 PM, Ross Walker <rswwalker at gmail.com> wrote:
>
>> Why?
>
> Because some people are paranoid.

If that is the case, how about a separate zpool of large SATA disks, and either snapshot and send/recv to it, or use AVS to replicate to it.

>> It uses as many drives as a RAID10, but you lose one more drive of
>> usable space than RAID10 and you get less than half the performance.
>
> And far more protection.

It's not worth the cost; the complexity is so high that it itself will be a point of failure, and the performance is too low for it to be any use.

> RAID10 won't provide as much protection. With raidz2+1 you can lose any
> 4 drives, and up to 14 if it's the right 14. With RAID10, if you lose
> the wrong two drives, you're done.

Set up a side raidz2 zpool of SATA disks, snap the RAID10 and zfs send it to the other pool. In the event of catastrophe you can run off the raidz2 pool temporarily until the mirror pool is fixed (and it would still perform better than the mirrored raidz2 setup!).

-Ross
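(Ross's snapshot-and-replicate scheme is two commands per cycle -- a sketch, with hypothetical pool and dataset names; after the first full send, incremental sends move only the blocks changed since the previous snapshot:

  zfs snapshot tank/data@backup1
  zfs send tank/data@backup1 | zfs recv backup/data

  zfs snapshot tank/data@backup2
  zfs send -i backup1 tank/data@backup2 | zfs recv backup/data
)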
Ron Mexico wrote:

> I'm in the process of setting up a NAS for my company. [...]
>
> Ideally, I'd like to have a mirror of a raidz2 setup, but from the
> documentation I've read, it looks like I can't do that, and that a
> stripe of mirrors is the only way to accomplish this.
>
> I'm interested in hearing the opinions of others about the best way
> to set this up.

You'll have to add a bit of meat to "this"!

What are your resiliency, space and performance requirements?

-- Ian.
On Fri, Aug 21, 2009 at 5:52 PM, Ross Walker <rswwalker at gmail.com> wrote:

> If that is the case, how about a separate zpool of large SATA disks,
> and either snapshot and send/recv to it, or use AVS to replicate to it.

That adds a window of opportunity for failure. Potentially quite a large window.

> It's not worth the cost; the complexity is so high that it itself will
> be a point of failure, and the performance is too low for it to be any
> use.

The complexity? There should be no complexity involved in a mirrored raid-z/z2 pool.

> Set up a side raidz2 zpool of SATA disks, snap the RAID10 and zfs send
> it to the other pool. In the event of catastrophe you can run off the
> raidz2 pool temporarily until the mirror pool is fixed (and it would
> still perform better than the mirrored raidz2 setup!).

Snapshots are not a substitute for RAID. That's a completely different protection mechanism. If he wants another copy of the data, I'm sure he'll set up a second server and do zfs send/receives.
> You'll have to add a bit of meat to "this"!
>
> What are your resiliency, space and performance requirements?

Resiliency is most important, followed by space and then speed. Its primary function is to host digital assets for ad agencies and backups of other servers and workstations in the office.

Since I can't make a mirrored raidz2, I'd like the next best thing. If that means doing a zfs send from one raidz2 to the other, that's fine.
Ron Mexico wrote:

>> What are your resiliency, space and performance requirements?
>
> Resiliency is most important, followed by space and then speed. Its
> primary function is to host digital assets for ad agencies and backups
> of other servers and workstations in the office.
>
> Since I can't make a mirrored raidz2, I'd like the next best thing. If
> that means doing a zfs send from one raidz2 to the other, that's fine.

I normally use a stripe of mirrors for "live" data and a stripe of raidz2 (4+2) for "backup" data. I always assign a couple of hot spares to the pools. I also replicate important data between hosts or pools. The replication provides resiliency during a resilver.

-- Ian.
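(Ian's between-hosts replication is the same send/recv pattern piped over ssh -- host and dataset names here are hypothetical; -F rolls the target back to the last common snapshot before receiving:

  zfs snapshot tank/assets@replica1
  zfs send tank/assets@replica1 | ssh backuphost zfs recv -F tank2/assets
)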
On Aug 21, 2009, at 3:34 PM, Tim Cook wrote:

> Because some people are paranoid.

cue the Kinks' "Destroyer" :-)

>> It uses as many drives as a RAID10, but you lose one more drive of
>> usable space than RAID10 and you get less than half the performance.
>
> And far more protection.

Yes. With raidz3 even more :-)

I put together a spreadsheet a while back to help folks make this sort of decision.
http://blogs.sun.com/relling/entry/sample_raidoptimizer_output

I didn't include the outputs for RAID-5+1, but RAIDoptimizer can calculate it. It won't calculate raidz+1 because there is no such option. If there is some demand, I can put together a normal RAID (LVM or array) output of similar construction.

>> If not, then stick with multiple mirror vdevs in a zpool (RAID10).
>>
>> -Ross

My vote is with Ross. KISS wins :-)
Disclaimer: I'm also a member of BAARF.

> RAID10 won't provide as much protection. With raidz2+1 you can lose
> any 4 drives, and up to 14 if it's the right 14. With RAID10, if you
> lose the wrong two drives, you're done.

One of the reasons I wrote RAIDoptimizer is to help people get a handle on the math behind this. You can see some of that orientation in my other blogs on MTTDL. But at the end of the day, you can get a pretty good ballpark by saying every level of parity adds about 3 orders of magnitude to the MTTDL. No parity is always a loss. Single parity is better. Double parity even better. Eventually, common-cause problems dominate.

-- richard
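(To see where the "3 orders of magnitude" rule of thumb comes from, the textbook first-order approximations -- a simplification of the fuller models on Richard's blog -- for an N-disk group with per-disk mean time to failure MTTF and rebuild time MTTR are:

  single parity:  MTTDL ~ MTTF^2 / (N * (N-1) * MTTR)
  double parity:  MTTDL ~ MTTF^3 / (N * (N-1) * (N-2) * MTTR^2)

With illustrative numbers (MTTF = 1,000,000 hours, MTTR = 24 hours, N = 12), single parity gives ~3 x 10^8 hours and double parity ~1.3 x 10^12 hours: each added parity level multiplies MTTDL by roughly MTTF / (N * MTTR), a factor of a few thousand here.)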
On Fri, Aug 21, 2009 at 7:41 PM, Richard Elling <richard.elling at gmail.com> wrote:

> Yes. With raidz3 even more :-)
> I put together a spreadsheet a while back to help folks make this sort
> of decision.
> http://blogs.sun.com/relling/entry/sample_raidoptimizer_output

Good point as well. I completely spaced on the fact that raidz3 was added not so long ago. I don't think it's made it to any officially supported build yet though, has it?

> My vote is with Ross. KISS wins :-)
> Disclaimer: I'm also a member of BAARF.

My point is, RAIDZx+1 SHOULD be simple. I don't entirely understand why it hasn't been implemented. I can only imagine, like so many other things, it's because there hasn't been significant customer demand. Unfortunate, if it's as simple as I believe it is to implement. (No, don't ask me to do it, I put in my time programming in college and have no desire to do it again :))
On Aug 21, 2009, at 5:55 PM, Tim Cook wrote:

> My point is, RAIDZx+1 SHOULD be simple. I don't entirely understand
> why it hasn't been implemented. I can only imagine, like so many other
> things, it's because there hasn't been significant customer demand.

You can get in the same ballpark with at least two top-level raidz2 vdevs and copies=2. If you have three or more top-level raidz2 vdevs, then you can even do better with copies=3 ;-)

Note that I do not have a model for that, because it would require separate failure rate data for whole-disk failures and all other non-whole-disk failures. The latter is not available in data sheets. The closest I can get with published data is using the MTTDL[2] model, which considers the published unrecoverable read error rate. In other words, the model would be easy, but the data to feed the model is not available :-( Suffice to say, 2 top-level raidz2 vdevs of similar size with copies=2 should offer very nearly the same protection as raidz2+1.

-- richard
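(copies is a per-dataset property set after pool creation -- a sketch with hypothetical names; note it only affects blocks written after it is set, so set it before loading data:

  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
  zfs create tank/data
  zfs set copies=2 tank/data
)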
On 21-Aug-09, at 21:04, Richard Elling wrote:

> You can get in the same ballpark with at least two top-level raidz2
> vdevs and copies=2. If you have three or more top-level raidz2 vdevs,
> then you can even do better with copies=3 ;-)

Maybe this is noted somewhere, but I did not realize that "copies" invoked logic that distributed the copies among vdevs? Can you please provide some pointers about this?

Thanks,

A.

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113
On Fri, Aug 21, 2009 at 8:04 PM, Richard Elling <richard.elling at gmail.com> wrote:

> Note that I do not have a model for that, because it would require
> separate failure rate data for whole-disk failures and all other
> non-whole-disk failures. [...] Suffice to say, 2 top-level raidz2
> vdevs of similar size with copies=2 should offer very nearly the same
> protection as raidz2+1.

You sure about that? Say I have a SAS controller shit the bed (pardon the French), and take one of the JBODs out entirely. Even with copies=2, isn't the entire pool going tits-up and offline when it loses an entire vdev?

It would seem to me copies=2 is only applicable when you have both an entire disk loss, and corrupt data on the "good disks". But feel free to enlighten :) That scenario seems far less likely than having a controller go bad, but that's with my anecdotal personal experiences.

--Tim
On Aug 21, 2009, at 6:09 PM, Adam Sherman wrote:

> Maybe this is noted somewhere, but I did not realize that "copies"
> invoked logic that distributed the copies among vdevs? Can you please
> provide some pointers about this?

It is hard to describe in words, so I made some pictures :-)
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

-- richard
comment far below...

On Aug 21, 2009, at 6:17 PM, Tim Cook wrote:

> You sure about that? Say I have a SAS controller shit the bed (pardon
> the French), and take one of the JBODs out entirely. Even with
> copies=2, isn't the entire pool going tits-up and offline when it
> loses an entire vdev?

Yes. But you need to understand that the probability of a SAS controller failing is much, much smaller than that of a disk. So in order to properly model the system, you can't treat them as having the same failure rate (the difference is an order of magnitude for HDDs). Depending on the repair policy, the probability of losing a SAS controller is expected to be less than the probability of losing 3 disks in a raidz2. Since SAS is relatively easy to make redundant, a really paranoid person would have two SAS controllers, and the probability of losing two highly reliable SAS controllers at the same time is way small :-)

> It would seem to me copies=2 is only applicable when you have both an
> entire disk loss, and corrupt data on the "good disks". But feel free
> to enlighten :) That scenario seems far less likely than having a
> controller go bad, but that's with my anecdotal personal experiences.

As the Kinks sing, "paranoia will destroy ya!" :-)

-- richard
On Fri, 21 Aug 2009, Tim Cook wrote:

> RAID10 won't provide as much protection. With raidz2+1 you can lose
> any 4 drives, and up to 14 if it's the right 14. With RAID10, if you
> lose the wrong two drives, you're done.

On the flip side, the chance of losing a second drive during the recovery interval is much less with mirroring, since only one drive needs to be read in order to support the resilver, and there is far less mechanical action and I/O involved. If you make sure that you have a spare drive available to the pool, then the spare drive can be resilvered and take over while you sleep, minimizing the risk.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
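(Adding a hot spare to an existing pool is a one-liner -- the device name is hypothetical; zpool status then shows whether a spare has been pulled into service:

  zpool add tank spare c3t0d0
  zpool status tank
)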
On Fri, 21 Aug 2009, Ron Mexico wrote:

> Since I can't make a mirrored raidz2, I'd like the next best thing.
> If that means doing a zfs send from one raidz2 to the other, that's
> fine.

Without using hierarchical servers (e.g. volumes from a ZFS pool exported via iSCSI to be part of another ZFS storage pool) you can't do mirrored raidz2, but you can easily do triple mirroring. If disk space is not a concern, then it is difficult to beat the reliability of a triple mirror.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
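(A triple mirror can be built outright, or an existing two-way mirror can be grown by attaching a third side -- device names hypothetical:

  # three-way mirror vdev from the start
  zpool create tank mirror c1t0d0 c2t0d0 c3t0d0

  # or attach a third disk alongside an existing mirror member
  zpool attach tank c1t0d0 c3t0d0
)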
On Fri, 21 Aug 2009, Richard Elling wrote:

> magnitude for HDDs). Depending on the repair policy, the probability
> of losing a SAS controller is expected to be less than the
> probability of losing 3 disks in a raidz2. Since SAS is relatively
> easy to make redundant, a really paranoid person would have two SAS
> controllers, and the probability of losing two highly reliable SAS
> controllers at the same time is way small :-)

This is a reason to prefer mirroring, with the devices in each mirror carefully split across controllers. This approach makes failures easier to understand and helps avoid propagation of errors. Complex system designs lead to complex problems. Some of the world's largest and most successful 5-9s class systems are built using simple duplex redundancy. It is possible to build raidz and raidz2 systems so that their devices are accessed via unique paths, but such systems rapidly become quite large and expensive.

> As the Kinks sing, "paranoia will destroy ya!" :-)

There's a time device inside of me, I'm a self-destructin' disk!

When anything goes wrong in a system, the human factor becomes quite large. It dramatically increases the probability that human error (the primary cause of data loss) will occur. The system should be designed to accommodate the attendant humans. Solaris is still much too complicated for people to understand in times of crisis.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
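(Bob's controller-split layout pairs each disk on one HBA with a counterpart on the other, so losing a whole controller or JBOD still leaves every mirror vdev with one healthy side -- a sketch, with c1/c2 standing in for the two SAS 5/E paths:

  zpool create tank mirror c1t0d0 c2t0d0 \
                    mirror c1t1d0 c2t1d0 \
                    mirror c1t2d0 c2t2d0
)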
On Fri, 21 Aug 2009 18:04:49 -0700, Richard Elling <richard.elling at gmail.com> wrote:

> You can get in the same ballpark with at least two top-level
> raidz2 vdevs and copies=2. If you have three or more
> top-level raidz2 vdevs, then you can even do better
> with copies=3 ;-)

Please note that copies=3 will be obsoleted soon, because the space for the pointer to the third instance of the data block was needed for some other purpose (I forgot which).
--
( Kees Nuyt )
c[_]
On Aug 22, 2009, at 1:02 PM, Kees Nuyt wrote:

> Please note that copies=3 will be obsoleted soon, because
> the space for the pointer to the third instance of the data
> block was needed for some other purpose (I forgot which).

The limit will be copies=2 for ZFS encrypted datasets. By default, file systems will not be encrypted.

-- richard
> Suffice to say, 2 top-level raidz2 vdevs of similar size with copies=2
> should offer very nearly the same protection as raidz2+1.
> -- richard

This looks like the way to go. Thanks for your input. It's much appreciated!