thr3ads.net - zfs discuss - [zfs-discuss] RAIDZ versus mirrroed [Sep 2009]

If this information is useful, please help other people find it:
Share via:

eneal at businessgrade.com

2009-Sep-16 12:55 UTC

[zfs-discuss] RAIDZ versus mirrroed

Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs  
for both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24  
disks hold any significant advantages over a RAIDZ pool?




--------------------------------------------------------------------------------

This email and any files transmitted with it are confidential and are  
intended solely for the use of the individual or entity to whom they  
are addressed. This communication may contain material protected by  
the attorney-client privilege. If you are not the intended recipient,  
be advised that any use, dissemination, forwarding, printing or  
copying is strictly prohibited. If you have received this email in  
error, please contact the sender and delete all copies.

Thomas Burgess

2009-Sep-16 13:17 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

it should be faster.  It really depends on what you are using it for though,
I''ve been using raidz for my system and i''m very happy with
it.


On Wed, Sep 16, 2009 at 8:55 AM, <eneal at businessgrade.com> wrote:
> Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs for
> both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24 disks
> hold any significant advantages over a RAIDZ pool?
>
>
>
>
>
>
--------------------------------------------------------------------------------
>
> This email and any files transmitted with it are confidential and are
> intended solely for the use of the individual or entity to whom they are
> addressed. This communication may contain material protected by the
> attorney-client privilege. If you are not the intended recipient, be
advised
> that any use, dissemination, forwarding, printing or copying is strictly
> prohibited. If you have received this email in error, please contact the
> sender and delete all copies.
>
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090916/8d49c7bc/attachment.html>

Edward Ned Harvey

2009-Sep-16 14:31 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

> Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs
> for both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24
> disks hold any significant advantages over a RAIDZ pool?
Generally speaking, striping mirrors will be faster than raidz or raidz2,
but it will require a higher number of disks and therefore higher cost to
get the same usable space.  The main reason to use raidz or raidz2 instead
of striping mirrors would be to keep the cost down, or to get higher usable
space out of a fixed number of drives.

David Magda

2009-Sep-16 14:36 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, September 16, 2009 10:31, Edward Ned Harvey
wrote:>> Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs
>> for both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24
>> disks hold any significant advantages over a RAIDZ pool?
>
> Generally speaking, striping mirrors will be faster than raidz or raidz2,
> but it will require a higher number of disks and therefore higher cost to
> get the same usable space.  The main reason to use raidz or raidz2 instead
> of striping mirrors would be to keep the cost down, or to get higher
> usable space out of a fixed number of drives.
And if you want space /and/ speed, then ZFS'' hybrid storage pools is
something worth looking into.

eneal at businessgrade.com

2009-Sep-16 14:39 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

Quoting David Magda <dmagda at ee.ryerson.ca>:
> On Wed, September 16, 2009 10:31, Edward Ned Harvey wrote:
>>> Hi. If I am using slightly more reliable SAS drives versus SATA,
SSDs
>>> for both L2Arc and ZIL and lots of RAM, will a mirrored pool of say
24
>>> disks hold any significant advantages over a RAIDZ pool?
>>
>> Generally speaking, striping mirrors will be faster than raidz or
raidz2,
>> but it will require a higher number of disks and therefore higher cost
to
>> get the same usable space.  The main reason to use raidz or raidz2
instead
>> of striping mirrors would be to keep the cost down, or to get higher
>> usable space out of a fixed number of drives.
>
> And if you want space /and/ speed, then ZFS'' hybrid storage pools
is
> something worth looking into.
>
>This is precisely my point. If I''m taking the hybrid approach - what  
advantages do mirrored pools hold over RAIDZ?
As I mentioned, a large amount of RAM, and SSD''s for both L2arc and
ZIL.

--------------------------------------------------------------------------------

This email and any files transmitted with it are confidential and are  
intended solely for the use of the individual or entity to whom they  
are addressed. This communication may contain material protected by  
the attorney-client privilege. If you are not the intended recipient,  
be advised that any use, dissemination, forwarding, printing or  
copying is strictly prohibited. If you have received this email in  
error, please contact the sender and delete all copies.

Cindy.Swearingen at Sun.COM

2009-Sep-16 15:35 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

In addition, if you need the flexibility of moving disks around until
the device removal CR integrates, then mirrored pools are more flexible.

Detaching disks from a mirror isn''t ideal but if you absolutely have
to reuse a disk temporarily then go with mirrors. See the output below.
You can replace disks in either configuration if you want to switch
smaller disks with larger disks, for example.

Cindy

# zpool status rzpool
   pool: rzpool
  state: ONLINE
  scrub: resilver completed after 0h0m with 0 errors on Tue Sep 15 
14:41:24 2009
config:

         NAME        STATE     READ WRITE CKSUM
         rzpool      ONLINE       0     0     0
           raidz2    ONLINE       0     0     0
             c2t0d0  ONLINE       0     0     0
             c2t2d0  ONLINE       0     0     0
             c2t4d0  ONLINE       0     0     0
             c2t5d0  ONLINE       0     0     0
             c2t6d0  ONLINE       0     0     0
         spares
           c2t7d0    AVAIL

errors: No known data errors
# zpool detach rzpool c2t6d0
cannot detach c2t6d0: only applicable to mirror and replacing vdevs
# zpool destroy rzpool
# zpool create mirpool mirror c2t0d0 c2t2d0 mirror c2t4d0 c2t6d0 spare 
c2t5d0
# zpool status mirpool
   pool: mirpool
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         mirpool     ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t0d0  ONLINE       0     0     0
             c2t2d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t4d0  ONLINE       0     0     0
             c2t6d0  ONLINE       0     0     0
         spares
           c2t5d0    AVAIL

errors: No known data errors
# zpool detach mirpool c2t6d0
#

On 09/16/09 08:31, Edward Ned Harvey wrote:>>Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs
>>for both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24
>>disks hold any significant advantages over a RAIDZ pool?
> 
> 
> Generally speaking, striping mirrors will be faster than raidz or raidz2,
> but it will require a higher number of disks and therefore higher cost to
> get the same usable space.  The main reason to use raidz or raidz2 instead
> of striping mirrors would be to keep the cost down, or to get higher usable
> space out of a fixed number of drives.
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke

2009-Sep-16 15:36 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

I think in theory the ZIL/L2ARC should make things nice and fast if your
workload includes sync requests (database, iscsi, nfs, etc.), regardless of the
backend disks. But the only sure way to know is test with your work load.

-Scott
-- 
This message posted from opensolaris.org

Bob Friesenhahn

2009-Sep-16 16:02 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, 16 Sep 2009, eneal at businessgrade.com wrote:
> Hi. If I am using slightly more reliable SAS drives versus SATA, SSDs for 
> both L2Arc and ZIL and lots of RAM, will a mirrored pool of say 24 disks
hold
> any significant advantages over a RAIDZ pool?
A mirrored pool will support more IOPs.  This is even true when using 
SSDs for L2Arc and ZIL.  Using a SSD for the ZIL dramatically reduces 
synchronous write latency but the data still needs to be committed to 
backing store.  If the bulk of the synchronous writes are also random 
writes, then the throughput is still dependent on the IOPs capacity of 
the backing store.  Similarly, more RAM and/or a large SSD L2Arc 
improves the probability that a repeated read will be retrieved from 
the ARC rather than the backing store but this depends on the size of 
the working set, and whether the reads are ever repeated.  There are 
cases (e.g. daily backups) where reads are rarely repeated.

In summary, write IOPs are still write IOPs, and a read cache only 
works effectively for repeated reads (or reads of recently written 
data).

You still need to look at the nature of your workload in order to 
decide if RAIDZ is appropriate.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Marty Scholes

2009-Sep-16 16:38 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

> Generally speaking, striping mirrors will be faster
> than raidz or raidz2,
> but it will require a higher number of disks and
> therefore higher cost to
> The main reason to use
> raidz or raidz2 instead
> of striping mirrors would be to keep the cost down,
> or to get higher usable
> space out of a fixed number of drives.
While it has been a while since I have done storage management for critical
systems, the advantage I see with RAIDZN is better fault tolerance: any N drives
may fail before  the set goes critical.

With straight mirroring, failure of the wrong two drives will invalidate the
whole pool.

The advantage of striped mirrors is that it offers a better chance of higher
iops (assuming the I/O is distributed correctly).  Also, it might be easier to
expand a mirror by upgrading only two drives with larger drives.  With RAID, the
entire stripe of drives would need to be upgraded.
-- 
This message posted from opensolaris.org

Markus Kovero

2009-Sep-16 17:02 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

It''s possible to do 3-way (or more) mirrors too, so you may achieve
better redundancy than raidz2/3

Yours
Markus Kovero

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at
opensolaris.org] On Behalf Of Marty Scholes
Sent: 16. syyskuuta 2009 19:38
To: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] RAIDZ versus mirrroed
> Generally speaking, striping mirrors will be faster
> than raidz or raidz2,
> but it will require a higher number of disks and
> therefore higher cost to
> The main reason to use
> raidz or raidz2 instead
> of striping mirrors would be to keep the cost down,
> or to get higher usable
> space out of a fixed number of drives.
While it has been a while since I have done storage management for critical
systems, the advantage I see with RAIDZN is better fault tolerance: any N drives
may fail before  the set goes critical.

With straight mirroring, failure of the wrong two drives will invalidate the
whole pool.

The advantage of striped mirrors is that it offers a better chance of higher
iops (assuming the I/O is distributed correctly).  Also, it might be easier to
expand a mirror by upgrading only two drives with larger drives.  With RAID, the
entire stripe of drives would need to be upgraded.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

David Dyer-Bennet

2009-Sep-16 17:14 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, September 16, 2009 10:35, Cindy.Swearingen at Sun.COM wrote:
> Detaching disks from a mirror isn''t ideal but if you absolutely
have
> to reuse a disk temporarily then go with mirrors. See the output below.
> You can replace disks in either configuration if you want to switch
> smaller disks with larger disks, for example.
In a small configuration, like a home NAS, like I''m running, the
upgrade
issue was what drove me to mirrors over RAIDZ, despite the cost.  A
typical configuration would have 4 or 5 hot-swap bays.  I have 8, though
only interfaces for 6 of them, and two are used for boot disks in a
mirror, so my data pool is in fact 4 drives.

It was cheaper to start with a two-disk mirror, knowing that I could add a
second two-disk mirror when needed, than it would have been to invest in 4
disks right away.  And (after filling all the slots) it''s cheaper to
upgrade the two disks in a mirror than the ~4 disks in a RAIDZ if I need
more space.

Despite my digital photography, and multiple housemates, I haven''t
filled
the current 800gb usable space (two vdevs, each a two-disk mirror of 400GB
drives).  By the time I do, I will certainly be able to afford larger
drives!

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

Thomas Burgess

2009-Sep-16 17:15 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

At the end of the day, it TOTALLY depends on your needs.  raidz may be the
best bet for you if you simply do not need the speed of mirrors, and as
another user mentioned, it DOES offer better fault tollerence.  Figure out
what your needs are for your workload THEN ask.

These type of loaded questions will ALWAYS get the same answers.


On Wed, Sep 16, 2009 at 10:39 AM, <eneal at businessgrade.com> wrote:
> Quoting David Magda <dmagda at ee.ryerson.ca>:
>
>  On Wed, September 16, 2009 10:31, Edward Ned Harvey wrote:
>>
>>> Hi. If I am using slightly more reliable SAS drives versus SATA,
SSDs
>>>> for both L2Arc and ZIL and lots of RAM, will a mirrored pool of
say 24
>>>> disks hold any significant advantages over a RAIDZ pool?
>>>>
>>>
>>> Generally speaking, striping mirrors will be faster than raidz or
raidz2,
>>> but it will require a higher number of disks and therefore higher
cost to
>>> get the same usable space.  The main reason to use raidz or raidz2
>>> instead
>>> of striping mirrors would be to keep the cost down, or to get
higher
>>> usable space out of a fixed number of drives.
>>>
>>
>> And if you want space /and/ speed, then ZFS'' hybrid storage
pools is
>> something worth looking into.
>>
>>
>>  This is precisely my point. If I''m taking the hybrid approach
- what
> advantages do mirrored pools hold over RAIDZ?
> As I mentioned, a large amount of RAM, and SSD''s for both L2arc
and ZIL.
>
>
>
>
--------------------------------------------------------------------------------
>
> This email and any files transmitted with it are confidential and are
> intended solely for the use of the individual or entity to whom they are
> addressed. This communication may contain material protected by the
> attorney-client privilege. If you are not the intended recipient, be
advised
> that any use, dissemination, forwarding, printing or copying is strictly
> prohibited. If you have received this email in error, please contact the
> sender and delete all copies.
>
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090916/03d934ee/attachment.html>

Richard Elling

2009-Sep-16 17:23 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 9:38 AM, Marty Scholes wrote:>> Generally speaking, striping mirrors will be faster
>> than raidz or raidz2,
>> but it will require a higher number of disks and
>> therefore higher cost to
>> The main reason to use
>> raidz or raidz2 instead
>> of striping mirrors would be to keep the cost down,
>> or to get higher usable
>> space out of a fixed number of drives.
>
> While it has been a while since I have done storage management for  
> critical systems, the advantage I see with RAIDZN is better fault  
> tolerance: any N drives may fail before  the set goes critical.
>
> With straight mirroring, failure of the wrong two drives will  
> invalidate the whole pool.
This line of reasoning doesn''t get you very far.  It is much better to
take a look at
the mean time to data loss (MTTDL) for the various configurations.  I  
wrote a
series of blogs to show how this is done.
http://blogs.sun.com/relling/tags/mttdl

  -- richard

Thomas Burgess

2009-Sep-16 17:42 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

Mirrors are much quicker to replace if one DOES fail though...so i would
think that bad stuff could happen with EITHER solution....If you buy a bunch
of hard drives for a raidz and they are all from the same batch they might
all fail around the same time...what if you have a raidz2 group and 2 drives
fail, then you''re adding 2 drives back and another fails before
it''s
complete because it takes SO long to resilver? At least with mirrors they
resilver fast.

The bottom line is that bad stuff CAN happen and often does...so don''t
let
raidz or mirrors be the only solution you have.  Redundancy is good.

More redundancy is better... but backups are the best.

On Wed, Sep 16, 2009 at 1:23 PM, Richard Elling <richard.elling at
gmail.com>wrote:
> On Sep 16, 2009, at 9:38 AM, Marty Scholes wrote:
>
>> Generally speaking, striping mirrors will be faster
>>> than raidz or raidz2,
>>> but it will require a higher number of disks and
>>> therefore higher cost to
>>> The main reason to use
>>> raidz or raidz2 instead
>>> of striping mirrors would be to keep the cost down,
>>> or to get higher usable
>>> space out of a fixed number of drives.
>>>
>>
>> While it has been a while since I have done storage management for
>> critical systems, the advantage I see with RAIDZN is better fault
tolerance:
>> any N drives may fail before  the set goes critical.
>>
>> With straight mirroring, failure of the wrong two drives will
invalidate
>> the whole pool.
>>
>
> This line of reasoning doesn''t get you very far.  It is much
better to take
> a look at
> the mean time to data loss (MTTDL) for the various configurations.  I wrote
> a
> series of blogs to show how this is done.
> http://blogs.sun.com/relling/tags/mttdl
>
>  -- richard
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090916/96d37119/attachment.html>

Richard Elling

2009-Sep-16 18:14 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 10:42 AM, Thomas Burgess wrote:
> Mirrors are much quicker to replace if one DOES fail though...so i  
> would think that bad stuff could happen with EITHER solution....If  
> you buy a bunch of hard drives for a raidz and they are all from the  
> same batch they might all fail around the same time...what if you  
> have a raidz2 group and 2 drives fail, then you''re adding 2 drives
> back and another fails before it''s complete because it takes SO
long
> to resilver? At least with mirrors they resilver fast.
In general, resilver is bound by either the media write bandwidth of  
the resilvering device
or the random IOP capacity of the remaining good drives. Although I  
don''t know of any
studies comparing mirrors vs raidz resilvering, I would not expect  
much difference between
the two, all else held constant.
  -- richard

Thomas Burgess

2009-Sep-16 18:41 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

hrm, i always thought raidz took longer....learn something every day =)


On Wed, Sep 16, 2009 at 2:14 PM, Richard Elling <richard.elling at
gmail.com>wrote:
> On Sep 16, 2009, at 10:42 AM, Thomas Burgess wrote:
>
>  Mirrors are much quicker to replace if one DOES fail though...so i would
>> think that bad stuff could happen with EITHER solution....If you buy a
bunch
>> of hard drives for a raidz and they are all from the same batch they
might
>> all fail around the same time...what if you have a raidz2 group and 2
drives
>> fail, then you''re adding 2 drives back and another fails
before it''s
>> complete because it takes SO long to resilver? At least with mirrors
they
>> resilver fast.
>>
>
> In general, resilver is bound by either the media write bandwidth of the
> resilvering device
> or the random IOP capacity of the remaining good drives. Although I
don''t
> know of any
> studies comparing mirrors vs raidz resilvering, I would not expect much
> difference between
> the two, all else held constant.
>  -- richard
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090916/f36e371a/attachment.html>

Marty Scholes

2009-Sep-16 19:50 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

> This line of reasoning doesn&#39;t get you very far.
>  It is much better to take a look at<br>
> the mean time to data loss (MTTDL) for the various
> configurations.  I wrote a<br>
> series of blogs to show how this is done.<br>
> <a href="http://blogs.sun.com/relling/tags/mttdl"
target="_blank">http://blogs.sun.com/relling/tags/mttdl</a><br><br>
I will play the Devils advocate here and point out that the chart shows MTTDL
for RAIDZ2, both 6 and 8 disk, is much better than mirroring.

The chart does show that three way mirroring is better still and I would guess
that RAIDZ3 surpasses that.
-- 
This message posted from opensolaris.org

Richard Elling

2009-Sep-16 20:00 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 12:50 PM, Marty Scholes wrote:
>> This line of reasoning doesn&#39;t get you very far.
>> It is much better to take a look at<br>
>> the mean time to data loss (MTTDL) for the various
>> configurations.  I wrote a<br>
>> series of blogs to show how this is done.<br>
>> <a href="http://blogs.sun.com/relling/tags/mttdl"
target="_blank">http://blogs.sun.com/relling/tags/mttdl
>> </a><br><br>
>
> I will play the Devils advocate here and point out that the chart  
> shows MTTDL for RAIDZ2, both 6 and 8 disk, is much better than  
> mirroring.
>
> The chart does show that three way mirroring is better still and I  
> would guess that RAIDZ3 surpasses that.
Yes.  This is a mathematical way of saying "lose any P+1 of N disks."

The important part is that the number of parity disks (or mirror sides)
is the big knob to use. But every choice is a trade-off.  For a single
set, the results should be intuitive. But as you vary the number of  
sets,
it quickly becomes easier to use the models.  For example, with a
Thumper, you have 48 disks and zillions of possible combinations
to choose from.
  -- richard

Bob Friesenhahn

2009-Sep-16 20:09 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, 16 Sep 2009, Thomas Burgess wrote:
> hrm, i always thought raidz took longer....learn something every day =)
And you were probably right, in spite of Richard''s lack of knowledge 
of a study or the feeling in his gut.  Just look at the many postings 
here about resilvering and you will see far more complaints about 
raidz taking a long time.

Resilver of mirrors will surely do better for large pools which 
continue to be used during the resilvering.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Marty Scholes

2009-Sep-16 20:29 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

> Yes.  This is a mathematical way of saying
> "lose any P+1 of N disks."
I am hesitant to beat this dead horse, yet it is a nuance that either I have
completely misunderstood or many people I''ve met have completely
missed.

Whether a stripe of mirrors or mirror of a stripes, any single failure makes the
array critical, i.e. one failure from disaster.

For example, suppose a stripe of four sets of mirrors.  That stripe has 8 disks
total: four data and four mirrors.  If one disk fails, say on mirror set 3, then
set 3 is running on a single disk.  Should that remaining disk in set 3 fail,
the whole stripe is lost.  Yes, the stripe is safe as long as the next failure
is not from set 3.

Contrast that to RAIDZ3.  Suppose seven total disks with the same effective pool
size: 4 data and 3 parity.  If any single disk is lost then the array is not
critical and can still survive any other loss.  In fact, it can survive a total
of any three disk failures before it becomes critical.

I just see it too often where someone states that a stripe of four mirror sets
can sustain four disk failures.  Yes, that''s true, as long as the
correct four disks fail.  If we could control which disks fail, then none of
this would even be necessary, so that argument seems rather silly.

Richard Elling

2009-Sep-16 21:19 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 1:09 PM, Bob Friesenhahn wrote:
> On Wed, 16 Sep 2009, Thomas Burgess wrote:
>
>> hrm, i always thought raidz took longer....learn something every  
>> day =)
>
> And you were probably right, in spite of Richard''s lack of
knowledge
> of a study or the feeling in his gut.  Just look at the many  
> postings here about resilvering and you will see far more complaints  
> about raidz taking a long time.
Actually, I had a ton of data on resilvering which shows mirrors and
raidz equivalently bottlenecked on the media write bandwidth. However,
there are other cases which are IOPS bound (or CR bound :-) which
cover some of the postings here. I think Sommerfeld has some other
data which could be pertinent.
  -- richard

Richard Elling

2009-Sep-16 21:23 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 1:29 PM, Marty Scholes wrote:>> Yes.  This is a mathematical way of saying
>> "lose any P+1 of N disks."
>
> I am hesitant to beat this dead horse, yet it is a nuance that  
> either I have completely misunderstood or many people I''ve met
have
> completely missed.
>
> Whether a stripe of mirrors or mirror of a stripes, any single  
> failure makes the array critical, i.e. one failure from disaster.
>
> For example, suppose a stripe of four sets of mirrors.  That stripe  
> has 8 disks total: four data and four mirrors.  If one disk fails,  
> say on mirror set 3, then set 3 is running on a single disk.  Should  
> that remaining disk in set 3 fail, the whole stripe is lost.  Yes,  
> the stripe is safe as long as the next failure is not from set 3.
Yes. I don''t think I''ve blogged the data, but the MTTDL models
will
show that
RAID-1+0 has a higher MTTDL than RAID-0+1.
> Contrast that to RAIDZ3.  Suppose seven total disks with the same  
> effective pool size: 4 data and 3 parity.  If any single disk is  
> lost then the array is not critical and can still survive any other  
> loss.  In fact, it can survive a total of any three disk failures  
> before it becomes critical.
Yes, but can you quantify this?  2x better?  5x better? 1.01x better?  
The
MTTDL models can help you quantify this.
> I just see it too often where someone states that a stripe of four  
> mirror sets can sustain four disk failures.  Yes, that''s true, as
> long as the correct four disks fail.  If we could control which  
> disks fail, then none of this would even be necessary, so that  
> argument seems rather silly.
The MTTDL models account for this.
  -- richard

Eric Schrock

2009-Sep-16 21:30 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On 09/16/09 14:19, Richard Elling wrote:> On Sep 16, 2009, at 1:09 PM, Bob Friesenhahn wrote:
> 
>> On Wed, 16 Sep 2009, Thomas Burgess wrote:
>>
>>> hrm, i always thought raidz took longer....learn something every
day =)
>>
>> And you were probably right, in spite of Richard''s lack of
knowledge
>> of a study or the feeling in his gut.  Just look at the many postings 
>> here about resilvering and you will see far more complaints about 
>> raidz taking a long time.
> 
> Actually, I had a ton of data on resilvering which shows mirrors and
> raidz equivalently bottlenecked on the media write bandwidth. However,
> there are other cases which are IOPS bound (or CR bound :-) which
> cover some of the postings here. I think Sommerfeld has some other
> data which could be pertinent.
This primarily has to do with the stripe width and block size.  The 
difference between mirroring and RAID-Z is that with RAID-Z each ZFS 
block is again chunked up into smaller blocks and distributed across the 
stripe.  So if you have a wide stripe (i.e. 32), a 128k block can be 
chunked up into 4k blocks, while a small recordsize can be chunked even 
smaller (i.e. 8k to 1k or 512).

ZFS resilvering is metadata based to allow for efficient resilvering of 
outages, but when a relatively full disk needs to be replaced you end up 
bottlenecked on the metadata traversal.   If your blocks are chunked up 
small enough, this becomes a random I/O benchmark for the good disks in 
the RAID stripe.  If your pool is backed by 7200 RPM disks, this can end 
up taking a very long time.

The ZFS team is actively working on improvements in this area.

- Eric

-- 
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock

Ross Walker

2009-Sep-16 22:32 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 4:29 PM, "Marty Scholes" <martyscholes at
yahoo.com>
wrote:
>> Yes.  This is a mathematical way of saying
>> "lose any P+1 of N disks."
>
> I am hesitant to beat this dead horse, yet it is a nuance that  
> either I have completely misunderstood or many people I''ve met
have
> completely missed.
>
> Whether a stripe of mirrors or mirror of a stripes, any single  
> failure makes the array critical, i.e. one failure from disaster.
>
> For example, suppose a stripe of four sets of mirrors.  That stripe  
> has 8 disks total: four data and four mirrors.  If one disk fails,  
> say on mirror set 3, then set 3 is running on a single disk.  Should  
> that remaining disk in set 3 fail, the whole stripe is lost.  Yes,  
> the stripe is safe as long as the next failure is not from set 3.
>
> Contrast that to RAIDZ3.  Suppose seven total disks with the same  
> effective pool size: 4 data and 3 parity.  If any single disk is  
> lost then the array is not critical and can still survive any other  
> loss.  In fact, it can survive a total of any three disk failures  
> before it becomes critical.
>
> I just see it too often where someone states that a stripe of four  
> mirror sets can sustain four disk failures.  Yes, that''s true, as
> long as the correct four disks fail.  If we could control which  
> disks fail, then none of this would even be necessary, so that  
> argument seems rather silly.
There is another type of failure that mirrors help with and that is  
controller or path failures. If one side of a mirror set is on one  
controller or path and the other on another then a failure of one will  
not take down the set.

You can''t get that with RAIDZn.

-Ross

Bob Friesenhahn

2009-Sep-16 22:43 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, 16 Sep 2009, Ross Walker wrote:>
> There is another type of failure that mirrors help with and that is 
> controller or path failures. If one side of a mirror set is on one
controller
> or path and the other on another then a failure of one will not take down
the
> set.
>
> You can''t get that with RAIDZn.
Sure you can.  Just make sure that ''n'' is the same as the
number of
data disks, and make sure that each disk in the vdev is accessed via a 
unique controller path. Use raidz3 with six disks.  You probably need 
a lot of vdevs to make this even somewhat cost effective. :-)

Regardless, mirrors are known to be more resilient to temporary path 
failures.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Marion Hakanson

2009-Sep-16 22:50 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

rswwalker at gmail.com said:> There is another type of failure that mirrors help with and that is
> controller or path failures. If one side of a mirror set is on one
> controller or path and the other on another then a failure of one will  
not
> take down the set.
> 
> You can''t get that with RAIDZn. 
You can if you have a stripe of RAIDZn''s, and enough controllers
(or paths) to go around.  The raidz2 below should be able to survive
the loss of two controllers, shouldn''t it?

Regards,

Marion


$ zpool status -v
  pool: zp1
 state: ONLINE
 scrub: scrub completed after 7h9m with 0 errors on Mon Sep 14 13:39:03 2009
config:

        NAME        STATE     READ WRITE CKSUM
        bulk_zp01   ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
        spares
          c0t0d0    AVAIL
          c1t0d0    AVAIL
          c4t0d0    AVAIL
          c7t0d0    AVAIL

errors: No known data errors
$

Ross Walker

2009-Sep-17 02:10 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 6:50 PM, Marion Hakanson <hakansom at ohsu.edu> wrote:
> rswwalker at gmail.com said:
>> There is another type of failure that mirrors help with and that is
>> controller or path failures. If one side of a mirror set is on one
>> controller or path and the other on another then a failure of one  
>> will   not
>> take down the set.
>>
>> You can''t get that with RAIDZn.
>
> You can if you have a stripe of RAIDZn''s, and enough controllers
> (or paths) to go around.  The raidz2 below should be able to survive
> the loss of two controllers, shouldn''t it?
It''s not the stripes that make a difference, but the number of  
controllers there.

What''s the system config on that puppy?

-Ross

Ross Walker

2009-Sep-17 02:17 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 6:43 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us
 > wrote:
> On Wed, 16 Sep 2009, Ross Walker wrote:
>>
>> There is another type of failure that mirrors help with and that is  
>> controller or path failures. If one side of a mirror set is on one  
>> controller or path and the other on another then a failure of one  
>> will not take down the set.
>>
>> You can''t get that with RAIDZn.
>
> Sure you can.  Just make sure that ''n'' is the same as the
number of
> data disks, and make sure that each disk in the vdev is accessed via  
> a unique controller path. Use raidz3 with six disks.  You probably  
> need a lot of vdevs to make this even somewhat cost effective. :-)
Well yes, if you have an equal number of parity disks to data disks it  
would survive, but at that point what''s the cost effectiveness to  
resilency ratio?
> Regardless, mirrors are known to be more resilient to temporary path  
> failures.
As another list member pointed out you could also avoid the issue by  
having a raidz disk per controller. But if I''m buying that kind of big
iron I might just opt for a 3par or emc and save myself the work, and  
probably some $ too.

-Ross

Richard Elling

2009-Sep-17 02:27 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Sep 16, 2009, at 7:17 PM, Ross Walker wrote:>> more resilient to temporary path failures.
>
> As another list member pointed out you could also avoid the issue by  
> having a raidz disk per controller. But if I''m buying that kind of
> big iron I might just opt for a 3par or emc and save myself the  
> work, and probably some $ too.
In general, for SAS or SATA, having separate controllers does little  
to improve
data availability. The reason is because SAS and SATA are point-to-point
or point-to-switch-to-point architectures and you don''t have the  
shared bus
issues that plague parallel SCSI or IDE. The controllers themselves are
approximately an order of magnitude more reliable than your CPU  and are
around two orders of magnitude more reliable than your disk. Put your
redundancy where your reliability is weak (disk), if you want to improve
availability.
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_vs

  -- richard

Eugen Leitl

2009-Sep-17 10:16 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, Sep 16, 2009 at 08:02:35PM +0300, Markus Kovero wrote:
> It''s possible to do 3-way (or more) mirrors too, so you may
achieve better redundancy than raidz2/3
I understand there''s almost no additional performance penalty to raidz3
over raidz2 in terms of CPU load. Is that correct?

So SSDs for ZIL/L2ARC don''t bring that much when used with
raidz2/raidz3,
if I write a lot, at least, and don''t access the cache very much,
according
to some recent posts on this list.

How much drive space am I''m losing with mirrored pools versus raidz3?
IIRC
in RAID 10 it''s only 10% over RAID 6, which is why I went for RAID 10
in
my 14-drive SATA (WD RE4) setup.

Let''s assume I want to fill a 24-drive Supermicro chassis with 1 TByte
WD Caviar Black or 2 TByte RE4 drives, and use 4x X25-M 80 GByte
2nd gen Intel consumer drives, mirrored, each pair as ZIL/L2ARC
for the 24 SATA drives behind them. Let''s assume CPU is not an issue,
with dual-socket Nehalems and 24 GByte RAM or more. There are applications
packaged in Solaris containers running on the same box, however.

Let''s say the workload is mostly multiple streams (hundreds to
thousands
simultaneously, some continuous, some bursty) each writing data 
to the storage system. However, some few clients will be using database-like
queries to read, potentially on the entire data store.

With above workload, is raidz2/raid3 right out, and will I need mirrored
pools? 

How would you lay out the pools for above workload, assuming 24 SATA
drives/chassis (24-48 TBytes raw storage), and 80 GByte SSD each for ZIL/L2ARC 
(is that too little?  Would 160 GByte work better?)

Thanks lots.
 
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

Eugen Leitl

2009-Sep-17 10:31 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, Sep 16, 2009 at 10:23:01AM -0700, Richard Elling wrote:
> This line of reasoning doesn''t get you very far.  It is much
better to
> take a look at
> the mean time to data loss (MTTDL) for the various configurations.  I  
> wrote a
> series of blogs to show how this is done.
> http://blogs.sun.com/relling/tags/mttdl
Excellent information, thanks! I presume MTTDL[1] years and
MTTDL[2] is the same as in 
http://blogs.sun.com/relling/entry/a_story_of_two_mttdl 

Do you think it would be possible to publish same information
for 24 drives (not all of us can buy a Thumper), and maybe 
include raidz3 into the number crunch?

Thanks!

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

Tomas Ögren

2009-Sep-17 10:55 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On 17 September, 2009 - Eugen Leitl sent me these 2,0K bytes:
> On Wed, Sep 16, 2009 at 08:02:35PM +0300, Markus Kovero wrote:
> 
> > It''s possible to do 3-way (or more) mirrors too, so you may
achieve better redundancy than raidz2/3
> 
> I understand there''s almost no additional performance penalty to
raidz3
> over raidz2 in terms of CPU load. Is that correct?
> 
> So SSDs for ZIL/L2ARC don''t bring that much when used with
raidz2/raidz3,
> if I write a lot, at least, and don''t access the cache very much,
according
> to some recent posts on this list.
> 
> How much drive space am I''m losing with mirrored pools versus
raidz3? IIRC
> in RAID 10 it''s only 10% over RAID 6, which is why I went for RAID
10 in
> my 14-drive SATA (WD RE4) setup.
It''s not a fixed value per technology, it depends on the number of
disks
per group. RAID5/RAIDZ1 "loses" 1 disk worth to parity per group.
RAID6/RAIDZ" loses 2 disks. RAIDZ3 loses 3 disks. Raid1/mirror loses
half the disks. So in your 14 drive case, if you go for one big
raid6/raidz2 setup (which is larger than recommended for performance
reasons), you will lose 2 disks worth of storage to parity leaving 12
disks worth of data. With raid10 you will lose half, 7 disks to
parity/redundancy. With two raidz2 sets, you will get (5+2)+(5+2), that
is 5+5 disks worth of storage and 2+2 disks worth of redundancy. The
actual redudancy/parity is spread over all disks, not like raid3 which
has a dedicated parity disk.

For more info, see for example http://en.wikipedia.org/wiki/RAID

/Tomas
-- 
Tomas ?gren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Ume?
`- Sysadmin at {cs,acc}.umu.se

Erik Trimble

2009-Sep-17 11:15 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

Eugen Leitl wrote:> On Wed, Sep 16, 2009 at 08:02:35PM +0300, Markus Kovero wrote:
>
>   
>> It''s possible to do 3-way (or more) mirrors too, so you may
achieve better redundancy than raidz2/3
>>     
>
> I understand there''s almost no additional performance penalty to
raidz3
> over raidz2 in terms of CPU load. Is that correct?
>   As far as I understand the z3 algorithms, the performance penalty is 
very slightly higher than z2. I think it''s reasonable to treat z1, z2, 
and z3 as equal in terms of CPU load.
> So SSDs for ZIL/L2ARC don''t bring that much when used with
raidz2/raidz3,
> if I write a lot, at least, and don''t access the cache very much,
according
> to some recent posts on this list.
>   Not true.

Remember:  ZIL = write cache
                   L2ARC = read cache

So, if you have a write-heavy workload which seldom does much more than 
large reads, an L2ARC SSD doesn''t make much sense. Main RAM should 
suffice for storing the read cache.

Random reads aren''t fast on RAIDZ, so a read cache is a good thing if 
you are doing that kind of I/O.   Similarly, random writes (particularly 
small random writes) are suck hard on RADIZ, so a write cache is a 
fabulous idea there.

If you are doing very large sequential writes to a RAIDZ (any sort) 
pool, then a write cache will likely be much less helpful.  But 
remember, very large means that you frequently exceed the size of the 
SSD you''ve allocated for the ZIL.   I''d have to run the
numbers, but you
should still see a major performance improvement by using a SSD for ZIL, 
up to the point where your typical write load exceeds 10% of the size of 
SSD.  Naturally, write-heavy workloads will be murder on a MLC or hybrid 
SSD''s life expectancy, though, a large sequential-write-heavy load will
allow the SSD to perform better and longer than a small random write load.

A write SSD will help you up until you try to write to the SSD faster 
than it can flush out it''s contents to actual disk. So, you need to
take
into consideration exactly how much data is coming in, and the write 
speed of your (non-SSD) disks. If you are continuously (and constantly) 
exceeding the speed of your disks with incoming data, then SSDs won''t 
really help. You''ll see some help up until the SSD fills up, then 
performance will drop to equal that as if the SSD didn''t exist. 

Doing [very] rough calculations, let''s say your SSD has a read/write 
throughput of 200MB/s, and is 100GB in size. If your hard drives can 
only do 50MB/s,   then you can write up to 150MB/s to the SSD, read 
50MB/s  from the SSD, and write 50MB/s to the disks.  This means, each 
second, you fill the SSD with 100MB more data that can''t be flushed out
fast enough.  At 100MB/s, it takes 1,000 seconds to fill 100GB. So, in 
about 17 minutes, you''ve completely filled the SSD, and performance 
drops like a rock.   There is a similar cliff problem around IOPS.
> How much drive space am I''m losing with mirrored pools versus
raidz3? IIRC
> in RAID 10 it''s only 10% over RAID 6, which is why I went for RAID
10 in
> my 14-drive SATA (WD RE4) setup.
>   Basic math says for N disks, you get N-3 amount of space for a RAIDZ3, 
and N/2 for a 2-way mirror.  N-3 > N/2 for all N = 6 or more.   But, 
remember, you''ll generally need at least one hot spare for a mirror, so
really, the equations looks like this:

N-3 > (N/2) -1         which means, RAIDZ3 gives you more space for N > 4
> Let''s assume I want to fill a 24-drive Supermicro chassis with 1
TByte
> WD Caviar Black or 2 TByte RE4 drives, and use 4x X25-M 80 GByte
> 2nd gen Intel consumer drives, mirrored, each pair as ZIL/L2ARC
> for the 24 SATA drives behind them. Let''s assume CPU is not an
issue,
> with dual-socket Nehalems and 24 GByte RAM or more. There are applications
> packaged in Solaris containers running on the same box, however.
>   Remember to take a look at Richard''s spreadsheet about drive errors and
the amount of time you can expect to go without serious issue.    He''s 
also got good stuff about optimizing for speed vs space.

http://blogs.sun.com/relling/

Quick math for a 24-drive setup:

Scenario A:    stripe of mirrors, plus global spares.
                      11 x 2-way mirror = 11 disks of data, plus 2 
additional hot spares

Scenario B:    stripe of raidz3, no global spares
                     3 x  8-drive RAIDz3  (5 data + 3 parity drives )=  
3 x 5 = 15 data drives, with a total of 9 internal "spares"

Thus, A gives you about 30% less disk space than B.

> Let''s say the workload is mostly multiple streams (hundreds to
thousands
> simultaneously, some continuous, some bursty) each writing data 
> to the storage system. However, some few clients will be using
database-like
> queries to read, potentially on the entire data store.
>
> With above workload, is raidz2/raid3 right out, and will I need mirrored
> pools? 
>   The database queries will definitely benefit from a L2ARC SSD - the size 
of that SSD depends on exactly how much data the query has to check. If 
it''s just checking metadata (mod times, file sizes, permissions, etc.) 
of lots of files, then you''re probably good with a smaller SSD. If you 
have to actually read large amounts of the streams, then you''re pretty 
well hosed, as your data set is far larger than any cache can hold.
> How would you lay out the pools for above workload, assuming 24 SATA
> drives/chassis (24-48 TBytes raw storage), and 80 GByte SSD each for
ZIL/L2ARC
> (is that too little?  Would 160 GByte work better?)
>
> Thanks lots.
>   I can''t make recommendations about SSD size without much more specific 
numbers about the actual workload.

Look at Richard''s Raid Optimizer output for a 48-disk Thumper/Thor.

http://blogs.sun.com/relling/entry/sample_raidoptimizer_output

It should give you a good idea about IOPS and read/write speeds for 
various configs.

Reading/Writing a large stream is a sequential operation. Bursty 
read/write of a stream looks like random I/O.  But, more importantly, 
the relative size of the stream is important. Whether continuous or 
bursty, the important characteristic is HOW MUCH data needs to be 
written/read at once. Anything under 100k is definitely "random", and 
anything over 10MB is "sequential" (as far as general performance 
goes).  Sizes in between makes it depend on how much other stuff is 
going on (i.e. having 100,000 streams each trying to write 1MB has a 
different impact than 1,000 streams trying to write the same 1MB each).

Personally, I hate to use any form of SATA drive for a heavy random 
write or read workload, even with an SSD.  SAS disks performs so much 
better.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

Eugen Leitl

2009-Sep-17 11:32 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Thu, Sep 17, 2009 at 12:55:35PM +0200, Tomas ?gren wrote:
> It''s not a fixed value per technology, it depends on the number of
disks
> per group. RAID5/RAIDZ1 "loses" 1 disk worth to parity per group.
> RAID6/RAIDZ" loses 2 disks. RAIDZ3 loses 3 disks. Raid1/mirror loses
> half the disks. So in your 14 drive case, if you go for one big
> raid6/raidz2 setup (which is larger than recommended for performance
I presume for 24 disks (my next project, the current 16-disk 
one had to be converted to CentOS for software compatibility reasons) 
you would recommend splitting them into two groups, a la 12 disks. 
With raidz3, there would be 9 disks left for data, 18 total -- 
36 TBytes effective in case of 2 TByte WD RE4 drives, half that 
for WD Caviar Black. How many hot spares should I leave in 
each pool, one or more? 

Is it safe to stripe over two such 12-disk pools? 
Or is mirror the right thing to do, regardless of drive costs?

Speaking of which, does anyone use NFSv4 clustering in production
to aggregate individual zfs boxes? Experiences good/bad?
> reasons), you will lose 2 disks worth of storage to parity leaving 12
> disks worth of data. With raid10 you will lose half, 7 disks to
> parity/redundancy. With two raidz2 sets, you will get (5+2)+(5+2), that
> is 5+5 disks worth of storage and 2+2 disks worth of redundancy. The
> actual redudancy/parity is spread over all disks, not like raid3 which
> has a dedicated parity disk.
So raidz3 has a dedicated parity disk? I couldn''t see that from
skimming http://blogs.sun.com/ahl/entry/triple_parity_raid_z
 > For more info, see for example http://en.wikipedia.org/wiki/RAID
Unfortunately, this is very thin on zfs.

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
is very helpful, but it doesn''t offer concrete layout examples for
odd number of disks (understandable, since Sun has to sell the 
Thumper), and is pretty mum on raidz3.

Thank you. This list is fun, and helpful.

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE

Darren J Moffat

2009-Sep-17 11:34 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

Erik Trimble wrote:>> So SSDs for ZIL/L2ARC don''t bring that much when used with
raidz2/raidz3,
>> if I write a lot, at least, and don''t access the cache very
much,
>> according
>> to some recent posts on this list.
>>   
> Not true.
> 
> Remember:  ZIL = write cache
ZIL is NOT a write cache.  The ZIL is the Intent Log not a cache.  It is 
used only for synchronous writes.   It is not a cache because the term 
"cache" implies the data is also somewhere else and you lose nothing
but
potential performance if you loose the cache.

ZFS calls the devices used to hold the ZIL (there is one ZIL per 
dataset) a SLOG (Separate Log device).

Note also the recent addition of the "logbias" dataset property.

-- 
Darren J Moffat

Erik Trimble

2009-Sep-17 16:20 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

Darren J Moffat wrote:> Erik Trimble wrote:
>>> So SSDs for ZIL/L2ARC don''t bring that much when used with
>>> raidz2/raidz3,
>>> if I write a lot, at least, and don''t access the cache
very much,
>>> according
>>> to some recent posts on this list.
>>>   
>> Not true.
>>
>> Remember:  ZIL = write cache
>
> ZIL is NOT a write cache.  The ZIL is the Intent Log not a cache.  It 
> is used only for synchronous writes.   It is not a cache because the 
> term "cache" implies the data is also somewhere else and you lose
> nothing but potential performance if you loose the cache.
>
> ZFS calls the devices used to hold the ZIL (there is one ZIL per 
> dataset) a SLOG (Separate Log device).
>
> Note also the recent addition of the "logbias" dataset property.
>
I should have more properly used the term "buffer", which is what ZIL
is
more closely related to. Sorry about that - I didn''t mean to imply that
the ZIL was the same as something like a STK6140''s NVRAM.



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

Marion Hakanson

2009-Sep-17 18:25 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

rswwalker at gmail.com said:> It''s not the stripes that make a difference, but the number of
controllers
> there.
> 
> What''s the system config on that puppy? 
The "zpool status -v" output was from a Thumper (X4500), slightly
edited,
since in our real-world Thumper, we use c6t0d0 in c5t4d0''s place in the
"optimal" layout I posted, because c5t4d0 is used in the boot-drive
mirror.

See the following for our 2006 Thumper benchmarks, which appear to bear
out Richard Elling''s RaidOptimizer analysis:
	http://acc.ohsu.edu/~hakansom/thumper_bench.html

While I''m at it, filebench numbers from a recent J4400-based database
server deployment, with some "slog vs no-slog" comparisons (sorry, no
SSD''s available here yet):
	http://acc.ohsu.edu/~hakansom/j4400_bench.html

Regards,

Marion

Adam Leventhal

2009-Sep-17 18:41 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Thu, Sep 17, 2009 at 01:32:43PM +0200, Eugen Leitl
wrote:> > reasons), you will lose 2 disks worth of storage to parity leaving 12
> > disks worth of data. With raid10 you will lose half, 7 disks to
> > parity/redundancy. With two raidz2 sets, you will get (5+2)+(5+2),
that
> > is 5+5 disks worth of storage and 2+2 disks worth of redundancy. The
> > actual redudancy/parity is spread over all disks, not like raid3 which
> > has a dedicated parity disk.
> 
> So raidz3 has a dedicated parity disk? I couldn''t see that from
> skimming http://blogs.sun.com/ahl/entry/triple_parity_raid_z
Note that Tomas was talking about RAID-3 not raidz3. To summarize the RAID
levels:

  RAID-0	striping
  RAID-1	mirror
  RAID-2	ECC (basically not used)
  RAID-3	bit-interleaved parity (basically not used)
  RAID-4	block-interleaved parity
  RAID-5	block-interleaved distributed parity
  RAID-6	block-interleaved double distributed parity

raidz1 is most like RAID-5; raidz2 is most like RAID-6. There''s no RAID
level that covers more than two parity disks, but raidz3 is most like RAID-6,
but with triple distributed parity.

Adam

-- 
Adam Leventhal, Fishworks                     http://blogs.sun.com/ahl

Brandon High

2009-Sep-18 07:10 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Thu, Sep 17, 2009 at 11:41 AM, Adam Leventhal <ahl at eng.sun.com>
wrote:> ?RAID-3 ? ? ? ?bit-interleaved parity (basically not used)
There was a hardware RAID chipset that used RAID-3. Netcell Revolution
I think it was called.

It looked interesting and I thought about grabbing one at the time but
never got around to it. Netcell is defunct or got bought out, so the
controller is no longer available.

-B

-- 
Brandon High : bhigh at freaks.com
Always try to do things in chronological order; it''s less confusing
that way.

Bill Sommerfeld

2009-Sep-18 21:26 UTC

head link

[zfs-discuss] RAIDZ versus mirrroed

On Wed, 2009-09-16 at 14:19 -0700, Richard Elling wrote:> Actually, I had a ton of data on resilvering which shows mirrors and
> raidz equivalently bottlenecked on the media write bandwidth. However,
> there are other cases which are IOPS bound (or CR bound :-) which
> cover some of the postings here. I think Sommerfeld has some other
> data which could be pertinent.
I''m not sure I have data, but I have anecdotes and observations, and a
few large production pools used for solaris development by me and my
coworkers.

the biggest one (by disk count) takes 80-100 hours to scrub and/or
resilver.

my working hypothesis is that resilver of pools which:
 1) have a lot of files, directories, filesystems, and periodic
snapshots
 2) have atime updates enabled (default config)
 3) have regular (daily) jobs doing large-scale filesystem tree-walks

wind up rewriting most blocks of the dnode files on every tree walk
doing atime updates, and as a result the dnode file (but not most of the
blocks it points to) differs greatly from daily snapshot to daily
snapshot.

as a result, scrub/resilver traversals end up spending most of their 
time doing random reads of the dnode files of each snapshot.

here are some bugs that, if fixed, might help:

6678033 resilver code should prefetch
6730737 investigate colocating directory dnodes

						- Bill

zfs discuss - Sep 2009 - RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed

[zfs-discuss] RAIDZ versus mirrroed