I was reading some ZFS pages on another discussion list when I found
this comment by a few people who may or may not know something:
http://episteme.arstechnica.com/eve/forums/a/tpc/f/24609792/m/956007108831

[i]"You tend to get better write performance when the number of disks
in the raid is (a power of 2) + 1, as the parity calculations are more
efficient. ... 5 drives is better than 4 because you are more likely to
write an entire stripe if it is the same size as your IO block, which
is a power of 2. There is a thread around here somewhere that talks
about that. RAID-Z always writes a full stripe, so that may not apply
as much."[/i]

It sounds like they're talking more about traditional hardware RAID,
but is this also true for ZFS? Right now I've got four 750GB drives
that I'm planning to use in a raid-z 3+1 array. Will I get markedly
better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1) because
the parity calculations are more efficient across N^2 drives?
Nit: you meant 2^N + 1, I believe.

Daniel
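For a concrete feel of the "(a power of 2) + 1" claim, here is a
minimal C sketch of how a power-of-two record splits across the data
disks of a raidz1 vdev (the 128 KiB recordsize and 512-byte sector are
typical defaults, and the plain integer division ignores the
sector-rounding a real allocator would do):

    #include <stdio.h>

    int
    main(void)
    {
        const unsigned record = 128 * 1024; /* common ZFS recordsize */
        const unsigned sector = 512;        /* disk sector size */
        unsigned width;

        for (width = 3; width <= 9; width++) {
            unsigned ndata = width - 1;      /* raidz1: one parity disk */
            unsigned chunk = record / ndata; /* bytes per data disk */
            printf("%u disks: %6u bytes/disk%s\n", width, chunk,
                (chunk % sector) ? "  (not sector-aligned)" : "");
        }
        return (0);
    }

Only the widths whose data-disk count is a power of two (3, 5, and 9
disks here) split the record into sector-aligned, power-of-two chunks;
whether that matters in practice is what the rest of the thread is
about.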
It seems to me that the blanket 8% improvement statement by 'Black
Jacque' is clearly the most technically correct, even though it is not
a serious answer.

On Mon, 14 Jul 2008, Søren Ragsdale wrote:

> [i]"You tend to get better write performance when the number of
> disks in the raid is (a power of 2) + 1, as the parity calculations
> are more efficient. ... 5 drives is better than 4 because you are
> more likely to write an entire stripe if it is the same size as your
> IO block, which is a power of 2. There is a thread around here
> somewhere that talks about that. RAID-Z always writes a full stripe,
> so that may not apply as much."[/i]

This seems like a bunch of hog-wash to me. Any time you see even a
single statement which is incorrect, it is best to ignore that forum
poster entirely, and if no one corrects him, then ignore the entire
forum.

For example, from what I have read, RAID-Z does not always write a full
stripe. While RAID-Z uses parity similar to RAID-5 and writes a stripe
across disks, it does not need to write across all the disks like
RAID-5 does. If you had 10 disks, it might decide to stripe across 6
disks per block write.

> It sounds like they're talking more about traditional hardware RAID
> but is this also true for ZFS? Right now I've got four 750GB drives
> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
> because the parity calculations are more efficient across N^2
> drives?

With ZFS and modern CPUs, the parity calculation is surely in the noise
to the point of being unmeasurable. Even the ZFS fletcher checksum
algorithm is an order of magnitude or two more costly than the parity
calculation.

If you can afford the extra drives, then you can use them to obtain
more performance. The solution to more performance (depending on the
type of performance you are looking for) may be to create more small
VDEVs using mirrors, RAID-Z, or RAID-Z2 and load-share across them.
With ZFS, the mirrored configuration is usually fastest since it uses
the fewest device IOPS and does not split up the blocks.

ZFS makes it really easy to try out various incantations, so if you
have the drives available there is no excuse not to test it for
yourself with your own workload.

Bob
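For what it's worth, the single parity Bob is talking about really is
just an XOR across the data columns. A minimal sketch of the core
operation (the real code lives in ZFS's vdev_raidz.c and handles
variable column sizes, double parity, reconstruction, and so on):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * XOR the data columns together into the parity column. One
     * linear, memory-bandwidth-bound pass over the block, which is
     * why the CPU cost is "in the noise".
     */
    void
    raidz1_parity(uint64_t *p, uint64_t *cols[], int ncols,
        size_t nwords)
    {
        int c;
        size_t i;

        (void) memcpy(p, cols[0], nwords * sizeof (uint64_t));
        for (c = 1; c < ncols; c++)
            for (i = 0; i < nwords; i++)
                p[i] ^= cols[c][i];
    }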
> Will I get markedly better performance with 5 drives (2^2+1) or 6
> drives 2*(2^1+1) because the parity calculations are more efficient
> across 2^N drives?

If only parity calculations stand to benefit, then it wouldn't make a
difference, because your CPU is more than powerful enough to take care
of it either way ;)
On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> This seems like a bunch of hog-wash to me. Any time you see even a
> single statement which is incorrect, it is best to ignore that forum
> poster entirely and if no one corrects him, then ignore the entire
> forum.

I don't know what "forums" you're on, but in my experience that would
eliminate approximately 100% of them.

> For example, from what I have read, RAID-Z does not always write a
> full stripe. While RAID-Z uses parity similar to RAID-5 and writes a
> stripe across disks, it does not need to write across all the disks
> like RAID-5 does. If you had 10 disks, it might decide to stripe
> across 6 disks per block write.

Which is a "full stripe" in that the parity calculation covers the
entire write.

>> It sounds like they're talking more about traditional hardware RAID
>> but is this also true for ZFS? Right now I've got four 750GB drives
>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>> because the parity calculations are more efficient across N^2
>> drives?
>
> With ZFS and modern CPUs, the parity calculation is surely in the
> noise to the point of being unmeasurable.

I would agree with that. The parity calculation has *never* been a
factor in and of itself. The problem is having to read the rest of the
stripe and then having to wait for a disk revolution before writing.

-frank
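The penalty Frank describes is the classic RAID-5 read-modify-write. As
a sketch of the textbook arithmetic (the I/O counts come from the
standard small-write model, not from measurements):

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * RAID-5 must read the old data and old parity before it can
         * compute and write the new ones; RAID-Z writes every block
         * as its own full stripe and never has to read first.
         */
        int raid5 = 2 + 2; /* read old data+parity, write new pair */
        int raidz = 0 + 2; /* no reads, write data+parity */

        printf("RAID-5 small write: %d I/Os plus ~1 rotation\n", raid5);
        printf("RAID-Z small write: %d I/Os\n", raidz);
        return (0);
    }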
One nit ... the parity computation is 'in the noise' as far as the CPU
goes, but it tends to flush the CPU caches (or rather, replace useful
cached data with parity), which affects application performance. Modern
CPU architectures (including x86/SPARC) provide instructions which
allow data to be streamed through the cache without actually replacing
everything else, but Solaris doesn't take advantage of these yet.
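On x86 the instructions in question are the non-temporal stores. A
minimal userland sketch of a cache-bypassing copy using SSE2 intrinsics
(assuming SSE2, 16-byte-aligned buffers, and a length that is a
multiple of 16 -- this is an illustration, not Solaris kernel code):

    #include <stddef.h>
    #include <emmintrin.h>

    /*
     * Non-temporal stores go to memory through write-combining
     * buffers without filling the data cache, so a bulk pass such as
     * a parity computation would not evict the application's working
     * set.
     */
    void
    stream_copy(void *dst, const void *src, size_t nbytes)
    {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t i;

        for (i = 0; i < nbytes / 16; i++)
            _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        _mm_sfence(); /* make the streamed stores globally visible */
    }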
On July 14, 2008 9:54:43 PM -0700 Frank Cusack <fcusack at fcusack.com>
wrote:

> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>
>>> It sounds like they're talking more about traditional hardware RAID
>>> but is this also true for ZFS? Right now I've got four 750GB drives
>>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>>> because the parity calculations are more efficient across N^2
>>> drives?
>>
>> With ZFS and modern CPUs, the parity calculation is surely in the
>> noise to the point of being unmeasurable.
>
> I would agree with that. The parity calculation has *never* been a
> factor in and of itself. The problem is having to read the rest of
> the stripe and then having to wait for a disk revolution before
> writing.

oh, you know what though? raid-z had this bug, or maybe we should just
call it a behavior, where you only want an {even,odd} number of drives
in the vdev. I can't remember if it was even or odd. Or maybe it was
that you wanted only N^2+1 disks, choose any N. Otherwise you had
suboptimal performance in certain cases. I can't remember the exact
details, but it wasn't because of "more efficient parity calculations".
Maybe something about block sizes having to be powers of two and the
wrong number of disks forcing a read?

Anybody know what I'm referring to? Has it been fixed? I see the zfs
best practices guide says to use only odd numbers of disks, but it
doesn't say why. (don't you hate that?)

-frank
Frank Cusack wrote:

> On July 14, 2008 9:54:43 PM -0700 Frank Cusack <fcusack at fcusack.com>
> wrote:
>
>> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
>> <bfriesen at simple.dallas.tx.us> wrote:
>>
>>>> It sounds like they're talking more about traditional hardware RAID
>>>> but is this also true for ZFS? Right now I've got four 750GB drives
>>>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>>>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>>>> because the parity calculations are more efficient across N^2
>>>> drives?
>>>
>>> With ZFS and modern CPUs, the parity calculation is surely in the
>>> noise to the point of being unmeasurable.
>>
>> I would agree with that. The parity calculation has *never* been a
>> factor in and of itself. The problem is having to read the rest of
>> the stripe and then having to wait for a disk revolution before
>> writing.
>
> oh, you know what though? raid-z had this bug, or maybe we should just
> call it a behavior, where you only want an {even,odd} number of drives
> in the vdev. I can't remember if it was even or odd. Or maybe it was
> that you wanted only N^2+1 disks, choose any N. Otherwise you had
> suboptimal performance in certain cases. I can't remember the exact
> details, but it wasn't because of "more efficient parity calculations".
> Maybe something about block sizes having to be powers of two and the
> wrong number of disks forcing a read?
>
> Anybody know what I'm referring to? Has it been fixed? I see the zfs
> best practices guide says to use only odd numbers of disks, but it
> doesn't say why. (don't you hate that?)

See the "Metaslab alignment" thread.
http://www.opensolaris.org/jive/thread.jspa?messageID=60241
 -- richard
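For readers who do not want to chase the link, here is a rough
back-of-the-envelope sketch in C of the effect that thread describes.
The rule that RAID-Z rounds each allocation up to a multiple of
(nparity + 1) sectors is my reading of the allocation code, so treat
the numbers as illustrative:

    #include <stdio.h>

    int
    main(void)
    {
        const unsigned sector = 512;
        const unsigned record = 128 * 1024; /* 256 data sectors */
        const unsigned nparity = 1;         /* raidz1 */
        unsigned ndisks;

        for (ndisks = 3; ndisks <= 10; ndisks++) {
            unsigned ndata = ndisks - nparity;
            unsigned dsec = record / sector;
            /* one parity sector per row of up to ndata data sectors */
            unsigned rows = (dsec + ndata - 1) / ndata;
            unsigned used = dsec + rows * nparity;
            /* round up to a multiple of (nparity + 1) sectors */
            unsigned alloc = (used + nparity) / (nparity + 1) *
                (nparity + 1);
            printf("%2u disks: %u sectors used, %u allocated "
                "(%u padding)\n", ndisks, used, alloc, alloc - used);
        }
        return (0);
    }

Run it for different record sizes and parity levels to see how the
padding moves around; at this recordsize the power-of-two data-disk
widths (3, 5, and 9 disks) are among the ones that never need it.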
On Jul 14, 2008, at 20:49, Bob Friesenhahn wrote:

> Any time you see even a single statement which is incorrect, it is
> best to ignore that forum poster entirely and if no one corrects
> him, then ignore the entire forum.

Yes, because each and every one of us must correct inaccuracies on the
Internet: http://xkcd.com/386/

:)
> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>
>> With ZFS and modern CPUs, the parity calculation is surely in the
>> noise to the point of being unmeasurable.
>
> I would agree with that. The parity calculation has *never* been a
> factor in and of itself. The problem is having to read the rest of
> the stripe and then having to wait for a disk revolution before
> writing.
> -frank

And this is where a HW RAID controller comes in. We hope it has a
microprocessor for the calculations, full knowledge of the head
positions, and a list of free blocks -- then it simply chooses one of
the drives that suits the criteria for the RAID level used and writes
immediately to the free block under one of the heads. If only ...

Maybe in a few years Sun will make a HW RAID controller using ZFS once
we all get the bugs out. With flash updates this should work
wonderfully.
Rob Clark wrote:

>> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
>> <bfriesen at simple.dallas.tx.us> wrote:
>>
>>> With ZFS and modern CPUs, the parity calculation is surely in the
>>> noise to the point of being unmeasurable.
>>
>> I would agree with that. The parity calculation has *never* been a
>> factor in and of itself. The problem is having to read the rest of
>> the stripe and then having to wait for a disk revolution before
>> writing.
>> -frank
>
> And this is where a HW RAID controller comes in. We hope it has a
> microprocessor for the calculations, full knowledge of the head
> positions, and a list of free blocks -- then it simply chooses one of
> the drives that suits the criteria for the RAID level used and writes
> immediately to the free block under one of the heads. If only ...
>
> Maybe in a few years Sun will make a HW RAID controller using ZFS
> once we all get the bugs out. With flash updates this should work
> wonderfully.

Given that a general-purpose CPU today tends to offer much better
performance than embedded processors, and the cost of developing
special-purpose processors is high, how would you define the next
generation "HW RAID" controller?
 -- richard