I was reading some ZFS pages on another discussion list when I found
this comment by a few people who may or may not know something:
http://episteme.arstechnica.com/eve/forums/a/tpc/f/24609792/m/956007108831

[i]"You tend to get better write performance when the number of disks
in the raid is (a power of 2) + 1, as the parity calculations are more
efficient. ... 5 drives is better than 4 because you are more likely to
write an entire stripe if it is the same size as your IO block, which
is a power of 2. There is a thread around here somewhere that talks
about that. RAID-Z always writes a full stripe, so that may not apply
as much."[/i]

It sounds like they're talking more about traditional hardware RAID,
but is this also true for ZFS? Right now I've got four 750GB drives
that I'm planning to use in a raid-z 3+1 array. Will I get markedly
better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1) because
the parity calculations are more efficient across N^2 drives?
Nit: you meant 2^N + 1, I believe.

Daniel
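For a concrete feel of the "(a power of 2) + 1" claim, here is a
minimal C sketch of how a power-of-two record splits across the data
disks of a raidz1 vdev (the 128 KiB recordsize and 512-byte sector are
typical defaults, and the plain integer division ignores the
sector-rounding a real allocator would do):

    #include <stdio.h>

    int
    main(void)
    {
        const unsigned record = 128 * 1024; /* common ZFS recordsize */
        const unsigned sector = 512;        /* disk sector size */
        unsigned width;

        for (width = 3; width <= 9; width++) {
            unsigned ndata = width - 1;      /* raidz1: one parity disk */
            unsigned chunk = record / ndata; /* bytes per data disk */
            printf("%u disks: %6u bytes/disk%s\n", width, chunk,
                (chunk % sector) ? "  (not sector-aligned)" : "");
        }
        return (0);
    }

Only the widths whose data-disk count is a power of two (3, 5, and 9
disks here) split the record into sector-aligned, power-of-two chunks;
whether that matters in practice is what the rest of the thread is
about.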
It seems to me that the blanket 8% improvement statement by 'Black
Jacque' is clearly the most technically correct, even though it is not
a serious answer.

On Mon, 14 Jul 2008, Søren Ragsdale wrote:

> [i]"You tend to get better write performance when the number of
> disks in the raid is (a power of 2) + 1, as the parity calculations
> are more efficient. ... 5 drives is better than 4 because you are
> more likely to write an entire stripe if it is the same size as your
> IO block, which is a power of 2. There is a thread around here
> somewhere that talks about that. RAID-Z always writes a full stripe,
> so that may not apply as much."[/i]

This seems like a bunch of hog-wash to me. Any time you see even a
single statement which is incorrect, it is best to ignore that forum
poster entirely, and if no one corrects him, then ignore the entire
forum.

For example, from what I have read, RAID-Z does not always write a full
stripe. While RAID-Z uses parity similar to RAID-5 and writes a stripe
across disks, it does not need to write across all the disks like
RAID-5 does. If you had 10 disks, it might decide to stripe across 6
disks per block write.

> It sounds like they're talking more about traditional hardware RAID
> but is this also true for ZFS? Right now I've got four 750GB drives
> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
> because the parity calculations are more efficient across N^2
> drives?

With ZFS and modern CPUs, the parity calculation is surely in the noise
to the point of being unmeasurable. Even the ZFS fletcher checksum
algorithm is an order of magnitude or two more costly than the parity
calculation.

If you can afford the extra drives, then you can use them to obtain
more performance. The solution to more performance (depending on the
type of performance you are looking for) may be to create more small
VDEVs using mirrors, RAID-Z, or RAID-Z2 and load-share across them.
With ZFS, the mirrored configuration is usually fastest since it uses
the fewest device IOPS and does not split up the blocks.

ZFS makes it really easy to try out various incantations, so if you
have the drives available there is no excuse not to test it for
yourself with your own workload.

Bob
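For what it's worth, the single parity Bob is talking about really is
just an XOR across the data columns. A minimal sketch of the core
operation (the real code lives in ZFS's vdev_raidz.c and handles
variable column sizes, double parity, reconstruction, and so on):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * XOR the data columns together into the parity column. One
     * linear, memory-bandwidth-bound pass over the block, which is
     * why the CPU cost is "in the noise".
     */
    void
    raidz1_parity(uint64_t *p, uint64_t *cols[], int ncols,
        size_t nwords)
    {
        int c;
        size_t i;

        (void) memcpy(p, cols[0], nwords * sizeof (uint64_t));
        for (c = 1; c < ncols; c++)
            for (i = 0; i < nwords; i++)
                p[i] ^= cols[c][i];
    }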
> Will I get markedly better performance with 5 drives (2^2+1) or 6
> drives 2*(2^1+1) because the parity calculations are more efficient
> across 2^N drives?

If only parity calculations stand to benefit, then it wouldn't make a
difference, because your CPU is more than powerful enough to take care
of it either way ;)
On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> This seems like a bunch of hog-wash to me. Any time you see even a
> single statement which is incorrect, it is best to ignore that forum
> poster entirely and if no one corrects him, then ignore the entire
> forum.

I don't know what "forums" you're on, but in my experience that would
eliminate approximately 100% of them.

> For example, from what I have read, RAID-Z does not always write a
> full stripe. While RAID-Z uses parity similar to RAID-5 and writes a
> stripe across disks, it does not need to write across all the disks
> like RAID-5 does. If you had 10 disks, it might decide to stripe
> across 6 disks per block write.

Which is a "full stripe" in that the parity calculation covers the
entire write.

>> It sounds like they're talking more about traditional hardware RAID
>> but is this also true for ZFS? Right now I've got four 750GB drives
>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>> because the parity calculations are more efficient across N^2
>> drives?
>
> With ZFS and modern CPUs, the parity calculation is surely in the
> noise to the point of being unmeasurable.

I would agree with that. The parity calculation has *never* been a
factor in and of itself. The problem is having to read the rest of the
stripe and then having to wait for a disk revolution before writing.

-frank
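The penalty Frank describes is the classic RAID-5 read-modify-write. As
a sketch of the textbook arithmetic (the I/O counts come from the
standard small-write model, not from measurements):

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * RAID-5 must read the old data and old parity before it can
         * compute and write the new ones; RAID-Z writes every block
         * as its own full stripe and never has to read first.
         */
        int raid5 = 2 + 2; /* read old data+parity, write new pair */
        int raidz = 0 + 2; /* no reads, write data+parity */

        printf("RAID-5 small write: %d I/Os plus ~1 rotation\n", raid5);
        printf("RAID-Z small write: %d I/Os\n", raidz);
        return (0);
    }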
One nit ... the parity computation is 'in the noise' as far as the CPU
goes, but it tends to flush the CPU caches (or rather, replace useful
cached data with parity), which affects application performance. Modern
CPU architectures (including x86/SPARC) provide instructions which
allow data to be streamed through the cache without actually replacing
everything else, but Solaris doesn't take advantage of these yet.
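On x86 the instructions in question are the non-temporal stores. A
minimal userland sketch of a cache-bypassing copy using SSE2 intrinsics
(assuming SSE2, 16-byte-aligned buffers, and a length that is a
multiple of 16 -- this is an illustration, not Solaris kernel code):

    #include <stddef.h>
    #include <emmintrin.h>

    /*
     * Non-temporal stores go to memory through write-combining
     * buffers without filling the data cache, so a bulk pass such as
     * a parity computation would not evict the application's working
     * set.
     */
    void
    stream_copy(void *dst, const void *src, size_t nbytes)
    {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t i;

        for (i = 0; i < nbytes / 16; i++)
            _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        _mm_sfence(); /* make the streamed stores globally visible */
    }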
On July 14, 2008 9:54:43 PM -0700 Frank Cusack <fcusack at fcusack.com>
wrote:

> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>
>>> It sounds like they're talking more about traditional hardware RAID
>>> but is this also true for ZFS? Right now I've got four 750GB drives
>>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>>> because the parity calculations are more efficient across N^2
>>> drives?
>>
>> With ZFS and modern CPUs, the parity calculation is surely in the
>> noise to the point of being unmeasurable.
>
> I would agree with that. The parity calculation has *never* been a
> factor in and of itself. The problem is having to read the rest of
> the stripe and then having to wait for a disk revolution before
> writing.

oh, you know what though? raid-z had this bug, or maybe we should just
call it a behavior, where you only want an {even,odd} number of drives
in the vdev. I can't remember if it was even or odd. Or maybe it was
that you wanted only N^2+1 disks, choose any N. Otherwise you had
suboptimal performance in certain cases. I can't remember the exact
details, but it wasn't because of "more efficient parity calculations".
Maybe something about block sizes having to be powers of two and the
wrong number of disks forcing a read?

Anybody know what I'm referring to? Has it been fixed? I see the zfs
best practices guide says to use only odd numbers of disks, but it
doesn't say why. (don't you hate that?)

-frank
Frank Cusack wrote:

> On July 14, 2008 9:54:43 PM -0700 Frank Cusack <fcusack at fcusack.com>
> wrote:
>
>> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
>> <bfriesen at simple.dallas.tx.us> wrote:
>>
>>>> It sounds like they're talking more about traditional hardware RAID
>>>> but is this also true for ZFS? Right now I've got four 750GB drives
>>>> that I'm planning to use in a raid-z 3+1 array. Will I get markedly
>>>> better performance with 5 drives (2^2+1) or 6 drives 2*(2^1+1)
>>>> because the parity calculations are more efficient across N^2
>>>> drives?
>>>
>>> With ZFS and modern CPUs, the parity calculation is surely in the
>>> noise to the point of being unmeasurable.
>>
>> I would agree with that. The parity calculation has *never* been a
>> factor in and of itself. The problem is having to read the rest of
>> the stripe and then having to wait for a disk revolution before
>> writing.
>
> oh, you know what though? raid-z had this bug, or maybe we should just
> call it a behavior, where you only want an {even,odd} number of drives
> in the vdev. I can't remember if it was even or odd. Or maybe it was
> that you wanted only N^2+1 disks, choose any N. Otherwise you had
> suboptimal performance in certain cases. I can't remember the exact
> details, but it wasn't because of "more efficient parity calculations".
> Maybe something about block sizes having to be powers of two and the
> wrong number of disks forcing a read?
>
> Anybody know what I'm referring to? Has it been fixed? I see the zfs
> best practices guide says to use only odd numbers of disks, but it
> doesn't say why. (don't you hate that?)

See the "Metaslab alignment" thread.
http://www.opensolaris.org/jive/thread.jspa?messageID=60241
 -- richard
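For readers who do not want to chase the link, here is a rough
back-of-the-envelope sketch in C of the effect that thread describes.
The rule that RAID-Z rounds each allocation up to a multiple of
(nparity + 1) sectors is my reading of the allocation code, so treat
the numbers as illustrative:

    #include <stdio.h>

    int
    main(void)
    {
        const unsigned sector = 512;
        const unsigned record = 128 * 1024; /* 256 data sectors */
        const unsigned nparity = 1;         /* raidz1 */
        unsigned ndisks;

        for (ndisks = 3; ndisks <= 10; ndisks++) {
            unsigned ndata = ndisks - nparity;
            unsigned dsec = record / sector;
            /* one parity sector per row of up to ndata data sectors */
            unsigned rows = (dsec + ndata - 1) / ndata;
            unsigned used = dsec + rows * nparity;
            /* round up to a multiple of (nparity + 1) sectors */
            unsigned alloc = (used + nparity) / (nparity + 1) *
                (nparity + 1);
            printf("%2u disks: %u sectors used, %u allocated "
                "(%u padding)\n", ndisks, used, alloc, alloc - used);
        }
        return (0);
    }

Run it for different record sizes and parity levels to see how the
padding moves around; at this recordsize the power-of-two data-disk
widths (3, 5, and 9 disks) are among the ones that never need it.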
On Jul 14, 2008, at 20:49, Bob Friesenhahn wrote:

> Any time you see even a single statement which is incorrect, it is
> best to ignore that forum poster entirely and if no one corrects
> him, then ignore the entire forum.

Yes, because each and every one of us must correct inaccuracies on the
Internet: http://xkcd.com/386/

:)
> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>
>> With ZFS and modern CPUs, the parity calculation is surely in the
>> noise to the point of being unmeasurable.
>
> I would agree with that. The parity calculation has *never* been a
> factor in and of itself. The problem is having to read the rest of
> the stripe and then having to wait for a disk revolution before
> writing.
> -frank

And this is where a HW RAID controller comes in. We hope it has a
microprocessor for the calculations, full knowledge of the head
positions, and a list of free blocks -- then it simply chooses one of
the drives that suits the criteria for the RAID level used and writes
immediately to the free block under one of the heads. If only ...

Maybe in a few years Sun will make a HW RAID controller using ZFS once
we all get the bugs out. With flash updates this should work
wonderfully.
Rob Clark wrote:

>> On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn
>> <bfriesen at simple.dallas.tx.us> wrote:
>>
>>> With ZFS and modern CPUs, the parity calculation is surely in the
>>> noise to the point of being unmeasurable.
>>
>> I would agree with that. The parity calculation has *never* been a
>> factor in and of itself. The problem is having to read the rest of
>> the stripe and then having to wait for a disk revolution before
>> writing.
>> -frank
>
> And this is where a HW RAID controller comes in. We hope it has a
> microprocessor for the calculations, full knowledge of the head
> positions, and a list of free blocks -- then it simply chooses one of
> the drives that suits the criteria for the RAID level used and writes
> immediately to the free block under one of the heads. If only ...
>
> Maybe in a few years Sun will make a HW RAID controller using ZFS
> once we all get the bugs out. With flash updates this should work
> wonderfully.

Given that a general-purpose CPU today tends to offer much better
performance than embedded processors, and the cost of developing
special-purpose processors is high, how would you define the next
generation "HW RAID" controller?
 -- richard