Hello,

After being immersed in this list and other ZFS sites for the past few weeks I am having some doubts about the zpool layout on my new server. It's not too late to make a change so I thought I would ask for comments. My current plan is to have 12 x 1.5 TB disks in what I would normally call a RAID 10 configuration. That doesn't seem to be the right term here, but there are 6 sets of mirrored disks striped together. I know that "smaller" sets of disks are preferred, but how small is small? I am wondering if I should break this into two sets of 6 disks. I do have a 13th disk available as a hot spare. Would it be available for either pool if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities.

Thank you,
Chris Dunbar

12 disks in mirrored pairs is a small configuration. The "smaller" sets you refer to might be the number of disks in a raidz/raidz2/raidz3 top-level vdev.

You say performance is one of your top priorities, but what is the workload? Mostly read? Mostly write? Random? Sequential?

See the ZFS Best Practices Guide on the solarisinternals.com site for guidance on how to select your pool layout:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

In particular this part:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations

--
Darren J Moffat

You will get much better random IO with mirrors, and better reliability when a disk fails with raidz2. Six sets of mirrors are fine for a pool. From what I have read, a hot spare can be shared across pools. I think the correct term would be "load balanced mirrors", vs RAID 10.

What kind of performance do you need? Maybe raidz2 will give you the performance you need. Maybe not. Measure the performance of each configuration and decide for yourself. I am a big fan of iometer for this type of work.

-Scott
--
This message posted from opensolaris.org

On Fri, Mar 19, 2010 at 5:32 AM, Chris Dunbar - Earthside, LLC <cdunbar at earthside.net> wrote:

> if I went with two? Finally, would I be better off with raidz2 or something
> else instead of the striped mirrored sets? Performance and fault tolerance
> are my highest priorities.

Performance and fault tolerance are somewhat conflicting.

You'll have good fault tolerance and performance using a wide raidz3 stripe, e.g. a 12-disk raidz3 with a spare.

You'll have the best fault tolerance using small raidz3 stripes with a spare, for instance 2 x 6-disk raidz3. This uses 50% of your disks for redundancy.

You'll have slightly better performance and slightly worse fault tolerance using raidz2 instead in both cases above. I would not recommend using raidz, as it will offer almost no real fault tolerance with the size of drives you're using.

You'll have your best performance and fault tolerance using 3-way mirrors, but you sacrifice 2/3 of your disks to do it. Actually, I think that raidz3 has higher tolerance still, but the performance difference will be huge.

2-way mirrors give slightly worse fault tolerance (below raidz2, I believe) and good performance.

-B

--
Brandon High : bhigh at freaks.com

Brandon High wrote:
> On Fri, Mar 19, 2010 at 5:32 AM, Chris Dunbar - Earthside, LLC
> <cdunbar at earthside.net> wrote:
>
> > if I went with two? Finally, would I be better off with raidz2 or
> > something else instead of the striped mirrored sets? Performance
> > and fault tolerance are my highest priorities.
>
> Performance and fault tolerance are somewhat conflicting.
>
> You'll have good fault tolerance and performance using a wide raidz3
> stripe, eg: 12-disk raidz3 with a spare.

Actually, except on certain loads (large, streaming write/read), this config is going to give pretty poor performance.

> You'll have the best fault tolerance using small raidz3 stripes with a
> spare, for instance 2 x 6-disk raidz3. This uses 50% of your disks for
> redundancy.
>
> You'll have slightly better performance and slightly worse fault
> tolerance using raidz2 instead in both cases above. I would not
> recommend using raidz, as it will offer almost no real fault tolerance
> with the size of drives you're using.

Realistically, a 2 x 6-disk raidz2 with a hot spare will provide /almost/ the same level of redundancy as 2 x 6-disk raidz3, and about 30% better performance and space. (he said he had 13 disks)

> You'll have your best performance and fault tolerance using 3-way
> mirrors, but you sacrifice 2/3 of your disks to do it. Actually, I
> think that raidz3 is higher tolerance still, but the performance
> difference will be huge.
>
> 2-way mirrors is slightly worse for fault tolerance (below raidz2 I
> believe) and good performance.

Yes - see my followup post for percentages of failures.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA

Chris Dunbar - Earthside, LLC wrote:
> [...]
> I am wondering if I should break this into two sets of 6 disks. I do have a 13th disk available as a hot spare. Would it be available for either pool if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities.

There's not much benefit I can see to having two pools if both are using the same configuration (i.e. all mirrors or all raidz). There are reasons to do so, but I don't see that they would be of any real benefit for what you describe. A hot spare disk can be assigned to multiple pools (often referred to as a "global" hot spare).

The preference for raidz[123] configs is to have 4-6 data disks in the vdev.

Realistically speaking, you have several different (practical) configurations possible, in order of general performance:

(a) 6 x 2-way mirrors + 1 pool hot spare -> 9TB usable
(b) 4 x 3-way mirrors + 1 pool hot spare -> 6TB usable
(c) 1 6-disk raidz + 1 7-disk raidz -> 16.5TB usable
(d) 2 6-disk raidz + 1 pool hot spare -> 15TB usable
(e) 1 6-disk raidz2 + 1 7-disk raidz2 -> 13.5TB usable
(f) 2 6-disk raidz2 + 1 pool hot spare -> 12TB usable
(g) 1 6-disk raidz3 + 1 7-disk raidz3 -> 10.5TB usable
(h) 1 13-disk raidz3 -> 15TB usable

Given the size of your disks, resilvering is likely to take a significant amount of time in any RAIDZ[123] configuration. That is, unless you are storing (almost exclusively) very large files, resilver time is going to be significant, and can potentially be radically higher than in a mirrored config.

The mirroring configs will out-perform raidz[123] on everything except large streaming write/reads, and even then, it's a toss-up.

Overall, the (a), (d), and (f) configurations generally offer the best balance of redundancy, space, and performance.

Here are the chances to survive disk failures (assuming hot spares are unable to be used; that is, all disk failures happen in a short period of time) - note that all three can always survive a single disk failure:

(a) 90% for 2, 73% for 3, 49% for 4, 25% for 5.
(d) 55% for 2, 27% for 3, 0% for 4 or more
(f) 100% for 2, 80% for 3, 56% for 4, 0% for 5.

Depending on your exact requirements, I'd go with (a) or (f) as the best choices - (a) if performance is more important, (f) if redundancy overrides performance.

--
Erik Trimble

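The survival chances listed above can be checked by counting failure combinations. What follows is a minimal sketch in Python, assuming every set of k failed disks is equally likely and that no hot spare has time to kick in. The function name `survival_probability` and the layout encoding (a list of `(disks_in_vdev, failures_tolerated)` pairs) are illustrative choices, not anything from the thread, and the exact counts need not match the rounded percentages quoted in the message.

```python
from math import comb


def survival_probability(vdevs, failures):
    """Fraction of equally likely `failures`-disk failure sets the pool survives.

    `vdevs` is a list of (disks_in_vdev, failures_tolerated) pairs; the pool
    survives only if no vdev loses more disks than it tolerates.
    """
    total_disks = sum(size for size, _ in vdevs)
    if failures > total_disks:
        return 0.0
    # ways[j] = number of ways to place j failures across the vdevs processed
    # so far without exceeding any vdev's tolerance.
    ways = [1] + [0] * failures
    for size, tolerated in vdevs:
        updated = [0] * (failures + 1)
        for already, count in enumerate(ways):
            if count == 0:
                continue
            for here in range(min(tolerated, size, failures - already) + 1):
                updated[already + here] += count * comb(size, here)
        ways = updated
    return ways[failures] / comb(total_disks, failures)


# Hypothetical encodings of three of the layouts discussed (spare excluded).
layouts = {
    "(a) 6 x 2-way mirror": [(2, 1)] * 6,
    "(d) 2 x 6-disk raidz": [(6, 1)] * 2,
    "(f) 2 x 6-disk raidz2": [(6, 2)] * 2,
}

for name, vdevs in layouts.items():
    row = ", ".join(
        f"{k} failed: {survival_probability(vdevs, k):.0%}" for k in range(1, 6)
    )
    print(f"{name}: {row}")
```

Under this model, layout (d) cannot survive three simultaneous failures at all, because two single-parity vdevs absorb at most one failure each, while (a) loses the pool only when both halves of the same mirror fail.
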
On Mar 19, 2010, at 5:32 AM, Chris Dunbar - Earthside, LLC wrote:
> [...]
> Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities.

Do you believe in coincidence? :-) I recently blogged about the reliability analysis using 12 disks as a representative sample. I didn't add a hot spare for this analysis, but it would help in all cases.

http://blog.richardelling.com/2010/02/zfs-data-protection-comparison.html

For those disinclined to click, data retention when mirroring wins over raidz when looking at the problem from the perspective of the number of drives available. Why? Because 5+1 raidz survives the loss of any one disk, but 3 sets of 2-way mirrors can survive the loss of 3 disks, as long as 2 of those disks are not in the same set. The rest is just math.

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com

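The six-disk comparison above (one 5+1 raidz against three 2-way mirrors) is small enough to enumerate by brute force. This is only a sketch under the assumption that every combination of failed disks is equally likely; the helper names `survives` and `survival_fraction` and the disk numbering 0-5 are made up for illustration.

```python
from itertools import combinations


def survives(failed, vdevs, tolerated):
    """True if no vdev has lost more disks than it tolerates."""
    return all(len(failed & vdev) <= tolerated for vdev in vdevs)


def survival_fraction(vdevs, tolerated, failures):
    """Share of all `failures`-disk combinations that leave the pool intact."""
    disks = sorted(set().union(*vdevs))
    outcomes = [set(combo) for combo in combinations(disks, failures)]
    return sum(survives(f, vdevs, tolerated) for f in outcomes) / len(outcomes)


# Six disks numbered 0-5, arranged two different ways.
raidz_5_plus_1 = [frozenset(range(6))]
three_mirrors = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

for k in (1, 2, 3):
    print(
        f"{k} failed: raidz {survival_fraction(raidz_5_plus_1, 1, k):.0%}, "
        f"mirrors {survival_fraction(three_mirrors, 1, k):.0%}"
    )
```

Both layouts survive any single failure; beyond that the raidz pool is always lost, while the mirrored pool survives whenever the failed disks fall in different mirror sets.
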
On Sat, Mar 20, 2010 at 1:35 PM, Richard Elling <richard.elling at gmail.com> wrote:

> For those disinclined to click, data retention when mirroring wins over
> raidz when looking at the problem from the perspective of number of drives
> available. Why? Because 5+1 raidz survives the loss of any disk, but 3 sets
> of 2-way mirrors can survive the loss of 3 disks, as long as 2 of those
> disks are not in the same set. The rest is just math.

The one dimension left out in your comparison is the portion of space that's available for use vs. redundancy overhead. I'm sure you just never thought of it. ;-)

For 12 disks using a 4-way mirror, you'd have 75% overhead but the best MTTDL. raidz3 is only 25% overhead, but provides a better MTTDL than 3-way mirrors (at 66% overhead). raidz2 (16% overhead) has better MTTDL than 2-way mirrors (at 50%).

So clearly, if fault tolerance is the absolute most important factor, a really big mirror is best. This will also give very good read performance. I imagine a 12-way mirror would last a while (2.09E+57 years according to Richard's formula) but it also comes at a high cost.

I think the only real route to follow is to determine how much space you need, and then optimize MTTDL and performance around that constraint. If you determine that you need 10 TB available, then (using 1.5 TB drives) you need to use at least 7 disks for data. That means a 12-disk raidz3 (13.5 TB), or 2 x 6-disk raidz2 (12 TB). The raidz3 will have higher fault tolerance, but lower performance.

-B

--
Brandon High : bhigh at freaks.com

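The overhead and capacity figures above are simple arithmetic, which the following Python sketch makes explicit. It assumes raw capacity only (no metadata or filesystem overhead, and decimal TB as used in the thread); `layout_stats` and the layout names are hypothetical helpers, and the 10 TB requirement is the example figure from the message.

```python
DISK_TB = 1.5  # drive size used throughout the thread


def layout_stats(vdev_count, disks_per_vdev, redundant_per_vdev, disk_tb=DISK_TB):
    """Raw usable space and redundancy overhead for a pool of identical vdevs."""
    total = vdev_count * disks_per_vdev
    data = vdev_count * (disks_per_vdev - redundant_per_vdev)
    return {
        "disks": total,
        "usable_tb": data * disk_tb,
        "overhead": 1 - data / total,
    }


# 12-disk layouts mentioned in the discussion: (vdevs, disks per vdev, redundant disks per vdev).
layouts = {
    "3 x 4-way mirror": (3, 4, 3),
    "4 x 3-way mirror": (4, 3, 2),
    "6 x 2-way mirror": (6, 2, 1),
    "1 x 12-disk raidz3": (1, 12, 3),
    "1 x 12-disk raidz2": (1, 12, 2),
    "2 x 6-disk raidz2": (2, 6, 2),
}

REQUIRED_TB = 10  # example space requirement from the message

for name, shape in layouts.items():
    stats = layout_stats(*shape)
    fits = "meets" if stats["usable_tb"] >= REQUIRED_TB else "below"
    print(
        f"{name:20s} {stats['usable_tb']:5.1f} TB usable, "
        f"{stats['overhead']:.0%} overhead ({fits} {REQUIRED_TB} TB)"
    )
```

Fixing the space requirement first and then comparing only the layouts that meet it mirrors the approach suggested in the message: space is the constraint, and MTTDL and performance are optimized within it.
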
On Mar 20, 2010, at 10:12 PM, Brandon High wrote:
> On Sat, Mar 20, 2010 at 1:35 PM, Richard Elling <richard.elling at gmail.com> wrote:
> > For those disinclined to click, data retention when mirroring wins over raidz
> > when looking at the problem from the perspective of number of drives
> > available. Why? Because 5+1 raidz survives the loss of any disk, but 3 sets
> > of 2-way mirrors can survive the loss of 3 disks, as long as 2 of those disks
> > are not in the same set. The rest is just math.
>
> The one dimension left out in your comparison is the portion of space that's available for use vs. redundancy overhead. I'm sure you just never thought of it. ;-)

There are two dimensions missing: space and performance.

> For 12 disks using a 4-way mirror, you'd have 75% overhead but the best MTTDL. raidz3 is only 25% overhead, but provides a better MTTDL than 3-way mirrors (at 66% overhead). raidz2 (16% overhead) has better MTTDL than 2-way mirrors (at 50%).

The "all-in" post puts all three on one chart, but in this case it is for 46 disks, not 12.

http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance1

> So clearly, if fault tolerance is the absolute most important factor, a really big mirror is best. This will also give very good read performance. I imagine a 12-way mirror would last a while (2.09E+57 years according to Richard's formula) but it's also at high cost.
>
> I think the only real route to follow is to determine how much space you need, and then optimize MTTDL and performance around that constraint. If you determine that you need 10 TB available, then (using 1.5T drives) you need to use at least 7 disks for data. That means a 12-disk raidz3 (13.5 TB), or 2x 6-disk raidz2 (12 TB). The raidz3 will have higher fault tolerance, but lower performance.

Indeed. Space, performance, dependability: pick two.

 -- richard

Thank you to all who responded. This response in particular was very helpful and I think I will stick with my current zpool configuration (choice "a" if you're reading below). I primarily host VMware virtual machines over NFS from this server's predecessor and this server will be doing the same thing. I think the 6 x 2-way mirror configuration gives me the best mix of performance and fault tolerance.

Regards,
Chris Dunbar

On Mar 19, 2010, at 5:44 PM, Erik Trimble wrote:
> [...]
> (a) 6 x 2-way mirrors + 1 pool hot spare -> 9TB usable
> [...]
> Depending on your exact requirements, I'd go with (a) or (f) as the best
> choices - (a) if performance is more important, (f) if redundancy
> overrides performance.