Edward Ned Harvey
2010-Oct-20  13:03 UTC
[zfs-discuss] Myth? 21 disk raidz3: "Don't put more than ___ disks in a vdev"
In a discussion a few weeks back, it was mentioned that the Best Practices Guide says something like "Don't put more than ___ disks into a single vdev." At first, I challenged this idea, because I see no reason why a 21-disk raidz3 would be bad. It seems like a good thing.

I was operating on the assumption that resilver time was limited by the sustainable throughput of the disks, which was wrong. At present, resilver time is limited by random IO, so a ZFS resilver typically takes much longer than it would if the whole disk were resilvered serially. But that was the only negative I could find against a 21-disk raidz3, or against using more than ___ disks in a single vdev at all. Assuming this one problem is improved at some point, is there any other reason to stay below ___ disks in a vdev?

Does the random IO resilver performance problem also apply to scrub or zfs send? Again, the problem is that resilver is done in effectively random order, so the disks perform zillions of random seeks instead of serializing IO and minimizing seeks.
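To put rough numbers on the serial-versus-random distinction, here is a minimal Python sketch. Every figure in it (1 TB of allocated data, 100 MB/s sustained throughput, 150 random IOPS, 128 KiB average block) is a hypothetical placeholder, not a measurement, and the sketch ignores the resilver throttle and any overlap between disks.

```python
"""Back-of-envelope resilver time: serial copy vs. random-IO-bound copy.

All inputs are hypothetical placeholders, not measurements.
"""


def serial_hours(bytes_to_copy: float, seq_mb_per_s: float) -> float:
    """Hours to rewrite the data front to back at sustained sequential throughput."""
    return bytes_to_copy / (seq_mb_per_s * 1e6) / 3600


def random_io_hours(bytes_to_copy: float, block_bytes: float, iops: float) -> float:
    """Hours if every block costs one random IO on the resilvering disk."""
    blocks = bytes_to_copy / block_bytes
    return blocks / iops / 3600


if __name__ == "__main__":
    allocated = 1e12  # assumed: 1 TB of allocated data to rebuild
    print(f"serial: {serial_hours(allocated, 100):.1f} h")  # assumed 100 MB/s
    # assumed 128 KiB average block and 150 random IOPS
    print(f"random: {random_io_hours(allocated, 128 * 1024, 150):.1f} h")
```

With these assumed figures the serial copy takes under 3 hours, while the random-IO-bound estimate is roughly five times longer. The gap widens as the average block size shrinks, because the random-IO time scales with the number of blocks rather than the number of bytes.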
Darren J Moffat
2010-Oct-20  13:48 UTC
On 20/10/2010 14:03, Edward Ned Harvey wrote:
> In a discussion a few weeks back, it was mentioned that the Best Practices
> Guide says something like "Don't put more than ___ disks into a single
> vdev." At first, I challenged this idea, because I see no reason why a
> 21-disk raidz3 would be bad. It seems like a good thing.

If you have those 21 disks spread across 3 top-level vdevs, each a 7-disk raidz3, then ZFS will stripe across 3 vdevs rather than 1.

Here is an example from the Sun ZFS Storage Appliance GUI (each O is a score out of 5):

----------------------------------------------------------------------
                                     AVAIL  PERF   CAPACITY
Double parity RAID                   OOOO_  OOO__  OOOO_  1.45T
Mirrored                             OOOO_  OOOOO  O____  808G
Single parity RAID, narrow stripes   OOO__  OOOO_  OO___  1.18T
Striped                              _____  OOOOO  OOOOO  1.84T
Triple mirrored                      OOOO_  OOOOO  _____  538G
Triple parity RAID, wide stripes     OOOO_  OO___  OOOOO  1.31T
----------------------------------------------------------------------

--
Darren J Moffat
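Darren's 3 x 7 alternative can be compared with the single 21-disk raidz3 in a few lines of Python. This is a sketch under a stated assumption: it uses the common rule of thumb, also used later in this thread, that a raidz vdev delivers roughly the random IOPS of one disk, and it counts capacity in whole disks, ignoring metadata and padding overhead. The `Layout` class is hypothetical, not part of any ZFS tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of a pool built from identical raidz vdevs."""
    vdevs: int   # number of top-level vdevs ZFS stripes across
    width: int   # disks per raidz vdev
    parity: int  # parity disks per vdev (3 for raidz3)

    @property
    def disks(self) -> int:
        return self.vdevs * self.width

    @property
    def data_disks(self) -> int:
        # usable capacity in whole disks; ignores metadata and padding overhead
        return self.vdevs * (self.width - self.parity)

    @property
    def random_iops_units(self) -> int:
        # rule-of-thumb assumption: each raidz vdev ~ one disk's random IOPS
        return self.vdevs


wide = Layout(vdevs=1, width=21, parity=3)
narrow = Layout(vdevs=3, width=7, parity=3)

for name, layout in (("1 x 21-disk raidz3", wide), ("3 x 7-disk raidz3", narrow)):
    print(
        f"{name}: {layout.disks} disks, {layout.data_disks} data disks, "
        f"~{layout.random_iops_units}x one disk's random IOPS"
    )
```

Under these assumptions the wide vdev keeps 18 disks' worth of capacity but roughly one disk's random IOPS, while the 3 x 7 layout gives up 6 more disks to parity in exchange for roughly three times the random IOPS. That is the same capacity-versus-performance trade-off the appliance table scores.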
Richard Elling
2010-Oct-20  13:57 UTC
On Oct 20, 2010, at 6:03 AM, Edward Ned Harvey wrote:
> In a discussion a few weeks back, it was mentioned that the Best Practices
> Guide says something like "Don't put more than ___ disks into a single
> vdev." At first, I challenged this idea, because I see no reason why a
> 21-disk raidz3 would be bad. It seems like a good thing.

It is a choice. The reason we have the "best practices" guide is that the implications of such choices are not always immediately obvious.

Anecdote: when the X4500 was first released, people built 46-disk-wide raidz1 pools. This is not a good idea in most cases. Hence the man page and the ZFS best practices guide recommend limiting the number of disks in a set.

> I was operating on assumption that resilver time was limited by sustainable
> throughput of disks, which was wrong.

It is limited by the random write capacity of the resilvering disk.

> At present, resilver time is limited
> by random IO, so the ZFS resilver time is typically much longer than it
> would be if you were resilvering the whole disk serially.

Resilver is also throttled.

> But that was the only negative against 21-disk raidz3.

Untrue. The performance of a 21-disk raidz3 will be nowhere near the performance of a 20-disk 2-way mirror.

> That was the only
> negative, against using more than ___ disks in a single vdev. Assuming this
> one problem is improved at some point, is there any other reason to stay
> below ___ disks in a vdev?

Taking this to a limit, would you say a 1,000-disk raidz3 set is a good thing? 10,000 disks?

> Does the random IO resilver performance problem also apply to scrub or zfs
> send? Again the problem is: resilver is done in effectively random order,
> so the disks perform zillions of random seeks instead of serializing IO and
> minimizing seeks during resilver. Is the same thing true for scrub or zfs
> send?

To recap:
+ The data will be read from where it was written.
+ Resilver is throttled.
+ Resilver performance is typically bound by the random write capability of the resilvering disk.
+ Resilvering is done in temporal order.
+ If too many disks fail during resilver, the resulting zpool can still be on-disk consistent.
+ ZFS is open source; feel free to modify it and share your ideas for improvement.

-- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
USENIX LISA '10 Conference, November 7-12, San Jose, CA
ZFS and performance consulting
http://www.RichardElling.com
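Richard's limit question, and his point about too many disks failing during a resilver, can be made concrete with a simple binomial model. This is a sketch under loud assumptions: every surviving disk fails independently with the same probability `p_fail` during the resilver window, and `p_fail=0.001` is a made-up illustrative value. Real failures are often correlated, so treat the output as an illustration of the trend, not a risk estimate. The function name is hypothetical.

```python
from math import comb


def p_exceeds_parity(width: int, parity: int, p_fail: float) -> float:
    """Probability that, while one failed disk in a raidz vdev is being
    resilvered, at least `parity` of the remaining width - 1 disks also fail.

    One failure has already used up one level of parity, so `parity` further
    failures (parity + 1 in total) exceed what the vdev can reconstruct.
    Assumes independent failures with equal probability p_fail per disk
    over the resilver window.
    """
    survivors = width - 1
    return sum(
        comb(survivors, k) * p_fail**k * (1 - p_fail) ** (survivors - k)
        for k in range(parity, survivors + 1)
    )


for width in (7, 21, 1000):
    risk = p_exceeds_parity(width, parity=3, p_fail=0.001)  # hypothetical p_fail
    print(f"{width:>5}-disk raidz3: {risk:.2e}")
```

With the same per-disk odds, the probability rises steeply with width: the 1,000-disk case comes out several orders of magnitude riskier than the 7-disk one, which is one concrete way to see why the limit argument goes against very wide sets.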
Phil Harman
2010-Oct-20  14:22 UTC
On 20/10/2010 14:48, Darren J Moffat wrote:
> On 20/10/2010 14:03, Edward Ned Harvey wrote:
>> In a discussion a few weeks back, it was mentioned that the Best
>> Practices Guide says something like "Don't put more than ___ disks
>> into a single vdev." At first, I challenged this idea, because I see
>> no reason why a 21-disk raidz3 would be bad. It seems like a good thing.
>
> If you have those 21 disks spread across 3 top-level vdevs, each a
> 7-disk raidz3, then ZFS will stripe across 3 vdevs rather than 1.
>
> Here is an example from the Sun ZFS Storage Appliance GUI:
>
> Each O is a score out of 5
> ----------------------------------------------------------------------
>                                      AVAIL  PERF   CAPACITY
> Double parity RAID                   OOOO_  OOO__  OOOO_  1.45T
> Mirrored                             OOOO_  OOOOO  O____  808G
> Single parity RAID, narrow stripes   OOO__  OOOO_  OO___  1.18T
> Striped                              _____  OOOOO  OOOOO  1.84T
> Triple mirrored                      OOOO_  OOOOO  _____  538G
> Triple parity RAID, wide stripes     OOOO_  OO___  OOOOO  1.31T
> ----------------------------------------------------------------------

Yes, that's all rather simplistic, isn't it?! Does it use a sinusoidal function when plotting the O's (e.g. 1.31T scores more than 1.45T)? ;) Does the AVAIL score take into account the size of the stripe, the time taken to resilver, controller topology, etc.? The PERF score is utterly meaningless without reference to a workload (e.g. read vs. write, random vs. sequential, big vs. small, uniform vs. non-uniform, etc.), and it's all without reference to SSDs.
Marty Scholes
2010-Oct-20  19:03 UTC
Richard wrote:
>
> Untrue. The performance of a 21-disk raidz3 will be nowhere near the
> performance of a 20-disk 2-way mirror.

You know this stuff better than I do. Assuming no bus/CPU bottlenecks, a 21-disk raidz3 should provide the sequential throughput of 18 disks and the random throughput of 1 disk.

A 20-disk 2-way mirror should provide sequential read throughput of (at best) 20 disks, sequential write throughput of (at best) 10 disks, random read throughput of between 2 and 20 disks, and random write throughput of between 1 and 10 disks.

At one extreme, mirrors are marginally better; at the other extreme, mirrors deliver 10x the write and 20x the read performance. That's a wide range.

> Taking this to a limit, would you say a 1,000 disk
> raidz3 set is a good thing?
> 10,000 disks?

I don't know, maybe. Even if we accept that there is some magic X where stripes wider than X are bad, what is that X and how do we determine it? Likely, it depends on several factors, including read and write IOPS (both of which can be mitigated by L2ARC and SLOG) and resilver times.

If seek time were a non-issue (flash?), there would be no real case for mirrors. Mirrors can, if the data is laid out perfectly, provide sequential throughput that grows linearly with the vdev count. RAIDZN will always provide sequential throughput that grows linearly with the stripe width. Therefore, with low-access-time, low-throughput storage (flash?), RAIDZN with very wide stripes makes an awful lot of sense.

> ZFS is open source, feel free to modify and share your
> ideas for improvement.

And that's what we are doing here: sharing ideas.

--
This message posted from opensolaris.org
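Marty's idealized figures can be written down as a small Python model, in units of one disk's throughput. It encodes only the arithmetic in his message (no bus or CPU limits, perfect data layout) plus the rule of thumb that a raidz vdev behaves like one disk for random IO; the function names are hypothetical. Bob's reply explains why real arrays fall short of these numbers.

```python
def raidz_throughput(width: int, parity: int) -> dict:
    """Idealized single raidz vdev, in units of one disk's throughput."""
    return {
        "seq": width - parity,  # full stripes spread over the data disks
        "random": 1,            # rule of thumb: one disk's worth of random IO
    }


def mirror_throughput(disks: int, ways: int = 2) -> dict:
    """Idealized striped mirrors; random figures are (worst, best) ranges."""
    vdevs = disks // ways
    return {
        "seq_read": disks,             # at best, every disk reads
        "seq_write": vdevs,            # every copy must be written
        "random_read": (ways, disks),  # worst case: all IO lands on one mirror
        "random_write": (1, vdevs),    # worst case: all IO lands on one mirror
    }


rz = raidz_throughput(21, 3)
mir = mirror_throughput(20)
print("sequential:", rz["seq"], "(raidz3) vs", mir["seq_read"], "read /",
      mir["seq_write"], "write (mirror)")
print("best-case random read ratio:", mir["random_read"][1] / rz["random"])
print("best-case random write ratio:", mir["random_write"][1] / rz["random"])
```

In this model the best-case ratios come out to 20x for random reads and 10x for random writes, matching the range Marty describes, while the 21-disk raidz3 leads on sequential writes (18 vs. 10).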
Bob Friesenhahn
2010-Oct-21  00:39 UTC
On Wed, 20 Oct 2010, Marty Scholes wrote:
>>
>> Untrue. The performance of a 21-disk raidz3 will be nowhere near the
>> performance of a 20-disk 2-way mirror.
>
> You know this stuff better than I do. Assuming no bus/cpu
> bottlenecks, a 21 disk raidz3 should provide sequential throughput
> of 18 disks and random throughput of 1 disk.

I have some swampland in Florida that you can drain and put a condo on. If you wait long enough it may become beachfront property. :-)

Before zfs moves on to the next block, it needs to write data to all the disks, and this means that the disks need to line up to the required position and deposit their load. What you say might be true in a perfect world. In the real world, disks don't perform like perfect little soldiers; they fall out of step, so the actual performance is much less than what is theoretically possible.

An extremely well-designed and tuned array could support more disks per vdev.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
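Bob's point that a full-stripe write waits for every disk to line up can be illustrated with a toy model. Assume, purely for illustration, that each disk's time to complete its part of a write is independent and exponentially distributed with the same mean. The stripe finishes only when the slowest disk finishes, and the expected maximum of N such times is the mean multiplied by the N-th harmonic number, so a wider stripe waits longer on its stragglers. The sketch computes that closed form and checks it with a seeded simulation; the distribution, the 10 ms mean, and the helper names are assumptions, not measurements of real drives.

```python
import random


def expected_slowest(n_disks: int, mean_ms: float) -> float:
    """Closed form: E[max of n iid exponential times] = mean * H_n."""
    return mean_ms * sum(1.0 / k for k in range(1, n_disks + 1))


def simulated_slowest(n_disks: int, mean_ms: float,
                      trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity, seeded for repeatability."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # the stripe completes only when the slowest disk has finished
        total += max(rng.expovariate(1.0 / mean_ms) for _ in range(n_disks))
    return total / trials


for n in (1, 7, 21):
    print(f"{n:>2} disks: ~{expected_slowest(n, 10.0):.1f} ms "
          f"(simulated {simulated_slowest(n, 10.0):.1f} ms)")
```

Under this model a 21-wide stripe waits about 3.6 times the single-disk mean per write, versus about 2.6 times for a 7-wide stripe, which is one way to see why real wide stripes fall short of the perfect-soldier arithmetic.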