I want to set up a ZFS server with RAID-Z. Right now I have 3 disks. In 6 months, I want to add a 4th drive and still have everything under RAID-Z without a backup/wipe/restore scenario. Is this possible? I've used NetApps in the past (1996 even!) and they do it. I think they're using RAID4.
> I want to set up a ZFS server with RAID-Z. Right now I have 3 disks. In 6
> months, I want to add a 4th drive and still have everything under RAID-Z
> without a backup/wipe/restore scenario. Is this possible?

You can add additional storage to the same pool effortlessly, such that the pool will be striped across two raidz vdevs. You cannot (AFAIK) expand the raidz itself. End result is 9 disks, with 7 disks' worth of effective storage capacity. The ZFS administration guide contains examples of doing exactly this, except I believe the examples use mirrors.

ZFS administration guide: http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
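A minimal sketch of that approach, assuming small file-backed vdevs as stand-ins for real disks (the pool name and paths are made up; whole-disk names work the same way):

# made-up file-backed devices; substitute real disks
mkfile 100m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5 /var/tmp/d6

# start with a 3-disk raidz
zpool create tank raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3

# later: add a second raidz vdev; the pool then stripes across both groups,
# but the original raidz group itself is not widened
zpool add tank raidz /var/tmp/d4 /var/tmp/d5 /var/tmp/d6

zpool status tank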
> I want to set up a ZFS server with RAID-Z. Right now
> I have 3 disks. In 6 months, I want to add a 4th
> drive and still have everything under RAID-Z without
> a backup/wipe/restore scenario. Is this possible?

I am trying to figure out how to code this right now, as I see it being one of the most needed and ignored features of ZFS. Unfortunately, there exists precious little documentation of how the stripes are laid out, so I find myself studying the code.

In addition to having the ability to add/remove a data drive, I can see use cases for:

* Adding/removing arbitrary numbers of parity drives. Raidz2 uses Reed-Solomon codes for the 2nd parity, which implies that there is no practical limit on the number of parity drives.

* Maximizing the use of different disk sizes. Allowing the stripe geometry to vary throughout the vdev would allow maximal use of space for different size devices, while preserving the desired fault tolerance.

If such capabilities exist, you could start with a single-disk vdev and grow it to consume a large disk farm with any number of parity drives, all while the system is fully available.
[i]* Maximizing the use of different disk sizes[/i]

[i]If such capabilities exist, you could start with a single-disk vdev and grow it to consume a large disk farm with any number of parity drives, all while the system is fully available.[/i]

Now you're just teasing me ;-)
I agree: for non-enterprise users, the expansion of raidz vdevs is a critical missing feature.
> I agree: for non-enterprise users the expansion of
> raidz vdevs is a critical missing feature.

Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.

Experience taught me that enterprise users most need future flexibility and zero downtime.

Again, I'm not arguing here, only interested in your contrasting viewpoint.
Hi Martin,

Martin wrote:
>> I agree: for non-enterprise users the expansion of
>> raidz vdevs is a critical missing feature.
>
> Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.

Not exactly. All users would lack the ability to expand a raidz vdev (which in turn could require resilvering, so it comes with lots of other Enterprise feature questions), but it's possible now to expand a pool containing raidz vdevs -- and this is the more likely case with enterprise users:

# ls -lh /var/tmp/fakedisk/
total 1229568
-rw------T 1 root root 100M Jan 9 20:22 disk1
-rw------T 1 root root 100M Jan 9 20:22 disk2
-rw------T 1 root root 100M Jan 9 20:22 disk3
-rw------T 1 root root 100M Jan 9 20:22 disk4
-rw------T 1 root root 100M Jan 9 20:22 disk5
-rw------T 1 root root 100M Jan 9 20:22 disk6
# zpool create test raidz /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk2 /var/tmp/fakedisk/disk3
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   286M   155K   286M    0%    ONLINE   -
# zpool add test raidz /var/tmp/fakedisk/disk4 /var/tmp/fakedisk/disk5 /var/tmp/fakedisk/disk6
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   572M   159K   572M    0%    ONLINE   -

Otherwise, you're absolutely correct. I think some enterprise users would probably like to have the ability to expand/contract even raidz groups. I'm sure it's possible to implement this, and luckily ZFS was designed with the ability to add these features over the course of time. Still, it's better to get ZFS out and in use sooner rather than later, right?

> Experience taught me that enterprise users most need future flexibility and zero downtime.

With respect to expanding a pool based on raidz vdevs (and definitely with respect to expanding a filesystem), that's available today, with the limitation that you can't expand a raidz group itself.

Regards,

- Matt

-- 
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Client Solutions, Systems Practice
http://blogs.sun.com/mingenthron/
email: matt.ingenthron at sun.com Phone: 310-242-6439
Martin wrote:
>> I agree: for non-enterprise users the expansion of
>> raidz vdevs is a critical missing feature.
>
> Now you've got me curious. I'm not trying to be inflammatory here, but how is online expansion a non-enterprise feature? From my perspective, enterprise users are the ones most likely to keep legacy filesystems for extended lengths of time, well past any rational usage plan. Enterprise users are also the ones most likely to need 24/7 availability. Any hacker-in-a-basement can take a storage pool offline to expand or contract it, while enterprise users lack this luxury.
>
> Experience taught me that enterprise users most need future flexibility and zero downtime.
>
> Again, I'm not arguing here, only interested in your contrasting viewpoint.

I think the original poster was thinking that non-enterprise users would be most interested in only having to *purchase* one drive at a time. Enterprise users aren't likely to balk at purchasing 6-10 drives at a time, so for them adding an additional *new* RaidZ to stripe across is easier.

Remember though that it's been mathematically figured that the disadvantages to RaidZ start to show up after 9 or 10 drives. (That's been posted in this list earlier. I don't know that they're great disadvantages - and probably even less so to non-enterprise users - so I think this would be useful.)

So most enterprise users are going to go the 'new raidz' route.

-Kyle
[i]Enterprise feature questions), but it's possible now to expand a pool containing raidz vdevs -- and this is the more likely case with enterprise users:[/i]
<pre>
# ls -lh /var/tmp/fakedisk/
total 1229568
-rw------T 1 root root 100M Jan 9 20:22 disk1
-rw------T 1 root root 100M Jan 9 20:22 disk2
-rw------T 1 root root 100M Jan 9 20:22 disk3
-rw------T 1 root root 100M Jan 9 20:22 disk4
-rw------T 1 root root 100M Jan 9 20:22 disk5
-rw------T 1 root root 100M Jan 9 20:22 disk6
# zpool create test raidz /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk2 /var/tmp/fakedisk/disk3
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   286M   155K   286M    0%    ONLINE   -
# zpool add test raidz /var/tmp/fakedisk/disk4 /var/tmp/fakedisk/disk5 /var/tmp/fakedisk/disk6
# zpool list test
NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
test   572M   159K   572M    0%    ONLINE   -
</pre>

Does this mean I can expand a raidz pool? That I could take my 3-disk raidz and add a 4th disk into the pool?
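A rough sketch of what happens in that case, continuing the example above with a hypothetical extra file-backed device. Exact messages vary by release, so the comments describe the expected behavior rather than quote it:

mkfile 100m /var/tmp/fakedisk/disk7

# You can grow the *pool*, but not widen the existing raidz group.

# Adding the lone disk creates a new, non-redundant top-level vdev; zpool
# normally refuses because its replication level does not match the raidz
# vdev, and only proceeds when forced:
zpool add test /var/tmp/fakedisk/disk7      # expect a mismatched-replication complaint
zpool add -f test /var/tmp/fakedisk/disk7   # proceeds, but disk7 carries no parity protection

# Attaching the disk to a member of the raidz group is not supported;
# attach only grows mirrors (or turns a single disk into a mirror):
zpool attach test /var/tmp/fakedisk/disk1 /var/tmp/fakedisk/disk7   # expected to fail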
[i]I think the original poster was thinking that non-enterprise users would be most interested in only having to *purchase* one drive at a time. Enterprise users aren't likely to balk at purchasing 6-10 drives at a time, so for them adding an additional *new* RaidZ to stripe across is easier.[/i]

Yes. I have $xxx to spend on disks and can afford 3. As my needs increase, I'll have saved enough to buy another disk.

Traditionally, you RAID your disks together, then use a volume manager (VM) to divvy it up into partitions that can grow/shrink as needed. The total size of the RAID isn't important until you've filled it. Then you want to increase the RAID. You could just add new RAID chunks and have a VM on each chunk, but you'd be wasting some of your space. The incremental cost of the added space is the same as for the original RAID: with disks of size n, a 3-disk RAID-5 gives 2n usable and a 4-disk RAID-5 gives 3n. Or, doubling the disks: a single 6-disk RAID-5 gives 5n, versus two 3-disk RAID-5 groups giving 2n + 2n = 4n (6 disks), or a 3-disk plus a 4-disk group giving 2n + 3n = 5n (7 disks).

The cost of scaling / loss of space is balanced against the cost of backup / wipe-and-re-raid / restore.
Hello Kyle,

Wednesday, January 10, 2007, 5:33:12 PM, you wrote:

KM> Remember though that it's been mathematically figured that the
KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's

Well, nothing like this was proved, and definitely not mathematically.

It's just common-sense advice - for many users, keeping raidz groups below 9 disks should give good enough performance. However, if someone creates a raidz group of 48 disks, he/she probably also expects performance, and in general raid-z won't offer it.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Kyle,
>
> Wednesday, January 10, 2007, 5:33:12 PM, you wrote:
>
> KM> Remember though that it's been mathematically figured that the
> KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's
>
> Well, nothing like this was proved, and definitely not mathematically.
>
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

It's very possible I misstated something. :)

I thought I had read, though, something like: over 9 or so disks would mean that each FS block would be written to less than a single disk block on each disk?

Or maybe it was that waiting to read from all drives for files less than a FS block would suffer?

Ahhh... I can't remember what the effects were thought to be. I thought there was some theoretical math involved though.

I do remember people advising against it, though. Not just on a performance basis, but also on an increased-risk-of-failure basis. I think it was just seen as a good balancing point.

-Kyle
Hi Kyle,

I think there was a lot of talk about this behavior on the RAIDZ2 vs. RAID-10 thread. My understanding from that discussion was that every write stripes the block across all disks in a RAIDZ/Z2 group, thereby making writing to the group no faster than writing to a single disk. However, reads are much faster, as all the disks are activated in the read process.

The default config on the X4500 we received recently was RAIDZ groups of 6 disks (across the 6 controllers) striped together into one large zpool.

Best Regards,
Jason
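For illustration, a sketch of that kind of layout, with each raidz group taking one disk from each of the 6 controllers. The cXtYd0 names here are hypothetical stand-ins; the actual controller numbering on an X4500 differs:

zpool create tank \
    raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
    raidz c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
    raidz c0t2d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0
# ...and so on for the remaining disks.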
Hello Jason,

Wednesday, January 10, 2007, 10:54:29 PM, you wrote:

JJWW> Hi Kyle,
JJWW>
JJWW> I think there was a lot of talk about this behavior on the RAIDZ2 vs.
JJWW> RAID-10 thread. My understanding from that discussion was that every
JJWW> write stripes the block across all disks in a RAIDZ/Z2 group, thereby
JJWW> making writing to the group no faster than writing to a single disk.
JJWW> However, reads are much faster, as all the disks are activated in the
JJWW> read process.

The opposite, actually. Because of COW, writing (modifying as well) will give you up to N-1 disks' performance for raid-z1 and N-2 disks' performance for raid-z2. However, reading can be slow in the case of many small random reads, as to read each fs block you've got to wait for all data disks in a group.

JJWW> The default config on the X4500 we received recently was RAIDZ groups
JJWW> of 6 disks (across the 6 controllers) striped together into one large
JJWW> zpool.

However, the problem with that config is the lack of a hot spare. Of course it depends what you want (and there was no hot spare support in U2, which is the OS installed in the factory so far).

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
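For reference, on releases that do support hot spares, designating one is a single command; a sketch with hypothetical pool and device names:

# add an idle spare to an existing pool; it is pulled in automatically
# when a device in any vdev fails
zpool add tank spare c5t7d0

# spares can also be declared at pool creation time
zpool create tank raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 spare c5t7d0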
From: Wade.Stuart at fallon.com
Date: 2007-Jan-10 23:30 UTC
Subject: [zfs-discuss] Re: Adding disk to a RAID-Z?
zfs-discuss-bounces at opensolaris.org wrote on 01/10/2007 05:16:33 PM:

> Hello Jason,
>
> Wednesday, January 10, 2007, 10:54:29 PM, you wrote:
>
> JJWW> Hi Kyle,
>
> JJWW> I think there was a lot of talk about this behavior on the RAIDZ2 vs.
> JJWW> RAID-10 thread. My understanding from that discussion was that every
> JJWW> write stripes the block across all disks in a RAIDZ/Z2 group, thereby
> JJWW> making writing to the group no faster than writing to a single disk.
> JJWW> However, reads are much faster, as all the disks are activated in the
> JJWW> read process.
>
> The opposite, actually. Because of COW, writing (modifying as well)
> will give you up to N-1 disks' performance for raid-z1 and N-2 disks'
> performance for raid-z2. However, reading can be slow in the case of
> many small random reads, as to read each fs block you've got to wait
> for all data disks in a group.
>
> JJWW> The default config on the X4500 we received recently was RAIDZ groups
> JJWW> of 6 disks (across the 6 controllers) striped together into one large
> JJWW> zpool.
>
> However, the problem with that config is the lack of a hot spare.
> Of course it depends what you want (and there was no hot spare support
> in U2, which is the OS installed in the factory so far).

Yeah, this kinda ticked me off. The first thing I noticed is that the thumper that was on back order for 3 months waiting for U3 fixes was shipped with U2 + patches. I called support to try to track down whether U3 base was installable with/without patches and spent 3 days of off-and-on calling to get to someone who could find the info (Sun's internal documentation was locked down and unpublished to support at the time). 5 out of 6 support engineers I talked to did not even realize that U3 was released (three weeks after the fact). It also took 4 (long) calls to clarify that it did in fact need 220 power (at the time I ordered it, it was listed as 110, and it shipped with 110-rated cables).

Long story short, I wiped and reinstalled with U3 and raidz2 with hot spares, like it should have had in the first place.

-Wade
Hi Robert,

I read the following section from http://blogs.sun.com/roch/entry/when_to_and_not_to as indicating that random writes to a RAID-Z have the performance of a single disk regardless of the group size:

> Effectively, as a first approximation, an N-disk RAID-Z group will
> behave as a single device in terms of delivered random input
> IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
> globally act as a 200-IOPS capable RAID-Z group.

Best Regards,
Jason
Hello Jason,

Thursday, January 11, 2007, 12:46:32 AM, you wrote:

JJWW> Hi Robert,
JJWW>
JJWW> I read the following section from
JJWW> http://blogs.sun.com/roch/entry/when_to_and_not_to as indicating
JJWW> that random writes to a RAID-Z have the performance of a single disk
JJWW> regardless of the group size:

>> Effectively, as a first approximation, an N-disk RAID-Z group will
>> behave as a single device in terms of delivered random input
>> IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
>> globally act as a 200-IOPS capable RAID-Z group.

"Random input IOPS" means random reads, not writes.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
Hello Wade,

Thursday, January 11, 2007, 12:30:40 AM, you wrote:

WSfc> Long story short, I wiped and reinstalled with U3 and raidz2 with
WSfc> hot spares, like it should have had in the first place.

The same here. Besides, I always install "my own" system and I'm not using preinstalled ones - except that when x4500s arrive, I run a small script (dd + scrubbing) for 2-3 days to see if everything works fine before putting them into production. Then I re-install.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

There is at least one reason for wanting more drives in the same raidz/raid5/etc: redundancy.

Suppose you have 18 drives. Having two raidz groups consisting of 9 drives each is going to mean you are more likely to fail than having a single raidz2 consisting of 18 drives, since in the former case, yes - two drives can go down, but only if they are the *right* two drives. In the latter case any two drives can go down.

The ZFS administration guide mentions this recommendation, but does not give any hint as to why. A reader may assume/believe it's just general advice, based on someone's opinion that with more than 9 drives, the statistical probability of failure is too high for raidz (or raid5). It's a shame the statement in the guide is not further qualified to actually explain that there is a concrete issue at play.

(I haven't looked into the archives to find the previously mentioned discussion.)

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
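As a concrete sketch of the two 18-disk layouts being compared (hypothetical device names):

# Layout A: two 9-disk raidz (single-parity) groups striped together.
# Survives one failure per group; two failures in the same group lose the pool.
zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

# Layout B: one 18-disk raidz2 group.
# Survives any two failures, at the cost of a much wider stripe (and the
# performance considerations discussed earlier in the thread).
zpool create tank raidz2 \
    c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 \
    c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

Both layouts spend two disks on parity, so usable capacity is roughly the same; the difference is which combinations of failures they tolerate.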
Hello Peter,

Thursday, January 11, 2007, 1:08:38 AM, you wrote:

PS> There is at least one reason for wanting more drives in the same
PS> raidz/raid5/etc: redundancy.

PS> Suppose you have 18 drives. Having two raidz groups consisting of 9 drives each is
PS> going to mean you are more likely to fail than having a single raidz2
PS> consisting of 18 drives, since in the former case, yes - two drives can go
PS> down, but only if they are the *right* two drives. In the latter case any two
PS> drives can go down.

PS> The ZFS administration guide mentions this recommendation, but does not give
PS> any hint as to why. A reader may assume/believe it's just general advice,
PS> based on someone's opinion that with more than 9 drives, the statistical
PS> probability of failure is too high for raidz (or raid5). It's a shame the
PS> statement in the guide is not further qualified to actually explain that
PS> there is a concrete issue at play.

I don't know if the ZFS man pages should teach people about RAID.

If somebody doesn't understand RAID basics, then some kind of tool would be better, where you just specify a pool of disks and choose from: space-efficient, performance, non-redundant - and that's it; all the rest will be hidden.

-- 
Best regards,
Robert    mailto:rmilkowski at task.gda.pl
          http://milek.blogspot.com
> Hello Kyle,
>
> Wednesday, January 10, 2007, 5:33:12 PM, you wrote:
>
> KM> Remember though that it's been mathematically figured that the
> KM> disadvantages to RaidZ start to show up after 9 or 10 drives. (That's
>
> Well, nothing like this was proved, and definitely not mathematically.
>
> It's just common-sense advice - for many users, keeping raidz groups
> below 9 disks should give good enough performance. However, if someone
> creates a raidz group of 48 disks, he/she probably also expects
> performance, and in general raid-z won't offer it.

Wow, lots of good discussion here. I started the idea of allowing a RAIDZ group to grow to arbitrary drives because I was unaware of the downsides of massive pools. From my RAID5 experience, a perfect world would be large numbers of data spindles and a sufficient number of parity spindles, e.g. 99+17 (99 data drives and 17 parity drives). In RAID5 this would give massive IOPS and redundancy.

After studying the code and reading the blogs, a few things have jumped out, with some interesting (and sometimes goofy) implications. Since I am still learning, I could be wrong on any of the following.

RAIDZ pools operate with a storage granularity of one stripe. If you request a read of a block within the stripe, you get the whole stripe. If you modify a block within the stripe, the whole stripe is written to a different location (ala COW). This implies that ANY read will require the whole stripe, and therefore all spindles to seek and read a sector. All drives will return the sectors (mostly) simultaneously. For performance purposes, a RAIDZ pool seeks like a single drive would and has the throughput of multiple drives. Unlike traditional RAID5, adding more spindles does NOT increase read IOPS.

Another implication is that ZFS checksums the stripe, not the component sectors. If a drive silently returns a bad sector, all ZFS knows is that the whole stripe is bad (which could probably also be inferred from a bogus parity sector). ZFS has no clue which drive produced bad data, only that the whole stripe failed the checksum. ZFS finds the offending sector by process of elimination: going through the sectors one at a time, throwing away the data actually read, reconstructing the data from the parity, then determining if the stripe passes the checksum. Two parity drives make this a bigger problem still, almost squaring the number of computations needed. If a stripe has enough parity drives, then the cost of determining N bad data sectors in a stripe is roughly O(k^N), where k is some constant.

Another implication is that there is no RAID5 "write penalty." More accurately, the write penalty is incurred during the read operation, where an entire stripe is read.

Finally, there is no need to rotate parity. Rotating parity was introduced in RAID5 because every write of a single sector in a stripe also necessitated the read and subsequent write of the parity sector. Since there are no partial stripe writes in ZFS, there is no need to read and then write the parity sector.

For those in the know, where am I off base here?

Thanks!

Marty
Robert Milkowski wrote:
> I don't know if the ZFS man pages should teach people about RAID.
>
> If somebody doesn't understand RAID basics, then some kind of tool
> would be better, where you just specify a pool of disks and choose
> from: space-efficient, performance, non-redundant - and that's it;
> all the rest will be hidden.

Actually, this would be really nice to put into some sort of a low-level CLI tool, ala "mdadm" for Linux. That is, a nice little tool that presents you with a list of drives, allows you to select which ones you'd like to put into a ZFS pool, then lets you select from a couple of different options based on performance/space/redundancy. It would also prompt for answers to some of the common options. Figure a nice little perl/python script, or even a Bourne-shell script, would be sufficient. Target audience would be non-sysadmin folks, plus entry-level admins.

Also, on another note: adding drives to a RAIDZ[2] isn't that important for enterprise folks with massive disk arrays, since creating new pools (or adding to a stripe of RAIDZs) is the most likely action when acquiring new disk space. However, adding to a RAIDZ is a really, really, really common action in mid-size and small-size businesses, as well as at the department level for enterprises. These people tend to have setups of a couple of dozen disks at best, but do occasionally either add single disks or, rarely, add a whole disk shelf. Migrating data for these folks is a royal pain (they generally don't have the super-experienced staff, or their staff is severely overworked), so it would be really nice to provide them with this functionality.

Given that the x64 stuff is now a huge part of Sun's business, and that we're selling large chunks of them to mid-size companies or to large companies for department/remote office use, we should definitely consider this target audience as at least equal to our traditional enterprise market in terms of feature priority. :-)

-Erik
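A toy Bourne-shell sketch of the kind of wrapper being described. The script name, pool name, menu choices, and the use of format(1M) to list candidate disks are all assumptions for illustration, not an existing tool:

#!/bin/sh
# zpool-easy.sh -- hypothetical helper: pick disks, pick a goal, build a pool.
POOL=${1:-tank}

# One rough way to list candidate whole disks on Solaris.
echo "Available disks:"
format </dev/null 2>/dev/null | awk '/^ *[0-9]+\./ {print $2}'

printf "Enter the disks to use (space separated): "
read DISKS

echo "1) space efficient (raidz)  2) performance (mirror pairs)  3) non-redundant (stripe)"
printf "Choose a layout [1-3]: "
read CHOICE

case "$CHOICE" in
  1) zpool create "$POOL" raidz $DISKS ;;
  2) # pair up the listed disks into two-way mirrors
     set -- $DISKS
     CMD="zpool create $POOL"
     while [ $# -ge 2 ]; do CMD="$CMD mirror $1 $2"; shift 2; done
     eval "$CMD" ;;
  3) zpool create "$POOL" $DISKS ;;
  *) echo "unknown choice" >&2; exit 1 ;;
esac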
> PS> The ZFS administration guide mentions this recommendation, but does not give
> PS> any hint as to why. A reader may assume/believe it's just general advice,
> PS> based on someone's opinion that with more than 9 drives, the statistical
> PS> probability of failure is too high for raidz (or raid5). It's a shame the
> PS> statement in the guide is not further qualified to actually explain that
> PS> there is a concrete issue at play.
>
> I don't know if the ZFS man pages should teach people about RAID.
>
> If somebody doesn't understand RAID basics, then some kind of tool
> would be better, where you just specify a pool of disks and choose
> from: space-efficient, performance, non-redundant - and that's it;
> all the rest will be hidden.

But the guide *does* make a recommendation, yet does not qualify it. And if there is a problem specific to ZFS that is NOT just an obvious result of some general principle, that's very relevant for the ZFS administration guide IMO (and the man pages, for that matter).

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Hi Peter,

I think you must be referring to this section in the ZFS admin guide:

http://docs.sun.com/app/docs/doc/819-5461/6n7ht6qrr?a=view

If you are creating a RAID-Z configuration with many disks, as in this example, a RAID-Z configuration with 14 disks is better split into two 7-disk groupings. RAID-Z configurations with single-digit groupings of disks should perform better.

This is a general recommendation about the performance of RAID and not a comment about disk failure probabilities. If this isn't the text that is causing you grief, please let me know and I'll fix that one.

Maintaining a balance in the admin guide between providing ZFS features and examples and a kitchen sink of everything you wanted to know or do with ZFS is a difficult task. :-)

I hope to include more links to blogs and our developing ZFS best practices site in the Admin Guide to provide more practical recommendations based on what you want to do with ZFS:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Cindy