The ZFS best practices page (and the experts in general) talk about MTTDL, how raidz2 is better than raidz, and so on. Has anyone here actually experienced data loss in a raidz that has a hot spare? I mean from disk failure, of course, not from bugs or admin error, etc.

-frank
If you are asking whether anyone has experienced two drive failures simultaneously, the answer is yes. It has happened to me (at home) and to at least one client that I can remember. In both cases I was able to dd off one of the failed disks (the one with only bad sectors, or fewer of them), force the RAID-5 back online, and then copy the data off onto new drives.

Personally, I think mirroring (and 3-way mirroring) is safer than raidz/raidz2/RAID-5. All my "boot from zfs" systems have 3-way mirrored root/usr/var disks (using 9 disks), but all my data partitions are 2-way mirrors (usually 8 disks or more, plus a spare).

-- This message posted from opensolaris.org
Yes, a coworker lost a second disk during a rebuild of a raid5 and lost all data.

I have not had a failure myself; however, while migrating EqualLogic arrays in and out of pools, I lost a disk on one array. No data loss, but it concerns me, because during the moves you are essentially reading and writing all of the data on the disk. Did I have a latent problem on that particular disk that only exposed itself when doing such a large read/write? What if another disk had failed, and during the rebuild this latent problem was exposed? Trouble, trouble.

They say security is an onion. So is data protection.

Scott

-- This message posted from opensolaris.org
Anyone who's lost data this way: were you doing weekly scrubs, or did you find out about the simultaneous failures after not touching the bits for months?

mike
Hey James,

> Personally, I think mirroring (and 3-way mirroring) is safer than raidz/raidz2/RAID-5. All my "boot from zfs" systems have 3-way mirrored root/usr/var disks (using 9 disks), but all my data partitions are 2-way mirrors (usually 8 disks or more, plus a spare).

Double-parity (or triple-parity) RAID is certainly more resilient against some failure modes than 2-way mirroring. For example, bit errors arise from disks at a certain rate. In the case of a disk failure in a mirror, it's possible to encounter a bit error such that data is lost.

I recently wrote an article for ACM Queue that examines recent trends in hard drives and makes the case for triple-parity RAID. It's at least peripherally relevant to this conversation: http://blogs.sun.com/ahl/entry/acm_triple_parity_raid

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
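[Adam's point about hitting a bit error while rebuilding a mirror can be put in rough numbers. A minimal sketch in Python, assuming an unrecoverable-error rate of one bit in 10^14 (a typical consumer SATA spec) and independent bit errors; both assumptions are illustrative, not figures from this thread.]

# Back-of-the-envelope estimate: probability of at least one
# unrecoverable read error (URE) while reading the entire surviving
# half of a mirror during resilver.  Assumes 1 URE per 1e14 bits and
# independent bit errors.
import math

def p_resilver_hits_ure(drive_bytes, bits_per_ure=1e14):
    bits_read = drive_bytes * 8
    # P(all bits read cleanly) = (1 - 1/bits_per_ure) ** bits_read
    return 1.0 - math.exp(bits_read * math.log1p(-1.0 / bits_per_ure))

for size_tb in (0.5, 1.0, 2.0):
    p = p_resilver_hits_ure(size_tb * 1e12)
    print(f"{size_tb:3.1f} TB mirror half: ~{p:.0%} chance of a bad bit "
          f"during resilver")

Under those assumptions the chance climbs from a few percent for a 500 GB drive to roughly 15% for a 2 TB drive, which is the trend the article is concerned with.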
On Dec 21, 2009, at 4:09 PM, Michael Herf <mbherf at gmail.com> wrote:

> Anyone who's lost data this way: were you doing weekly scrubs, or did you find out about the simultaneous failures after not touching the bits for months?

Scrubbing on a routine basis is good for detecting problems early, but it doesn't solve the problem of a double failure during resilver. As disks become huge, the chance of a double failure during resilvering grows into a real possibility, due to the amount of data, the bit error rate of the medium, and the prolonged stress of resilvering these monsters.

For drives up to 1TB use nothing less than raidz2; for drives over 1TB use raidz3. Avoid raidz vdevs larger than 7 drives; it is better to have multiple vdevs, both for performance and for reliability.

With a 24-bay 2.5" enclosure you can easily create three 7-drive raidz3s or four 5-drive raidz2s, with a spare for each vdev, or two spares and 1-2 SSDs. Both options give 12 of 24 disks as usable space. Four raidz2s give more performance; three raidz3s give more reliability.

-Ross
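[For what it's worth, the space arithmetic behind those two 24-bay layouts checks out. A quick sketch, using the rough rule that each raidz vdev yields (width - parity) data disks and ignoring metadata and allocation overhead:]

# Quick arithmetic check of the two 24-bay layouts described above.
# Usable space per raidz vdev is approximated as (width - parity)
# data disks; metadata and allocation overhead are ignored.

def layout(vdevs, width, parity, bays=24):
    used = vdevs * width
    data = vdevs * (width - parity)
    return data, bays - used          # data disks, bays left for spares/SSD

for name, cfg in {"3 x 7-disk raidz3": (3, 7, 3),
                  "4 x 5-disk raidz2": (4, 5, 2)}.items():
    data, leftover = layout(*cfg)
    print(f"{name}: {data}/24 disks usable, {leftover} bays for spares/SSD")

Both print 12/24 usable, with 3 or 4 bays left over for spares and SSDs.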
> On Dec 21, 2009, at 4:09 PM, Michael Herf <mbherf at gmail.com> wrote:
>
>> Anyone who's lost data this way: were you doing weekly scrubs, or did you find out about the simultaneous failures after not touching the bits for months?
>
> Scrubbing on a routine basis is good for detecting problems early, but it doesn't solve the problem of a double failure during resilver. As disks become huge, the chance of a double failure during resilvering grows into a real possibility, due to the amount of data, the bit error rate of the medium, and the prolonged stress of resilvering these monsters.
>
> For drives up to 1TB use nothing less than raidz2; for drives over 1TB use raidz3. Avoid raidz vdevs larger than 7 drives; it is better to have multiple vdevs, both for performance and for reliability.
>
> With a 24-bay 2.5" enclosure you can easily create three 7-drive raidz3s or four 5-drive raidz2s, with a spare for each vdev, or two spares and 1-2 SSDs. Both options give 12 of 24 disks as usable space. Four raidz2s give more performance; three raidz3s give more reliability.
>
> -Ross

Hi Ross,

What about good old RAID-10? It's a pretty reasonable choice for heavily loaded storage, isn't it?

I remember when I migrated a raidz2 to an 8-drive RAID-10: the application administrators were really happy with the new access speed. (We didn't use striped raidz2s as you are suggesting, though.)

-- 
Roman

-- This message posted from opensolaris.org
On Dec 21, 2009, at 11:56 PM, Roman Naumenko <roman at naumenko.ca> wrote:

>> [...]
>>
>> For drives up to 1TB use nothing less than raidz2; for drives over 1TB use raidz3. Avoid raidz vdevs larger than 7 drives; it is better to have multiple vdevs, both for performance and for reliability.
>>
>> -Ross
>
> Hi Ross,
>
> What about good old RAID-10? It's a pretty reasonable choice for heavily loaded storage, isn't it?
>
> I remember when I migrated a raidz2 to an 8-drive RAID-10: the application administrators were really happy with the new access speed. (We didn't use striped raidz2s as you are suggesting, though.)

RAID-10 provides excellent performance, and if performance is the priority then I recommend it, but I was under the impression that resiliency was the priority here; raidz2/raidz3 provide greater resiliency for a sacrifice in performance.

-Ross
>> Hi Ross,
>>
>> What about good old RAID-10? It's a pretty reasonable choice for heavily loaded storage, isn't it?
>>
>> I remember when I migrated a raidz2 to an 8-drive RAID-10: the application administrators were really happy with the new access speed. (We didn't use striped raidz2s as you are suggesting, though.)
>
> RAID-10 provides excellent performance, and if performance is the priority then I recommend it, but I was under the impression that resiliency was the priority here; raidz2/raidz3 provide greater resiliency for a sacrifice in performance.

My experience is in line with Ross's comments. There is no question that more independent vdevs will improve IOPS, e.g. RAID-10 or even a pile of RAIDZ vdevs.

I have been burnt too many times to let an array go critical (no redundancy). Never, ever, ever again. With RAID-1 or RAID-10, one disk loss puts the whole pool critical, just one bad sector away from disaster; one prays the hot spare can be rebuilt in time. With RAIDZ, the same is true. I think of triple (or even quad) mirroring the same way as I think of RAIDZ3: it's like having prebuilt hot spares.

I suspect that the IOPS problems of wide stripes are being mitigated by L2ARC/ZIL, and that the trend will be toward wide stripes with ever higher parity counts. Sun's recent storage offerings tend to confirm this: slower, cheaper, bigger SATA drives fronted by SSD L2ARC and ZIL.

-- This message posted from opensolaris.org
On Tue, 22 Dec 2009, Ross Walker wrote:

> RAID-10 provides excellent performance, and if performance is the priority then I recommend it, but I was under the impression that resiliency was the priority here; raidz2/raidz3 provide greater resiliency for a sacrifice in performance.

Why are people talking about "RAID-5", "RAID-6", and "RAID-10" on this list? This is the zfs-discuss list, and zfs does not do "RAID-5", "RAID-6", or "RAID-10".

Applying classic RAID terms to zfs is just plain wrong and misleading, since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.

Bob

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:

> Why are people talking about "RAID-5", "RAID-6", and "RAID-10" on this list? This is the zfs-discuss list, and zfs does not do "RAID-5", "RAID-6", or "RAID-10".
>
> Applying classic RAID terms to zfs is just plain wrong and misleading, since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.

That's not entirely true, is it?

* RAIDZ is RAID5 + checksum + COW
* RAIDZ2 is RAID6 + checksum + COW
* A stack of mirror vdevs is RAID10 + checksum + COW

While there isn't an actual one-to-one mapping, many traditional RAID concepts do seem to apply to ZFS discussions, don't they?

Marty

-- This message posted from opensolaris.org
> On Tue, 22 Dec 2009, Ross Walker wrote:
>
> Applying classic RAID terms to zfs is just plain wrong and misleading, since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us,

I wouldn't agree. Sun just introduced more marketing names for well-known things, even if it added some new functionality.

raid6 is raid6, no matter how you name it: raidz2, RAID-DP, RAID ADG or something else. Sounds nice, but it's just buzzwords.

-- 
Roman
roman at naumenko.ca

-- This message posted from opensolaris.org
On Tue, 22 Dec 2009, Marty Scholes wrote:

> That's not entirely true, is it?
> * RAIDZ is RAID5 + checksum + COW
> * RAIDZ2 is RAID6 + checksum + COW
> * A stack of mirror vdevs is RAID10 + checksum + COW

These are layman's simplifications that no one here should be comfortable with.

Zfs borrows proven data recovery technologies from classic RAID, but the data layout on disk is not classic RAID, or even close to it. Metadata and file data are handled differently. Metadata is always duplicated, with the most critical metadata being strewn across multiple disks. Even "mirror" disks are not really mirrors of each other.

Earlier in this discussion thread someone claimed that if a raidz disk was lost then the pool was just one data error away from total disaster, but that is not normally true, due to the many other things that zfs does.

Bob

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On 22.12.09 18:42, Roman Naumenko wrote:

>> On Tue, 22 Dec 2009, Ross Walker wrote:
>> Applying classic RAID terms to zfs is just plain wrong and misleading, since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.
>
> I wouldn't agree. Sun just introduced more marketing names for well-known things, even if it added some new functionality.
>
> raid6 is raid6, no matter how you name it: raidz2, RAID-DP, RAID ADG or something else. Sounds nice, but it's just buzzwords.

Sorry, but that isn't correct. Or, to be precise: it depends on your definition. If you consider RAID5 to be just "a stripe set with interleaved parity", then you may be right. But the differences between RAID5 and RAIDZ (and likewise between RAID6 and RAIDZ2) are vast enough to justify their own names; just look at the different parity handling. Otherwise this would be like denying diesel and gasoline engines different names just because they are both internal combustion piston engines ...
On Tue, 22 Dec 2009, Roman Naumenko wrote:

> raid6 is raid6, no matter how you name it: raidz2, RAID-DP, RAID ADG or something else. Sounds nice, but it's just buzzwords.

It is true that many vendors like to make their storage array seem special, but references to RAID6 when describing raidz2 are only used in order to help assist with your understanding. They are a form of analogy.

Bob

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Interesting discussion. I know the bias here is generally toward enterprise users. I was wondering if the same recommendations hold for home users, who are generally more price sensitive.

I'm currently running OpenSolaris on a system with 12 drives. I had split them into three 4-disk raidz1 vdevs. This made some sense at the time, as I can upgrade 4 disks at a time as new sizes come out. However, with 8 of the disks now being 1.5TB, I'm getting concerned about this strategy. While the important data is backed up, losing the server's data would be very irritating.

My next thought was to get more drives and run a single raidz3 vdev of 12x1.5TB: more space than I need for quite a while (since I can't add just a few drives later), and triple parity for protection. I'd need a few extra drives to hold the data while I rebuild the main array, so I'd have cold spares available that I would use for backing up critical data from the server; they would see use and scrubs, not just sit on the shelf.

Access is over a gigE network, so I don't need more performance than that. I have read that the overall speed of a vdev is approximately the speed of a single device in the vdev, and in this case that is more than fast enough.

I'm curious what the experts here think of this new plan. I'm pretty sure I know what you all think of the old one. :)

Do you recommend swapping spare drives into the array periodically? It seems like it wouldn't really be any better than running a scrub over the same period, but I've heard of people doing it on hardware RAID controllers.

-- This message posted from opensolaris.org
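[To make the trade-off concrete, here is a rough comparison of the layouts being weighed, assuming all twelve drives are 1.5 TB; the real pool mixes sizes, so treat the numbers as approximate.]

# Rough usable-space / fault-tolerance comparison for 12 drives,
# assuming every drive is 1.5 TB and ignoring metadata overhead.

DRIVE_TB = 1.5

configs = {
    "3 x 4-disk raidz1 (current)": (3, 4, 1),
    "2 x 6-disk raidz2":           (2, 6, 2),
    "1 x 12-disk raidz3 (plan)":   (1, 12, 3),
}

for name, (vdevs, width, parity) in configs.items():
    usable = vdevs * (width - parity) * DRIVE_TB
    print(f"{name}: ~{usable:4.1f} TB usable, "
          f"tolerates {parity} failure(s) per vdev")

The single raidz3 and the three raidz1s both come out around 13.5 TB usable; the raidz3 trades nothing in space for the ability to lose any three drives, while two raidz2s give up about 1.5 TB for two independent vdevs.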
Bob Friesenhahn wrote:

> On Tue, 22 Dec 2009, Marty Scholes wrote:
>> That's not entirely true, is it?
>> * RAIDZ is RAID5 + checksum + COW
>> * RAIDZ2 is RAID6 + checksum + COW
>> * A stack of mirror vdevs is RAID10 + checksum + COW
>
> These are layman's simplifications that no one here should be comfortable with.

Well, ok. They do seem to capture the essence of what the different flavors of ZFS protection do, but I'll take you at your word. We do seem to be spinning off on a tangent, though.

> Zfs borrows proven data recovery technologies from classic RAID, but the data layout on disk is not classic RAID, or even close to it. Metadata and file data are handled differently. Metadata is always duplicated, with the most critical metadata being strewn across multiple disks. Even "mirror" disks are not really mirrors of each other.

I am having a little trouble reconciling the above statements, but again, ok. I haven't read the official RAID spec, so again, I'll take you at your word. Honestly, those seem like important nuances, but nuances nonetheless.

> Earlier in this discussion thread someone claimed that if a raidz disk was lost then the pool was just one data error away from total disaster

That would be me. Let me substitute the phrase "user data loss in some way, shape or form which disrupts availability" for the words "total disaster." Honestly, I think we are splitting hairs here. Everyone agrees that RAIDZ takes RAID5 to a new level.

-- This message posted from opensolaris.org
On 22-Dec-09, at 12:42 PM, Roman Naumenko wrote:

>> On Tue, 22 Dec 2009, Ross Walker wrote:
>> Applying classic RAID terms to zfs is just plain wrong and misleading, since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.
>>
>> Bob
>
> I wouldn't agree. Sun just introduced more marketing names for well-known things, even if it added some new functionality.
>
> raid6 is raid6, no matter how you name it: raidz2, RAID-DP, RAID ADG or something else. Sounds nice, but it's just buzzwords.

The implied equivalence is wrong and confusing. That's the kind of mislabelling that Bob was complaining about.

--Toby
On Dec 22, 2009, at 11:49 AM, Toby Thain wrote:

> On 22-Dec-09, at 12:42 PM, Roman Naumenko wrote:
>
>> I wouldn't agree. Sun just introduced more marketing names for well-known things, even if it added some new functionality.
>>
>> raid6 is raid6, no matter how you name it: raidz2, RAID-DP, RAID ADG or something else. Sounds nice, but it's just buzzwords.
>
> The implied equivalence is wrong and confusing. That's the kind of mislabelling that Bob was complaining about.

Yes. Also note that the RAID levels have rather strict definitions: http://www.snia.org/education/dictionary/r

IMHO the biggest difference is the dynamic nature of ZFS. For example, the definition of RAID-0 (data striping) is:

    A disk array data mapping technique in which fixed-length sequences of virtual disk data addresses are mapped to sequences of member disk addresses in a regular rotating pattern.

ZFS implements dynamic striping, which is different in that the "fixed-length sequences" aren't really fixed and the "regular rotating pattern" is biased towards allocations on devices which have more free space. The upshot is that the space available to a dynamic stripe is the sum of the space of the vdevs, whereas for RAID-0 it is N * min(vdev size).

-- richard
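[The free-space bias Richard describes can be illustrated with a toy weighted-choice model. This is emphatically not ZFS's actual metaslab allocator, just a sketch of the idea that vdevs with more free space receive proportionally more new allocations; the vdev names and free-space figures are made up.]

# Toy illustration of free-space-biased placement.  NOT ZFS's metaslab
# allocator; it only shows emptier vdevs receiving proportionally more
# of the new allocations.
import random

free_gb = {"vdev0": 800, "vdev1": 200, "vdev2": 500}   # made-up figures

def pick_vdev(free):
    total = sum(free.values())
    r = random.uniform(0, total)
    for name, space in free.items():
        r -= space
        if r <= 0:
            return name
    return name                      # guard against floating-point rounding

counts = {name: 0 for name in free_gb}
for _ in range(10_000):
    counts[pick_vdev(free_gb)] += 1
print(counts)   # roughly proportional to 800:200:500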
ttabbal:
If I understand correctly, raidz{1} gives 1 drive of protection and (drives - 1) of space, raidz2 gives 2 drives of protection and (drives - 2) of space, and likewise raidz3 gives 3 drives of protection. Everything I've seen says you should stay around 6-9 drives for raidz, so don't do a raidz3 with 12 drives; instead make two raidz3 vdevs of 6 drives each (which is (6-3) * 1.5 * 2 = 9 TB of space).

As for whether or not to do raidz, for me the issue is performance: I can't handle the raidz write penalty. If I needed triple-drive protection, a 3-way mirror setup would be the only way I would go. I don't yet quite understand why a 4+ drive raidz3 vdev is better than a 3-drive mirror vdev, other than that a 6-drive raidz3 gives 3 drives of space while 6 drives in 3-way mirrors give only 2 drives of space.

Adam Leventhal:
If we can compare apples and oranges, would your recommendation ("use raidz2 and/or raidz3") stay the same when comparing against mirrors with the same number of parity drives? In other words, a 2-drive mirror compares to raidz1 the way a 3-drive mirror compares to raidz2 and a 4-drive mirror compares to raidz3? If you were an enterprise (in other words, cared about performance), why would you ever use raidz instead of throwing more drives at the problem and doing mirroring with identical parity?

Joerg Moellenkamp:
I do consider RAID5 to be a "stripe set with interleaved parity", so I don't agree with the strong objection by many in this thread to using RAID5 to describe what raidz does. I don't think many people particularly care about the nuanced differences between hardware-card RAID5 and raidz, other than knowing they would rather have raidz over RAID5.

-- This message posted from opensolaris.org
On Tue, 22 Dec 2009, James Risner wrote:

> I do consider RAID5 to be a "stripe set with interleaved parity", so I don't agree with the strong objection by many in this thread to using RAID5 to describe what raidz does. I don't think many people particularly care about the nuanced differences between hardware-card RAID5 and raidz, other than knowing they would rather have raidz over RAID5.

One of the "nuanced differences" is that raidz supports more data recovery mechanisms than RAID5 does, since it redundantly stores its metadata and provides the option to redundantly store user data as well, in addition to what is provided by "RAID5". The COW mechanism also provides some measure of protection: if the corrupted data was recently written, a somewhat older version may still be available by rolling back a transaction group. Valid older data may also be available in a snapshot. It is not uncommon to see postings from people who report that their single-disk pool said that some data corruption was encountered, the problem was automatically corrected, and user data was not impacted.

Bob

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
risner wrote:

> If I understand correctly, raidz{1} gives 1 drive of protection and (drives - 1) of space, raidz2 gives 2 drives of protection and (drives - 2) of space, and likewise raidz3 gives 3 drives of protection.

Yes.

> Everything I've seen says you should stay around 6-9 drives for raidz, so don't do a raidz3 with 12 drives; instead make two raidz3 vdevs of 6 drives each (which is (6-3) * 1.5 * 2 = 9 TB of space).

From what I can tell, this is purely a function of needed IOPS. Wider stripe = better storage/bandwidth utilization = fewer IOPS. For home usage I run a 14-drive RAIDZ3 array.

> As for whether or not to do raidz, for me the issue is performance: I can't handle the raidz write penalty.

If there is a RAIDZ write penalty over mirroring, I am unaware of it. In fact, sequential writes are faster under RAIDZ.

> If I needed triple-drive protection, a 3-way mirror setup would be the only way I would go.

That will give high IOPS with 33% storage utilization and 33% bandwidth utilization. In other words, for every MB of data read/written by an application, 3 MB is read/written from/to the array and stored. Multiply all storage and bandwidth needs by three.

> I don't yet quite understand why a 4+ drive raidz3 vdev is better than a 3-drive mirror vdev, other than that a 6-drive raidz3 gives 3 drives of space while 6 drives in 3-way mirrors give only 2 drives of space.

Part of the question you answered yourself. The other part is that with a 6-drive RAIDZ3, I can lose ANY three drives and still be running. With three mirrors, I can lose the pool if the wrong two drives die.

-- This message posted from opensolaris.org
> Everything I've seen says you should stay around 6-9 drives for raidz, so don't do a raidz3 with 12 drives; instead make two raidz3 vdevs of 6 drives each (which is (6-3) * 1.5 * 2 = 9 TB of space).

So the question becomes, why? If it's performance, I can live with lower IOPS and max throughput. If it's reliability, I'd like to hear why. I would think that the number of acceptable devices in a raidz would scale somewhat with the number of drives used for parity, so I would expect to see a sliding scale somewhat like the one mentioned before regarding disk size vs. raidz level. For example:

3-4 drives: raidz1
4-8 drives: raidz2
8+ drives: raidz3

In practice, I would expect to see some kind of chart using the number of devices and the size of the devices together to determine the proper raidz level. Perhaps I'm way off base, though.

Note that I don't really have a problem doing 2 arrays, but I would think that perhaps raidz2 would be acceptable in that configuration. The benefit of that config for me would be that I could create a parallel array of 6 to copy my existing data to, then add the second array after the initial file copy/scrub. I would need fewer disks to complete the transition.

> As for whether or not to do raidz, for me the issue is performance: I can't handle the raidz write penalty. If I needed triple-drive protection, a 3-way mirror setup would be the only way I would go. I don't yet quite understand why a 4+ drive raidz3 vdev is better than a 3-drive mirror vdev, other than that a 6-drive raidz3 gives 3 drives of space while 6 drives in 3-way mirrors give only 2 drives of space.

I've already stipulated that performance is not the primary concern. 100 MB/sec with reasonable random I/O for a max of 5 clients is more than enough. My existing raidz is more than fast enough for my needs, and I have 5400 RPM drives in there.

I'd be very interested to hear an expert opinion on this. Given, say, 6 disks: what advantage in reliability, if any, would a raidz3 have vs. a striped pair of 3-way mirrors? Obviously the raidz3 has 1 disk worth of extra space, but we're talking about reliability here. I would guess performance would be higher with the mirrors.

With all of my comments, please keep in mind that I am not a huge enterprise customer with loads of money to spend on this. If I were, I'd just buy Thumpers. I'm a home user with a decent fileserver.

-- This message posted from opensolaris.org
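[One narrow way to look at the reliability half of that question is pure combinatorics: of all the ways k drives out of 6 can fail, what fraction does each layout survive? A sketch; it ignores resilver windows, correlated failures and unrecoverable read errors, so it is only part of the picture.]

# Fraction of k-drive failure combinations each 6-disk layout survives.
# Pure combinatorics: resilver time, correlated failures and read
# errors are ignored.
from itertools import combinations

def survives_raidz(parity, failed):
    return len(failed) <= parity

def survives_mirrors(way, failed, width=6):
    # drives 0..way-1 form mirror 0, the next `way` drives mirror 1, etc.
    mirrors = [set(range(i, i + way)) for i in range(0, width, way)]
    return all(len(m - set(failed)) >= 1 for m in mirrors)

layouts = {
    "6-disk raidz3":     lambda f: survives_raidz(3, f),
    "3 x 2-way mirrors": lambda f: survives_mirrors(2, f),
    "2 x 3-way mirrors": lambda f: survives_mirrors(3, f),
}

for k in (1, 2, 3):
    combos = list(combinations(range(6), k))
    for name, ok in layouts.items():
        frac = sum(ok(f) for f in combos) / len(combos)
        print(f"{k} failure(s), {name:18s}: survives {frac:4.0%} of combinations")

By this measure the 6-disk raidz3 survives every single, double and triple failure; the pair of 3-way mirrors survives all double failures and 90% of triple failures; three 2-way mirrors survive only 80% of double and 40% of triple failures.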
On 22-Dec-09, at 3:33 PM, James Risner wrote:

> ...
> Joerg Moellenkamp:
> I do consider RAID5 to be a "stripe set with interleaved parity", so I don't agree with the strong objection by many in this thread to using RAID5 to describe what raidz does. I don't think many people particularly care about the nuanced differences between hardware-card RAID5 and raidz, other than knowing they would rather have raidz over RAID5.

These are hardly "nuanced differences". The most powerful capabilities of ZFS simply aren't available in RAID.

* Because ZFS is labelled a "filesystem", people assume it is analogous to a conventional filesystem and then make misleading comparisons which fail to expose the profound differences;
* or people think it's a RAID or volume manager, assume it's just RAID relabelled, and fail to see where it goes beyond.

Of course it is neither, exactly, but a synthesis of the two which is far more capable than the two conventionally discrete layers in combination. (I know most of the list knows this. :)

--Toby
On Dec 22, 2009, at 11:46 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Tue, 22 Dec 2009, Ross Walker wrote:
>> RAID-10 provides excellent performance, and if performance is the priority then I recommend it, but I was under the impression that resiliency was the priority here; raidz2/raidz3 provide greater resiliency for a sacrifice in performance.
>
> Why are people talking about "RAID-5", "RAID-6", and "RAID-10" on this list? This is the zfs-discuss list, and zfs does not do "RAID-5", "RAID-6", or "RAID-10".

Because raid10 is shorter to type than "pool of mirrors", and for newcomers those terms are easier to grasp. Notice how I refer to raidz/2/3 and not raid5/6? Because it's roughly the same number of characters.

-Ross

PS Really, Bob?
On December 21, 2009 10:45:29 PM -0500 Ross Walker <rswwalker at gmail.com> wrote:

> Scrubbing on a routine basis is good for detecting problems early, but it doesn't solve the problem of a double failure during resilver. As disks become huge, the chance of a double failure during resilvering grows into a real possibility, due to the amount of data, the bit error rate of the medium, and the prolonged stress of resilvering these monsters.
>
> For drives up to 1TB use nothing less than raidz2; for drives over 1TB use raidz3. Avoid raidz vdevs larger than 7 drives; it is better to have multiple vdevs, both for performance and for reliability.

Would be good fodder for the best practices doc, if you have the math to back it up.

-frank
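[The math usually cited for this is the textbook MTTDL model. A minimal sketch, assuming independent, exponentially distributed drive failures, ignoring unrecoverable read errors, and using made-up MTBF and resilver figures; none of these numbers come from the thread, so treat the output as relative, not absolute.]

# Textbook MTTDL model for an N-disk, P-parity vdev.  Assumes
# independent exponential failures and ignores unrecoverable read
# errors; the MTBF and resilver times below are illustrative only.

HOURS_PER_YEAR = 24 * 365

def mttdl(n_disks, parity, mtbf_h, mttr_h):
    num = mtbf_h ** (parity + 1)
    den = mttr_h ** parity
    for i in range(parity + 1):
        den *= (n_disks - i)
    return num / den

MTBF = 500_000                      # hours per drive (assumed)
for parity in (1, 2, 3):
    for resilver_h in (12, 48):     # small/fast vs. large/slow drives
        years = mttdl(7, parity, MTBF, resilver_h) / HOURS_PER_YEAR
        print(f"7-disk raidz{parity}, {resilver_h:2d} h resilver: "
              f"MTTDL ~ {years:,.0f} years")

The absolute numbers are only as good as the assumptions, but the model does show the two trends argued above: each extra parity level buys orders of magnitude, and longer resilver times (bigger drives) erode it.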
Ross Walker <rswwalker <at> gmail.com> writes:

> Scrubbing on a routine basis is good for detecting problems early, but it doesn't solve the problem of a double failure during resilver.

Scrubbing doesn't solve double failures, but it significantly decreases their likelihood. The assumption here is that the most common type of second failure in a double-failure scenario is an uncorrectable error. Not only do scrubs detect and fix uncorrectable errors, they also stress the drives as much as a resilver would. Personally I have had my share of single-drive failures, but never any double failure. I scrub on a weekly or monthly basis.

-mrb
On Tue, 22 Dec 2009, Marty Scholes wrote:

> If there is a RAIDZ write penalty over mirroring, I am unaware of it. In fact, sequential writes are faster under RAIDZ.

There is always an IOPS penalty for raidz when writing or reading, given a particular zfs block size. There may be a write penalty for mirroring, but this depends heavily on whether the I/O paths are saturated or operate in parallel. It is true that a mirror requires a write for each mirror device, but if the I/O subsystem has the bandwidth for it, the cost of this can be astonishingly insignificant. It becomes significant when the I/O path is shared with limited bandwidth and the writes are large.

As to whether sequential writes are faster under raidz, I have yet to see any actual evidence of that. Perhaps someone can provide some actual evidence?

Bob

-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
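[The rule of thumb behind the raidz IOPS point can be put in rough numbers. This is the common simplified model, not a benchmark: a small random read on a raidz vdev touches every data disk, so each vdev delivers roughly one disk's worth of random IOPS, while each side of a mirror can serve a different read. The per-disk IOPS figure is an assumption.]

# Common rule-of-thumb IOPS model, not a benchmark.

DISK_IOPS = 120                      # ~7200 rpm drive, assumed
TOTAL_DISKS = 12

def raidz_random_read_iops(vdevs):
    return vdevs * DISK_IOPS         # ~1 disk's worth per raidz vdev

def mirror_random_read_iops(disks):
    return disks * DISK_IOPS         # every disk can serve a different read

def mirror_random_write_iops(disks, way=2):
    return disks // way * DISK_IOPS  # each write hits all sides of one mirror

print("1 x 12-disk raidz3:", raidz_random_read_iops(1), "random read IOPS (approx)")
print("2 x 6-disk raidz2 :", raidz_random_read_iops(2), "random read IOPS (approx)")
print("6 x 2-way mirrors :", mirror_random_read_iops(TOTAL_DISKS), "read /",
      mirror_random_write_iops(TOTAL_DISKS), "write IOPS (approx)")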
On Tue, Dec 22 at 12:33, James Risner wrote:

> As for whether or not to do raidz, for me the issue is performance: I can't handle the raidz write penalty. If I needed triple-drive protection, a 3-way mirror setup would be the only way I would go. I don't yet quite understand why a 4+ drive raidz3 vdev is better than a 3-drive mirror vdev, other than that a 6-drive raidz3 gives 3 drives of space while 6 drives in 3-way mirrors give only 2 drives of space.

That's a pretty big "other than", since the difference is 50% more space for the raidz3 in your case, and the difference grows as the number of drives increases.

I concur with some of the other thoughts here, that the recommended configurations are migrating toward L2ARC plus big, slow SATA pools. The recent automated pool recovery and ZIL removal improvements make that design much more practical.

--eric

-- 
Eric D. Mudama
edmudama at mail.bounceswoosh.org
>> Applying classic RAID terms to zfs is just plain wrong and misleading since zfs does not directly implement these classic RAID approaches even though it re-uses some of the algorithms for data recovery. Details do matter.
>
> That's not entirely true, is it?
> * RAIDZ is RAID5 + checksum + COW
> * RAIDZ2 is RAID6 + checksum + COW
> * A stack of mirror vdevs is RAID10 + checksum + COW

Others have noted that RAID-Z isn't really the same as RAID-5, and RAID-Z2 isn't the same as RAID-6, because RAID-5 and RAID-6 define not just the number of parity disks (which would have made far more sense in my mind) but also a notion of how the data and parity are laid out. The RAID levels were used to describe groupings of existing implementations, and they conflate things like the number of parity devices with, say, how parity is distributed across devices. For example, RAID-Z1 lays out data most like RAID-3 (a single block is carved up and spread across many disks), but distributes parity as RAID-5 requires, though in a different manner. It's an unfortunate state of affairs, which is why further RAID levels should identify only the most salient aspect (the number of parity devices), or we should use unambiguous terms like single-parity and double-parity RAID.

> If we can compare apples and oranges, would your recommendation ("use raidz2 and/or raidz3") stay the same when comparing against mirrors with the same number of parity drives? In other words, a 2-drive mirror compares to raidz1 the way a 3-drive mirror compares to raidz2 and a 4-drive mirror compares to raidz3? If you were an enterprise (in other words, cared about performance), why would you ever use raidz instead of throwing more drives at the problem and doing mirroring with identical parity?

You're right that a mirror is a degenerate form of raidz1, for example, but mirrors allow for specific optimizations. While the redundancy would be the same, the performance would not.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
I know I'm a bit late to contribute to this thread, but I'd still like to add my $0.02.

My "gut feel" is that we (generally) don't yet understand the subtleties of disk drive failure modes as they relate to 1.5 or 2TB+ drives. Why? Because those large drives have not been widely available until relatively recently. There's a tendency to extrapolate one's existing knowledge base and understanding of how and why drives fail (or degrade), basing the expected outcome on some "extension" of that knowledge. In the case of the current generation of high capacity drives, that may or may not be appropriate. We simply don't know! Mainly because the hard drive manufacturers, those engineering gods and providers of ever increasing storage density, don't communicate their acquired and evolving knowledge as it relates to disk reliability (or failure) mechanisms.

In this case I feel, as a user, it's best to take a very conservative approach and err on the side of safety by using raidz3 when high capacity drives are being deployed. Over time, a consensus-based understanding of the failure modes will emerge, and then, from a user perspective, we can have a clearer understanding of the risks of data loss and its relation to different ZFS pool configurations.

Personally, I was surprised at how easily I was able to "take out" a 1TB WD Caviar Black drive by moving a 1U server with the drives spinning. Earlier drive generations (500GB or smaller) tolerated this abuse with no signs of degradation. So I know that high capacity drives are a lot more sensitive to mechanical "abuse". I can only assume that 2TB drives are probably even more sensitive, and that shock mounting, to reduce the vibration induced by a bunch of similar drives operating in the same "box", is probably a smart move.

Likewise, my previous experience has seen how a given percentage of disk drives would fail in the 2 or 3 week period following a temperature "excursion" in a data center environment. Sometimes everyone knows about that event, and sometimes the folks doing A/C work over a holiday weekend will "forget" to publish the details of what went wrong! :) Again, the same doubts continue to nag me: are the current 1.5TB+ drives more likely to suffer degradation due to a temperature excursion over a relatively short time period? If the drive firmware does its job and remaps damaged sectors or tracks transparently, we, as the users, won't know - until it happens one time too many!!

Regards,

-- 
Al Hopper
Logical Approach Inc, Plano, TX
al at logical-approach.com
Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/