Has anyone here read the article "Why RAID 5 stops working in 2009" at http://blogs.zdnet.com/storage/?p=162 ?

Does RAID-Z have the same chance of hitting an unrecoverable read error as RAID-5 on Linux when the array has to be rebuilt because of a faulty disk? I imagine it does, since the same physical constraints plague our hard drives. Granted, the chance of failure in my case shouldn't be nearly as high, as I will most likely use three or four 750GB drives rather than something on the order of 10TB.

With my OpenSolaris NAS I will be scrubbing every week (consumer-grade drives; every month for enterprise-grade), as recommended in the ZFS Best Practices Guide. If I run "zpool status" and see that the scrubs are fixing more and more errors, would that mean the disk is in fact headed towards failure, or could it simply be that more of the disk is in use?
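For reference, the sort of thing I had in mind is just a cron-driven scrub plus an occasional status check. This is only a sketch; the pool name "tank" and the Sunday 03:00 schedule are placeholders for whatever fits your setup:

    # crontab entry: kick off a scrub of the pool every Sunday at 03:00
    0 3 * * 0 /usr/sbin/zpool scrub tank

    # afterwards, see what the scrub found or repaired
    /usr/sbin/zpool status -v tank

If the error counters keep growing from one scrub to the next, that's the trend I'm worried about.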
My take is that since RAID-Z creates a stripe for every block (http://blogs.sun.com/bonwick/entry/raid_z), it should be able to rebuild the bad sectors on a per-block basis. I'd assume that the likelihood of having bad sectors in the same places on all the disks is pretty low, since we're only reading the sectors related to the block being rebuilt. It also seems that fragmentation would work in your favor here, since the stripes would be distributed across more of the platter(s), hopefully protecting you from a wonky manufacturing defect that causes UREs in the same place on the disk.

-Aaron

On Thu, Jul 3, 2008 at 12:24 PM, Jim <techqbert at gmail.com> wrote:
> Does RAID-Z have the same chance of hitting an unrecoverable read error as
> RAID-5 on Linux when the array has to be rebuilt because of a faulty disk?
> [...]
On Thu, Jul 3, 2008 at 3:09 PM, Aaron Blew <aaronblew at gmail.com> wrote:
> My take is that since RAID-Z creates a stripe for every block
> (http://blogs.sun.com/bonwick/entry/raid_z), it should be able to rebuild
> the bad sectors on a per-block basis.
> [...]

The per-block statement above is important: ZFS will only rebuild the blocks that hold data. A 100TB pool with 1GB in use will rebuild 1GB. So the risk is a factor of the amount of data rather than the size of the RAID device.

A periodic zpool scrub will likely turn up read errors before you have a drive failure AND unrelated read errors at the same time. And since ZFS merges the volume-management and file-system layers, an uncorrectable read turns into ZFS saying "file /a/b/c is corrupt - you need to restore it" rather than a traditional RAID-5 controller saying "this 12TB volume is corrupt - restore it". ZFS already keeps multiple copies of metadata, so if you were "lucky" and the corruption hit metadata, it should be able to get a working copy from elsewhere. Of course, raidz2 further decreases your chances of losing data.

I would highly recommend reading Richard Elling's comments in this area. For example:

http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
http://blogs.sun.com/relling/entry/a_story_of_two_mttdl
http://opensolaris.org/jive/thread.jspa?threadID=65564#255257

--
Mike Gerdts
http://mgerdts.blogspot.com/
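To make that concrete: when a drive does die, the replacement and the per-block rebuild are driven by the same commands people already use day to day. The pool name and device name below are made up for illustration:

    zpool replace tank c1t2d0    # swap a new disk in at the failed disk's location
    zpool status -v tank         # watch resilver progress; with -v, any files that
                                 # could not be reconstructed are listed by name

That last point is the practical difference - a double failure shows up as a short list of file names to restore, not as a dead volume.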
I've read various articles along those lines. My understanding is that a 500GB-odd raid-z / raid-5 array has around a 1 in 10 chance of losing at least some data during a rebuild. I've had raid-5 arrays fail at least four times, twice during a rebuild. In most cases I've been able to recover the data (once by re-attaching the original failed drive, since it proved more reliable than the second one that failed), but on more than one occasion I've had to revert to backups.

Raid-6 was something I waited a long time for, and now I use dual parity for everything I buy. At home I have a six-drive raid-z2 box; at work the main server is a 16-drive, 2-way mirror setup. With SATA drives, capacity is cheap enough (that work server is still 2.5TB for around £2,500) that the peace of mind, particularly on the company servers, is worth every penny.

If you're stuck with single-parity raid-z, my advice would be simply to take a good set of backups and leave it at that until you can upgrade to dual parity. At the end of the day the risk is relatively slight, and your data is probably at as much risk if you try to pro-actively replace a drive as if you just replace one when it fails. Just scrub every so often and make sure you've got good backups; I don't expect you'll see too many problems.
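For anyone taking the dual-parity advice, the setup side is trivial - a six-disk raid-z2 pool like my home box is a one-liner. The pool name and device names here are just examples:

    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

Any two of those six disks can then fail (or hit unreadable sectors during a rebuild) without losing data.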
Just re-read that and it's badly phrased. What I meant to say is that a raid-z / raid-5 array based on 500GB drives seems to have around a 1 in 10 chance of losing some data during a full rebuild.
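For what it's worth, the back-of-envelope sum behind figures like that assumes the 1 unrecoverable read error per 10^14 bits rate usually quoted for consumer SATA drives (the same assumption the ZDNet article rests on), and, say, a four-disk array of 500GB drives. A full rebuild then means reading the three surviving disks end to end, about 1.5TB or 1.2 x 10^13 bits:

    P(at least one URE) = 1 - (1 - 10^-14)^(1.2 x 10^13)
                       ~= 1 - e^(-0.12)
                       ~= 0.11

So roughly 1 in 10, and it climbs quickly as drives get bigger or arrays get wider.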
Ross wrote:
> Just re-read that and it's badly phrased. What I meant to say is that a
> raid-z / raid-5 array based on 500GB drives seems to have around a 1 in 10
> chance of losing some data during a full rebuild.

Actually, I think it's been explained already why this is one area where RAID-Z really starts to show some of the ways it's different from its RAID-5 ancestors.

For one, a RAID-5 controller has no idea of the filesystem, and therefore has to rebuild every bit on the disk, whether it's used or not; if it can't, it will declare the whole array unusable. RAID-Z, on the other hand, is integrated with the filesystem, so it only needs to rebuild the *used* data and won't care if unused parts of the disks can't be rebuilt.

Second, a factor the author of that article leaves out is that decent RAID-5, and RAID-Z, can do 'scrubs' of the data at regular intervals, which will very often catch and deal with these read problems well before they have a chance to take all your data with them. The types of errors the author writes about are often caused by how accurately the block was written rather than by a defect in the media, so many times they can be fixed just by rewriting the data to the same block. On ZFS that rewriting-in-place will almost never happen, because with COW it will always choose a new block to write to. I don't think many (if any) RAID-5 implementations can change the location of data on a drive.

-Kyle
Kyle McDonald writes:
> RAID-Z, on the other hand, is integrated with the filesystem, so it only
> needs to rebuild the *used* data and won't care if unused parts of the
> disks can't be rebuilt.
> [...]

Moreover, ZFS stores redundant copies of metadata, so even if a full raid-z stripe goes south we can still rebuild most of the pool's data. It seems that at worst such double failures would lead to a handful of unrecovered files.

-r
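For completeness: the same trick is available for file data, not just metadata. ZFS can keep extra copies of data blocks per dataset, on top of whatever redundancy the raid-z or mirror already provides, so a damaged stripe is even less likely to cost you a file. The dataset name below is just an example, and the setting only applies to data written after it is set:

    zfs set copies=2 tank/home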