I suspect this will be the #1 complaint about zfs as it becomes more popular: "It worked before with ufs and hw raid, now with zfs it says my data is corrupt! zfs sux0rs!" #2 will be "how do I grow a raid-z?" The answers to these should probably be in a FAQ somewhere. I'd argue that the best practices guide is a good spot as well, but the folks who would actually find and read that seem likely to already understand that zfs detects errors other filesystems wouldn't.

-frank
On 11/28/06, Frank Cusack <fcusack at fcusack.com> wrote:
> I suspect this will be the #1 complaint about zfs as it becomes more
> popular. "It worked before with ufs and hw raid, now with zfs it says
> my data is corrupt! zfs sux0rs!"

That's not the problem, so much as "zfs says my file system is corrupt, how do I get past this?" With ufs, f'rinstance, I'd run an fsck, kiss the bad file(s) goodbye, and be on my way. With zfs, there's this ominous message saying "destroy the filesystem and restore from tape". That's not so good, for one corrupt file.

And even better, it turns out erasing the file might just be enough. Although in my case, I now have a new bad object. Sun pointed me to docs.sun.com (thanks, that helps!) but I haven't found anything in the docs on this so far. I am assuming that my bad object 45654c is an inode number of a special file of some sort, but what? And what does the range mean? I'd love to read the docs on this.
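For what it's worth, here is a rough sketch of the recovery path being described, assuming a build recent enough that zpool status -v can resolve error objects to path names (the pool name is just an example):

    # After a scrub, the persistent error log reflects the pool's current
    # state rather than stale entries.
    zpool scrub homepool
    # Lists files with permanent errors, by path where ZFS can resolve the
    # object, otherwise as a raw dataset/object number.
    zpool status -v homepool

    # Restore (or delete) only the files named there, then clear the error
    # counters and scrub again to confirm nothing else is affected.
    zpool clear homepool
    zpool scrub homepool

On older builds the error list may only show raw object numbers, which is exactly the gap being complained about here.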
On 28-Nov-06, at 7:02 PM, Elizabeth Schwartz wrote:
> On 11/28/06, Frank Cusack <fcusack at fcusack.com> wrote:
>> I suspect this will be the #1 complaint about zfs as it becomes more
>> popular. "It worked before with ufs and hw raid, now with zfs it says
>> my data is corrupt! zfs sux0rs!"
>
> That's not the problem, so much as "zfs says my file system is
> corrupt, how do I get past this?"

Yes, that's your problem right now. But Frank describes a likely general syndrome. :-)

> With ufs, f'rinstance, I'd run an fsck, kiss the bad file(s)
> goodbye, and be on my way.

No, you still have the hardware problem.

> With zfs, there's this ominous message saying "destroy the
> filesystem and restore from tape". That's not so good, for one
> corrupt file.

As others have pointed out, you wouldn't have reached this point with redundancy - the file would have remained intact despite the hardware failure. It is strictly correct that to restore the data you'd need to refer to a backup, in this case.

> And even better, turns out erasing the file might just be enough.
> Although in my case, I now have a new bad object. Sun pointed me to
> docs.sun.com (thanks, that helps!) but I haven't found anything in
> the docs on this so far. I am assuming that my bad object 45654c is
> an inode number of a special file of some sort, but what? And what
> does the range mean? I'd love to read the docs on this.

Problems will continue until your hardware is fixed. (Or you conceal them with a redundant ZFS configuration, but that would be a bad idea.)

--Toby
>> With zfs, there's this ominous message saying "destroy the filesystem
>> and restore from tape". That's not so good, for one corrupt file.

> It is strictly correct that to restore the data you'd need
> to refer to a backup, in this case.

It is not, however, correct that to restore the data you need to destroy the entire file system and restore it. If we're stating that the fix for a bad block in an individual data file is to reload the whole FS, there's a documentation issue. We should say something more like "An unrecoverable error was found in '/homepool/johndoe/.login'. This file should be restored from backup."
> No, you still have the hardware problem.

What hardware problem?

There seems to be an unspoken assumption that any checksum error detected by ZFS is caused by a relatively high error rate in the underlying hardware.

There are at least two classes of hardware-related errors. One class are those which are genuinely being introduced at a high rate, as exemplified by the post earlier in this list about the bad FibreChannel port on a SAN. The other are those which are very rare events, for instance a radiation-induced bit-flip in SRAM. In this case, there is no "problem" as such to be repaired (well, perhaps if you live in Denver you could buy radiation shielding for your computer room ;-).

(There are also software errors. Errors in ZFS itself or anywhere else in the Solaris kernel, including device drivers, can result in erroneous data being written to disk. There may be a software problem, rather than a hardware problem, in any individual case.)

Clearly, the existence of a high error rate (say, more than one error every two weeks on a server pushing 100 MB/second) would point to a hardware or software problem; but fewer errors may simply be "normal" for standard hardware.
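To put a rough number on "normal": a back-of-envelope sketch, assuming a vendor-quoted unrecoverable bit error rate of 1 in 10^15 (a common enterprise drive spec sheet figure; desktop-class drives are often quoted an order of magnitude worse):

    # Bits pushed in two weeks at 100 MB/s, multiplied by an error rate
    # of 1e-15 per bit (i.e. divided by 10^15).
    echo 'scale=2; 100*10^6*8 * 60*60*24*14 / 10^15' | bc
    # Prints .96: roughly one expected unrecoverable error every two weeks
    # at that duty cycle, before counting controller, cable, memory, or
    # software contributions.

That lines up with the threshold suggested above; a rate well beyond it points to something other than the drives' quoted error rate being at work.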
On 28-Nov-06, at 10:35 PM, Anton B. Rang wrote:
>> No, you still have the hardware problem.
>
> What hardware problem?
>
> There seems to be an unspoken assumption that any checksum error
> detected by ZFS is caused by a relatively high error rate in the
> underlying hardware.
>
> There are at least two classes of hardware-related errors. One
> class are those which are genuinely being introduced at a high
> rate, as exemplified by the post earlier in this list about the bad
> FibreChannel port on a SAN. The other are those which are very rare
> events, for instance a radiation-induced bit-flip in SRAM. In this
> case, there is no "problem" as such to be repaired (well, perhaps
> if you live in Denver you could buy radiation shielding for your
> computer room ;-).
>
> (There are also software errors. Errors in ZFS itself or anywhere
> else in the Solaris kernel, including device drivers, can result in
> erroneous data being written to disk. There may be a software
> problem, rather than a hardware problem, in any individual case.)
>
> Clearly, the existence of a high error rate (say, more than one
> error every two weeks on a server pushing 100 MB/second) would
> point to a hardware or software problem; but fewer errors may
> simply be "normal" for standard hardware.

Her original configuration wasn't redundant, so she should expect this kind of manual recovery from time to time. Seems a logical conclusion to me? Or is this one of those once-in-a-lifetime strikes?

--Toby
On Tue, Nov 28, 2006 at 08:03:33PM -0500, Toby Thain wrote:
> As others have pointed out, you wouldn't have reached this point with
> redundancy - the file would have remained intact despite the hardware
> failure. It is strictly correct that to restore the data you'd need
> to refer to a backup, in this case.

Well, you could get really unlucky no matter how much redundancy you have, but now we're splitting hairs :) (The more redundancy, the worse your luck has to be to be truly out of luck.)
Anton B. Rang wrote:
> Clearly, the existence of a high error rate (say, more than one error
> every two weeks on a server pushing 100 MB/second) would point to a
> hardware or software problem; but fewer errors may simply be "normal"
> for standard hardware.

I currently have a server that has a much higher rate of checksum errors than what you describe to be "normal." I knew it wasn't good, but I figured if zfs is fixing it for me, why mess with it?

Is there anything I can do to troubleshoot where the problem might be coming from (aside from replacing hardware piece by piece)?
Christopher Scott wrote:
> I currently have a server that has a much higher rate of checksum
> errors than what you describe to be "normal." I knew it wasn't good,
> but I figured if zfs is fixing it for me, why mess with it?
>
> Is there anything I can do to troubleshoot where the problem might be
> coming from (aside from replacing hardware piece by piece)?

Other than correlations between where the errors are occurring and the physical paths to things? You'll probably want to check with your vendor (or vendors) for the various components. I don't know if yours is Sun hardware, but if so there's usually a test suite of some sort (VTS or others) that can exercise components individually and help to isolate such problems. This can be hard to do though, and not all problems can be caught. Still, if it's Sun stuff and under contract/warranty, open a case and they'll give you some steps to diagnose (or in some cases, do it for you).

If it's a mix of vendors, though, you may have some challenges, as individual vendors may have specific requirements for any diagnosis tools, assuming they're available. Having been through this with customers in the past, it can be quite a challenge. Consider yourself lucky that zfs is catching/correcting things!

- Matt

--
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Global Systems Practice
http://blogs.sun.com/mingenthron/
email: matt.ingenthron at sun.com  Phone: 310-242-6439
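For what it's worth, a few commands that can help with that correlation before anything gets swapped (the pool name here is hypothetical; exact output varies by release):

    # Per-device READ/WRITE/CKSUM counters. Errors clustered on one disk,
    # or on every disk behind one controller, narrow down the suspect
    # component.
    zpool status -v tank

    # FMA's error log often carries the driver-level detail (transport
    # errors, timeouts, retries) sitting behind a checksum event.
    fmdump -eV | more

    # Cumulative soft/hard/transport error counters per device.
    iostat -En

If the checksum errors line up with one path or one HBA, that is a much cheaper experiment than replacing parts blindly.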
On Tue, Nov 28, 2006 at 10:48:46PM -0500, Toby Thain wrote:
> Her original configuration wasn't redundant, so she should expect
> this kind of manual recovery from time to time. Seems a logical
> conclusion to me? Or is this one of those once-in-a-lifetime strikes?

That's not an entirely true statement. Her configuration is redundant from a traditional disk subsystem point of view. I think the problem here is that the old disk subsystem mindsets no longer apply with the way something like ZFS works. This is going to be the largest stumbling block of all of them, I believe, not anything technical.

If I had the money and time, I'd build a hardware RAID controller that could do ZFS natively. It would be dead simple (*I* think anyway) to make it transparent to the ZFS layer. ;)

-brian
On 29-Nov-06, at 8:53 AM, Brian Hechinger wrote:
> On Tue, Nov 28, 2006 at 10:48:46PM -0500, Toby Thain wrote:
>> Her original configuration wasn't redundant, so she should expect
>> this kind of manual recovery from time to time. Seems a logical
>> conclusion to me? Or is this one of those once-in-a-lifetime strikes?
>
> That's not an entirely true statement. Her configuration is redundant
> from a traditional disk subsystem point of view. I think the problem
> here is that the old disk subsystem mindsets no longer apply with the
> way something like ZFS works.

That is very true from what I've seen. ZFS definitely has a problem cracking the old-think, but then any generational shift does, historically! (I won't bore with other examples.)

> This is going to be the largest stumbling
> block of all of them I believe, not anything technical.
>
> If I had the money and time, I'd build a hardware RAID controller that
> could do ZFS natively.

We already have one: Thumper. :)

But in terms of replacing the traditional RAID subsystem: I don't see how such a design could address faults between the isolated controller and the host (in the way that software ZFS does). Am I missing something in your idea?

The "old-think" is that it is sufficient to have a very complex and expensive RAID controller which claims to be reliable storage. But of course it's not: no matter how excellent your subsystem is, it's still isolated by unreliable components (and non-checksummed RAID is inherently at risk anyway).

--Toby

> It would be dead simple (*I* think anyway) to make
> it transparent to the ZFS layer. ;)
>
> -brian
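A pragmatic middle ground, until such controllers exist, is to give ZFS some redundancy of its own on top of whatever the array provides, so that a checksum failure on one side can be repaired from the other instead of being reported as fatal. A minimal sketch, with hypothetical device names for two LUNs presented by separate controllers:

    # Mirror across LUNs from independent controllers; ZFS can then detect
    # and repair corruption introduced anywhere below it on either path.
    zpool create tank mirror c2t0d0 c3t0d0
    zpool status tank

It costs capacity, but it is the difference between "zfs noticed the damage" and "zfs fixed it."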