Howdy,

I have at several times had issues with consumer-grade PC hardware and
ZFS not getting along. The problem is not the disks but the fact that I
don't have ECC and end-to-end checking on the datapath. What is
happening is that random memory errors and bit flips are written out to
disk, and when they are read back ZFS reports a checksum failure:

  pool: myth
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        myth        ONLINE       0     0    48
          raidz1    ONLINE       0     0    48
            c7t1d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /myth/tv/1504_20080216203700.mpg
        /myth/tv/1509_20080217192700.mpg

Note there are no per-disk errors, only errors at the pool and raidz
level. I get the same thing on a mirror pool where both sides of the
mirror have identical errors. All I can assume is that the data was
corrupted after the checksum was calculated and was flushed to disk
like that. In the past the cause was a motherboard capacitor that had
popped - but it was enough to generate these errors under load.

At any rate, ZFS is doing the right thing by telling me - what I don't
like is that from that point on I can't convince ZFS to ignore it. The
data in question is video files - a bit flip here or there won't
matter. But if ZFS reads the affected block it returns an I/O error,
and until I restore the file I have no option but to try and make the
application skip over it. If it was UFS, for example, I would have
never known, but ZFS makes a point of stopping anything using it -
understandably, but annoyingly as well.

What I would like to see is an option to ZFS in the style of the
'onerror' option for UFS, i.e. the ability to tell ZFS to join fight
club - let what doesn't matter truly slide. For example:

    zfs set erroraction=[iofail|log|ignore]

This would default to the current action of "iofail", but in the event
you wanted to try and recover or repair data, you could set "log" to,
say, generate an FMA event noting the bad checksums, or "ignore" to get
on with your day.

As mentioned, I see this as mostly an option to help repair data after
the issue is identified or repaired. Of course it's data-specific, but
if the application can allow it or handle it, why should ZFS get in the
way?

Just a thought.

Cheers,
Adrian

PS: And yes, I am now buying some ECC memory.

This message posted from opensolaris.org

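(For reference, the manual recovery path that the status output above
points at would look roughly like this - a sketch only, assuming the
pool and file names shown above and a hypothetical /backup/tv copy of
each damaged file:)

    # List the files flagged with permanent errors.
    zpool status -v myth

    # Replace each damaged file from a known-good copy (the /backup path
    # is only illustrative), then reset the error counters and verify
    # the pool with a scrub.
    cp /backup/tv/1504_20080216203700.mpg /myth/tv/1504_20080216203700.mpg
    zpool clear myth
    zpool scrub myth
    zpool status myth
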
comment below...

Adrian Saul wrote:
> Howdy,
>
> I have at several times had issues with consumer-grade PC hardware and
> ZFS not getting along. The problem is not the disks but the fact that
> I don't have ECC and end-to-end checking on the datapath. What is
> happening is that random memory errors and bit flips are written out
> to disk, and when they are read back ZFS reports a checksum failure:
> [...]
> What I would like to see is an option to ZFS in the style of the
> 'onerror' option for UFS, i.e. the ability to tell ZFS to join fight
> club - let what doesn't matter truly slide. For example:
>
>     zfs set erroraction=[iofail|log|ignore]
>
> This would default to the current action of "iofail", but in the event
> you wanted to try and recover or repair data, you could set "log" to,
> say, generate an FMA event noting the bad checksums, or "ignore" to
> get on with your day.

I don't recall when this arrived in NV, but the failmode parameter for
storage pools has already been implemented. From zpool(1m):

     failmode=wait | continue | panic

         Controls the system behavior in the event of catastrophic
         pool failure. This condition is typically a result of a
         loss of connectivity to the underlying storage device(s)
         or a failure of all devices within the pool. The behavior
         of such an event is determined as follows:

         wait        Blocks all I/O access until the device
                     connectivity is recovered and the errors are
                     cleared. This is the default behavior.

         continue    Returns EIO to any new write I/O requests but
                     allows reads to any of the remaining healthy
                     devices. Any write requests that have yet to
                     be committed to disk would be blocked.

         panic       Prints out a message to the console and
                     generates a system crash dump.

 -- richard

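(For completeness, failmode is set and queried like any other pool
property - a minimal sketch, assuming the pool name myth from the
original post; note, per the follow-ups below, that this governs
catastrophic pool failure and does not change how per-file checksum
errors are reported:)

    # Return EIO on new writes instead of blocking when the whole pool
    # fails; reads from the remaining healthy devices continue.
    zpool set failmode=continue myth

    # Inspect the current setting.
    zpool get failmode myth
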
Richard Elling wrote:
> Adrian Saul wrote:
>> Howdy, I have at several times had issues with consumer-grade PC
>> hardware and ZFS not getting along. The problem is not the disks but
>> the fact that I don't have ECC and end-to-end checking on the
>> datapath. What is happening is that random memory errors and bit
>> flips are written out to disk, and when they are read back ZFS
>> reports a checksum failure:
>> [...]
>> Note there are no per-disk errors, only errors at the pool and raidz
>> level. I get the same thing on a mirror pool where both sides of the
>> mirror have identical errors. All I can assume is that the data was
>> corrupted after the checksum was calculated and was flushed to disk
>> like that. In the past the cause was a motherboard capacitor that had
>> popped - but it was enough to generate these errors under load.

I got a similar CKSUM error recently in which a block from a different
file ended up in one of my files. So this was not a simple bit flip;
64K of the file was bad. However, I do not think any disk filesystem
should tolerate even bit flips. Even in video files, I'd want to know.
(Note that I hacked the ZFS source to temporarily ignore the error so I
could see what was wrong.)

So your error(s) might be something of this kind (except I do not
understand, if so, how both of your mirrors were affected in the same
way - do you know this, or did ZFS simply say that the file was not
recoverable, i.e. it might have had different bad bits in the two
mirrors?). For me, at least on subsequent reboots, no read or write
errors were reported either, just CKSUM (I do seem to recall other
errors listed - read or write - but they were cleared on reboot, so I
cannot recall it exactly). And I would think it's possible to get no
errors if it's simply a misdirected block write. Still, I would then
wonder why I didn't see *2* files with errors if this is what happened
to me.

I guess I am saying that this may not be a memory glitch, but could
also be an IDE cable issue (as mine turned out to be). See my post
here:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040355.html

>> At any rate, ZFS is doing the right thing by telling me - what I
>> don't like is that from that point on I can't convince ZFS to ignore
>> it. The data in question is video files - a bit flip here or there
>> won't matter. But if ZFS reads the affected block it returns an I/O
>> error, and until I restore the file I have no option but to try and
>> make the application skip over it. If it was UFS, for example, I
>> would have never known, but ZFS makes a point of stopping anything
>> using it - understandably, but annoyingly as well.

I understand your situation, and I agree that user control might be
nice (in my case, I would not have had to tweak the ZFS code). I do
think that zpool status should still reveal the error, however, even if
the file read does not report it (if you have set ZFS to ignore the
error).

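(A rough way to see where a damaged file first fails, without patching
ZFS, is simply to read it sequentially until the I/O error - a sketch,
reusing one of the file names from the original status output:)

    # Read the file in 128K chunks and discard the data; dd stops at the
    # first unreadable record, and the record counts it reports give the
    # approximate byte offset of the damage (records in x 128K).
    dd if=/myth/tv/1504_20080216203700.mpg of=/dev/null bs=128k
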
I can also imagine this could be a bit dangerous if, e.g., the user
forgets this option is set.

>> PS: And yes, I am now buying some ECC memory.

Good practice in general - I always use ECC. There is nothing worse
than silent data corruption.

> I don't recall when this arrived in NV, but the failmode parameter for
> storage pools has already been implemented. From zpool(1m):
>
>      failmode=wait | continue | panic
>
>          wait        Blocks all I/O access until the device
>                      connectivity is recovered and the errors are
>                      cleared. This is the default behavior.
>
>          continue    Returns EIO to any new write I/O requests but
>                      allows reads to any of the remaining healthy
>                      devices. Any write requests that have yet to
>                      be committed to disk would be blocked.
> [...]

Is "wait" the default behavior now? When I had CKSUM errors, reading
the file would return EIO and stop reading at that point (returning
only the good data so far). Do you mean it blocks access on the errored
file, or on the whole device? I've noticed the former, but not the
latter.

Also, I'm not sure I understand "continue". It seems more severe than
the current behavior, in which access to any files other than the
one(s) with errors still works.

-Joe

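(The per-file behavior Joe describes is easy to check from the shell -
a small sketch, assuming the pool and first file name from the original
post; the second path is purely hypothetical:)

    # Reading a file flagged with a permanent checksum error fails
    # partway through with an I/O error...
    cat /myth/tv/1504_20080216203700.mpg > /dev/null

    # ...while other files in the same pool remain readable and the
    # pool itself stays ONLINE.
    cat /myth/tv/another_recording.mpg > /dev/null
    zpool status -v myth
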
On Mon, Feb 18, 2008 at 11:52:48AM -0700, Joe Peterson wrote:
> Is "wait" the default behavior now? When I had CKSUM errors, reading
> the file would return EIO and stop reading at that point (returning
> only the good data so far). Do you mean it blocks access on the
> errored file, or on the whole device? I've noticed the former, but not
> the latter.

The 'failmode' property only applies when writes fail, or
read-during-write dependies, such as the spacemaps. It does not affect
normal reads.

- Eric

--
Eric Schrock, Fishworks        http://blogs.sun.com/eschrock

On Mon, Feb 18, 2008 at 11:15:34AM -0800, Eric Schrock wrote:
> The 'failmode' property only applies when writes fail, or
> read-during-write dependies, such as the spacemaps. It does not affect
                    ^^^^^^^^^

That should read 'dependencies', obviously ;-)

- Eric

--
Eric Schrock, Fishworks        http://blogs.sun.com/eschrock