hi folks

I've been running my fileserver at home on Linux for a couple of years, and last week I finally reinstalled it with Solaris 10 u4. I borrowed a bunch of disks from a friend, copied all the files over, reinstalled the fileserver, and copied the data back.

Everything went fine, but now, a few days later, quite a lot of files have been corrupted. Here's the output:

# zpool status data
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 422 errors on Mon Feb 25 00:32:18 2008
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0 5.52K
          raidz1    ONLINE       0     0 5.52K
            c0t0d0  ONLINE       0     0 10.72
            c0t1d0  ONLINE       0     0 4.59K
            c0t2d0  ONLINE       0     0 5.18K
            c0t3d0  ONLINE       0     0 9.10K
            c1t0d0  ONLINE       0     0 7.64K
            c1t1d0  ONLINE       0     0 3.75K
            c1t2d0  ONLINE       0     0 4.39K
            c1t3d0  ONLINE       0     0 6.04K

errors: 388 data errors, use '-v' for a list

Last night when I found out about this, it told me there were errors in about 50 files. So I scrubbed the whole pool, and it found a lot more corrupted files.

The temporary system I used to hold the data while I installed Solaris on my fileserver is running nv build 80, and there are no errors on there.

What could be the cause of these errors? I don't see any hardware errors on my disks:

# iostat -En | grep -i error
c3d0    Soft Errors: 0   Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c4d0    Soft Errors: 0   Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c0t0d0  Soft Errors: 574 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c1t0d0  Soft Errors: 549 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c0t1d0  Soft Errors: 14  Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c0t2d0  Soft Errors: 549 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c0t3d0  Soft Errors: 549 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c1t1d0  Soft Errors: 548 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c1t2d0  Soft Errors: 14  Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0
c1t3d0  Soft Errors: 548 Hard Errors: 0  Transport Errors: 0
        Media Error: 0   Device Not Ready: 0  No Device: 0  Recoverable: 0

There are a lot of soft errors, though. Linux said one disk had gone bad, but I figured the SATA cable was somehow broken, so I replaced it before installing Solaris. And Solaris didn't and doesn't see any actual hardware errors on the disks, does it?
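By the way, the per-file list that the output mentions comes from:

# zpool status -v data

which repeats the status above and then names each file with a detected error.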
My guess is that you have some defective hardware in the system that's causing bit flips in the checksum or the data payload. I'd suggest running some sort of system diagnostics for a few hours to see if you can locate the bad piece of hardware. My suspicion would be your memory or CPU, but that's just a wild guess, based on the number of errors you have and the number of devices they're spread over. Could it be that you have been corrupting data for some time and only now know it?

Oh - and I'd also look around based on your disk controller and ensure that there are no newer patches for it, just in case it's one for which there was a known problem (which was worked around in the driver). I *think* there was an issue with at least one or two...

Cheers!

Nathan.
On Monday, 25 February 2008 at 11:05 -0800, Sandro wrote:
> [original message quoted]

Hi,

I had the same symptoms recently. I also thought the disks were dying, but I was wrong. I suspected the RAM: no. In the end it was because I had mixed RAID cards across different PCI buses: two 64-bit buses (no problem with those) and one 32-bit PCI bus, which caused *all* the checksum errors. I kicked out the card on the 32-bit PCI bus and everything worked fine.
Hope it helps,

--
Nicolas Szalay
Systems & network administrator
Hey,

Thanks for your answers, guys. I'll run VTS to stress-test the CPU and memory.

And I just checked the block diagram of my motherboard (Gigabyte M61P-S3). It doesn't even have 64-bit PCI slots: just standard old 33 MHz 32-bit PCI, plus a couple of newer PCIe slots. But my two controllers are both the same vendor/version and are both connected to the same PCI bus.
On Tuesday, 26 February 2008 at 05:59 -0800, Sandro wrote:
> [previous message quoted]

Looks like 32-bit PCI & ZFS definitely hurts :D

--
Nicolas Szalay
Systems & network administrator
Haha, very funny :D

Just the controllers are on a 32-bit PCI bus; Solaris itself is running 64-bit:

[root@ragnaros] /var/tmp/ # isainfo
amd64 i386

And besides, a lot of our customers are having serious problems with their Thumpers and ZFS and stuff...
> So I scrubbed the whole pool and it found a lot more corrupted files.

My condolences :)

General questions and comments about ZFS and data corruption:

I thought RAID-Z would correct data errors automatically using the parity data. How wrong am I on that? Perhaps a parity correction was already tried, and there was too much corruption for it to succeed, implying a very significant amount of data corruption?

Assuming the errors are being generated by bad hardware somewhere between the disk and the CPU (inclusive), how could ZFS be configured to handle these errors automatically? Set the copies property to 2, I think. Anything else?
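(I assume that would look something like:

# zfs set copies=2 data
# zfs get copies data

though as far as I know the copies property only applies to data written after it is set, so existing files wouldn't gain the extra protection.)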
Thanks for your reassuring post, loomy :)

I'm pretty sure the reason for all this is some bad hardware, but I can't get VTS to work; it looks like it's not supported on this kind of hardware. And in order to run some other stress-test software I would have to connect a monitor, keyboard and DVD-ROM, which I'm just so sick of doing :) Hopefully I can motivate myself on the weekend. I'll keep you all updated when I find something.
> I thought RAIDZ would correct data errors automatically with the parity data.

Right. However, if the data is corrupted while in memory (e.g. on a PC with non-parity memory), there's nothing ZFS can do to detect that. I mean, not even theoretically. The best we could do would be to narrow the window of vulnerability by recomputing the checksum every time we accessed an in-memory object, which would be terribly expensive.

Jeff
Nathan Kroenert
2008-Mar-02 22:49 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Say, Jeff -

Speaking of expensive but interesting things we could do -

From the little I know of ZFS's checksum, it's NOT like the ECC we use in memory, in that it's not something we can use to determine which bit flipped in the event of a single bit flip in the data. (I could be completely wrong here... but...)

What is the chance we could put a little more resilience into ZFS such that if we do get a checksum error, we systematically flip each bit in sequence and recheck the checksum to see if we could in fact proceed (including writing the data back correctly)? Or build into the checksum something analogous to ECC, so we can choose to use non-ZFS-protected disks and paths but still have single-bit-flip protection...

Considering the pain that users of non-ZFS-protected systems suffer when there is minor corruption, it would be fantastic if we could attempt to work through the simple case of a single flipped bit for the user, and, if we find that flipping said bit gets us to a consistent checksum, proceed. I know that at the default 128K block size that's a lot of bits, and a lot of operations to arrive at an answer either way, but if we could log an error, and spend the cycles to try to recover and proceed without user intervention, that would have to be a huge win for ZFS, even if the recalculation took a few seconds.

What do others on the list think? Do we have enough folks using ZFS on HDS / EMC / other hardware RAID(X) environments that might find this useful?

Thoughts? And of course, sorry if we already do this... :)

Nathan.
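P.S. To make the idea concrete, here's the kind of loop I'm imagining -- purely illustrative C, not actual ZFS code, with checksum() standing in for whatever checksum function the block was written with:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { uint64_t word[4]; } cksum_t;

    /* Placeholder for the block's real checksum routine. */
    extern void checksum(const void *buf, size_t len, cksum_t *out);

    /*
     * Flip each bit of buf in turn and see whether any single flip
     * makes the checksum match.  Returns the flipped bit's index
     * (leaving the repaired data in buf), or -1 if no single-bit
     * flip explains the error.
     */
    long
    try_single_bit_repair(uint8_t *buf, size_t len, const cksum_t *expected)
    {
            cksum_t actual;

            for (size_t bit = 0; bit < len * 8; bit++) {
                    buf[bit / 8] ^= (uint8_t)(1 << (bit % 8));  /* flip */
                    checksum(buf, len, &actual);
                    if (memcmp(&actual, expected, sizeof (actual)) == 0)
                            return ((long)bit);
                    buf[bit / 8] ^= (uint8_t)(1 << (bit % 8));  /* undo */
            }
            return (-1);
    }

For a 128K block that's about a million checksum passes, so it's obviously not cheap -- but it would only have to run once every other recovery path has already failed.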
Bob Friesenhahn
2008-Mar-02 23:28 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Mon, 3 Mar 2008, Nathan Kroenert wrote:

> Speaking of expensive, but interesting things we could do -
>
> From the little I know of ZFS's checksum, it's NOT like the ECC
> checksum we use in memory in that it's not something we can use to
> determine which bit flipped in the event that there was a single bit
> flip in the data. (I could be completely wrong here... but...)

It seems that the emphasis on single-bit errors may be misplaced. Is there evidence which suggests that single-bit errors are much more common than multiple-bit errors?

> What is the chance we could put a little more resilience into ZFS such
> that if we do get a checksum error, we systematically flip each bit in
> sequence and check the checksum to see if we could in fact proceed
> (including writing the data back correctly).

It is easier to retry the disk read another 100 times, or to store the data in multiple places.

> Or build into the checksum something analogous to ECC so we can choose
> to use NON-ZFS protected disks and paths, but still have single bit flip
> protection...

Disk drives commonly use an algorithm like Reed-Solomon (http://en.wikipedia.org/wiki/Reed-Solomon_error_correction) which provides forward error correction. This is done in hardware. Doing the same in software is likely to be very slow.

> What do others on the list think? Do we have enough folks using ZFS on
> HDS / EMC / other hardware RAID(X) environments that might find this useful?

It seems that since ZFS is intended to support extremely large storage pools, available energy should be spent ensuring that the storage pool remains healthy or can be repaired. Loss of individual file blocks is annoying, but loss of entire storage pools is devastating.

Since raw disk is cheap (and backups are expensive), it makes sense to write more redundant data rather than to minimize loss through exotic algorithms. Even if RAID is not used, redundant copies may be used on the same disk to help protect against block read errors.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Jeff Bonwick
2008-Mar-03 00:28 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Nathan: yes. Flipping each bit and recomputing the checksum is not only possible, we actually did it in early versions of the code. The problem is that it's really expensive. For a 128K block, that's a million bits, so you have to re-run the checksum a million times, on 128K of data. That's 128GB of data to churn through.

So Bob: you're right too. It's generally much cheaper to retry the I/O, try another disk, try a ditto block, etc. That said, when all else fails, a 128GB computation is a lot cheaper than a restore from tape.

At some point it becomes a bit philosophical. Suppose the block in question is a single user data block. How much of the machine should you be willing to dedicate to getting that block back? I mean, suppose you knew that it was theoretically possible, but would consume 500 hours of CPU time during which everything else would be slower -- and the affected app's read() system call would hang for 500 hours. What is the right policy? There's no one right answer. If we were to introduce a feature like this, we'd need some admin-settable limit on how much time to dedicate to it.

For some checksum functions like fletcher2 and fletcher4, it is possible to do much better than brute force because you can compute an incremental update -- that is, you can compute the effect of changing the nth bit without rerunning the entire checksum. This is, however, not possible with SHA-256 or any other secure hash.

We ended up taking that code out because single-bit errors didn't seem to arise in practice, and in testing, the error correction had a rather surprising unintended side effect: it masked bugs in the code! The nastiest kind of bug in ZFS is something we call a future leak, which is when some change from txg (transaction group) 37 ends up going out as part of txg 36. It normally wouldn't matter, except if you lost power before txg 37 was committed to disk. On reboot you'd have inconsistent on-disk state (all of 36 plus random bits of 37). We developed coding practices and stress tests to catch future leaks, and as far as I know we've never actually shipped one. But they are scary.

If you *do* have a future leak, it's not uncommon for it to be a very small change -- perhaps incrementing a counter in some on-disk structure. The thing is, if the counter is going from even to odd, that's exactly a one-bit change. The single-bit error correction logic would happily detect these and fix them up -- not at all what you want when testing! (Of course, we could turn it off during testing -- but then we wouldn't be testing it.)

All that said, I'm still occasionally tempted to bring it back. It may become more relevant with flash memory as a storage medium.

Jeff
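P.S. For the curious, the incremental property looks like this on a simplified one-stream Fletcher (my sketch only -- ZFS's actual fletcher2/fletcher4 interleave several streams, but the linearity is the same):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint64_t a, b; } fletcher_t;

    void
    fletcher_compute(const uint64_t *buf, size_t nwords, fletcher_t *f)
    {
            f->a = f->b = 0;
            for (size_t i = 0; i < nwords; i++) {
                    f->a += buf[i];  /* plain sum of the words */
                    f->b += f->a;    /* word i ends up weighted (nwords - i) */
            }
    }

    /* Predict the checksum after adding delta to word j -- O(1). */
    void
    fletcher_update(const fletcher_t *old, size_t nwords, size_t j,
        uint64_t delta, fletcher_t *out)
    {
            out->a = old->a + delta;
            out->b = old->b + (nwords - j) * delta;
    }

Because both sums are linear in the data, you can also go the other way: given the stored and actual checksums, solve for the (word, bit) pair whose flip explains the difference, with no brute force at all. A secure hash deliberately destroys exactly this structure.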
Darren J Moffat
2008-Mar-03 10:37 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Jeff Bonwick wrote:
> All that said, I'm still occasionally tempted to bring it back.
> It may become more relevant with flash memory as a storage medium.

Would it be worth considering bringing it back as part of zdb rather than as part of the core zio layer?

--
Darren J Moffat
me
2008-Mar-03 11:03 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
> All that said, I'm still occasionally tempted to bring it back.
> It may become more relevant with flash memory as a storage medium.

How common would single on-disk bit flips be in 128K blocks? Disk manufacturers quantify it as 1 in 10 to the power of god-knows-what, which practically means every few years or so. If this is just optimistic marketing crap, wouldn't it be viable to have a bit-flip checker as an option to the scrub mode (with tons of warnings, a yes/no confirmation, and a recommendation to do this in single-user mode)? I'm sure people using no redundancy (e.g. future OSX users) would appreciate it, saving some grief if the bad blocks are indeed just single bit flips.

-mg
Bob Friesenhahn
2008-Mar-03 16:10 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Mon, 3 Mar 2008, me wrote:

> I'm sure people using no redundancy (e.g. future OSX users) would
> appreciate it, saving some grief if the bad blocks are indeed just
> single bit flips.

In case people have somehow forgotten, most other filesystems in common use do not checksum data blocks. In spite of this, we rarely hear users wailing about single bit flips in their files. Instead we usually hear about people who find whole chunks of their file missing or overwritten, or find that the hard disk does not spin up at all any more. As we move toward solid-state storage, the typical error cases will surely differ.

Since ZFS is smart and is able to perform tasks in the background, one possibility to consider is to use otherwise unused storage space to store "weak" ditto copies or even forward-error-correction data. However, rather than explicitly writing these blocks during normal I/O, they could be created by a background task, and reused for other purposes when required. In this way, otherwise unused disk blocks would be taken advantage of in a similar way that otherwise unused memory is used to cache filesystem data. If the filesystem becomes very full, then there would be less protection, but if the filesystem has plenty of free space then there would be lots of protection.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Richard Elling
2008-Mar-03 16:19 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Darren J Moffat wrote:
> Jeff Bonwick wrote:
>
>> All that said, I'm still occasionally tempted to bring it back.
>> It may become more relevant with flash memory as a storage medium.
>
> Would it be worth considering bringing it back as part of zdb rather
> than part of the core zio layer ?

I'm not convinced that single bit flips are the common failure mode for disks. Most enterprise-class disks already have enough ECC to correct at least 8 bytes per block. By the time the disk sends something back that it couldn't correct, there is no telling how many bits have been flipped, but I'll bet a steak dinner it is more than one.

There may be some benefit for path failures, but I've not seen any measured data on those failure modes. For paths which have framing checksums, we would expect them to be detected there.

-- richard
Richard Elling
2008-Mar-03 16:27 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
me wrote:
>> All that said, I'm still occasionally tempted to bring it back.
>> It may become more relevant with flash memory as a storage medium.
>
> How common would single on-disk bit flips be in 128K blocks? [...]

Most enterprise-class disks are rated at 1 uncorrectable read error per 10^15 bits(!) read. For a 1 TByte disk, that means you can expect an uncorrectable read error roughly once for every 125 times you read the entire disk. Contrast this with consumer-class disks, which are rated at a UER of 1 in 10^14, or roughly once every 12 full reads of a 1 TByte disk.

I posted some of our measured field data a while back:
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

-- richard
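(Back-of-envelope, taking 1 TByte = 10^12 bytes:

    one full read  = 8 x 10^12 bits
    reads per UER  = 10^15 / (8 x 10^12) ~ 125   (enterprise, 1 in 10^15)
                   = 10^14 / (8 x 10^12) ~ 12.5  (consumer, 1 in 10^14)

The exact figures shift a little depending on how the vendor counts a terabyte.)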
Darren J Moffat
2008-Mar-03 16:35 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Richard Elling wrote:
> Darren J Moffat wrote:
>> Would it be worth considering bringing it back as part of zdb rather
>> than part of the core zio layer ?
>
> I'm not convinced that single bit flips are the common
> failure mode for disks. Most enterprise class disks already
> have enough ECC to correct at least 8 bytes per block.

And for consumer rather than enterprise-class disks? Which, after all, are the people most likely to be hit hardest, because:

a) their disk is of cheaper quality
b) they are less likely to have a redundant pool config, e.g. on a laptop which can physically only have one disk
c) they are less likely to have an off-pool backup
d) they can't recover easily if the filesystem doesn't help them, and are used to filesystems that give them their data even if it is corrupt.

For example, a few bit flips in an MP3 or MPEG-4 file probably don't matter too much to many people on a consumer system, and they would rather have that than have ZFS tell them they can't have the pool or some files in it.

--
Darren J Moffat
> I'm not convinced that single bit flips are the common failure mode for disks.

I think the original suggestion might be aimed at bad RAM more than bad disks. Just about every home computer does not have ECC RAM, so as ZFS transitions from the enterprise to the home, this (optional) feature sounds very worthwhile.

I've experienced some bad RAM in my day, and I've only noticed when applications started acting weird and crashing. When I've run memtest86+ on such sticks of RAM, I've found that very few errors (maybe 2-8) are usually reported. I'm not sure if those errors are bad bits or something more granular.

The original suggestion sounds like a useful one for the body of users outside of Sun's usual ECC-RAM-using clientele.
Gary Mills
2008-Mar-03 16:59 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Mon, Mar 03, 2008 at 08:27:08AM -0800, Richard Elling wrote:
> Most enterprise class disks are rated at 1 uncorrectable read error
> per 10^15 bits(!) read. [...]

I take it that that would mean the block would be unreadable, rather than readable with incorrect data. That would be based on the CRC included with each disk block. So the granularity is really at the block level: you probably can't even read a bad block from the disk.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Bob Friesenhahn
2008-Mar-03 17:47 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Mon, 3 Mar 2008, Darren J Moffat wrote:

>> I'm not convinced that single bit flips are the common
>> failure mode for disks. Most enterprise class disks already
>> have enough ECC to correct at least 8 bytes per block.
>
> and for consumer rather than enterprise class disks ?

You are assuming that the ECC used for "consumer" disks is substantially different from that used for "enterprise" disks. That is likely not the case, since ECC is provided by a chip which costs a few dollars. The only reason to use a lesser-grade algorithm would be to save a small bit of storage space.

Consumer disks use essentially the same media as enterprise disks.

Consumer disks store a higher bit density on similar media.

Consumer disks have less precise/consistent head controllers than enterprise disks.

Consumer disks are less well-specified than enterprise disks.

Due to the higher bit density we can expect more wrong bits to be read, since we are pushing the media harder. Due to less consistent head controllers we can expect more incidences of reading or writing the wrong track, or of writing something which can't be read. Consumer disks are often used in environments where they may be physically disturbed while writing or reading data. Enterprise disks are usually used in very stable environments.

The upshot of this is that we can expect more unrecoverable errors, but it seems unlikely that there will be more "single bit" errors recoverable at the ZFS level.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Richard Elling
2008-Mar-03 18:50 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Bob Friesenhahn wrote:
> [previous message quoted]

I agree, and am waiting to get the proceedings from FAST '08, which has some interesting papers on the list. A while back I blogged about an Adaptec online seminar which addressed this topic. Rather than repeating what they said, I left a pointer and a recommendation:
http://blogs.sun.com/relling/entry/adaptec_webinar_on_disks_and

Also, note that the published reliability data from disk vendors is constantly changing. For laptop drives, we're seeing less MTBF or UER and more head-landing specs. It seems that an important failure mode for laptop disks is wear-out at the landing site. This is due to power management powering down or spinning down the disk. We don't tend to see this failure mode in servers or RAID arrays.

-- richard
Nathan Kroenert
2008-Mar-03 23:01 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Hey, Bob,

Though I have already got the answer I was looking for here, I thought I'd at least take the time to provide my point of view as to my *why*...

First: I don't think any of us have forgotten the goodness that ZFS's checksum *can* bring.

I'm also keenly aware that we have some customers running HDS / EMC boxes who disable the ZFS checksum by default because they "don't want to have files break due to a single bit flip", and they really don't care where the flip happens, and they don't want to "waste" disks or bandwidth letting ZFS do its own protection when they already pay for it inside their zillion-dollar disk box. (Some say waste, some call it insurance... ;) Oracle users in particular seem to have this mindset, though that's another thread entirely. :)

I'd suspect we don't hear people whining about single bit flips because they would not know it's happening unless the app sitting on top had its own protection, or the error is obvious, or it crashes their system. Or they were running ZFS - but at this stage we cannot delineate between single-bit and massively crapped-out errors, so what's to say we are NOT seeing it?

Also - don't assume bit rot on disk is the only way we can get single bit errors.

Considering that until very recently (and quite likely even now, to a reasonable extent), most CPUs did not have data protection in *every* place data transits through, single bit flips are still a very real possibility, and they become more likely as process shrinks continue. Granted, on CPUs with register parity protection, undetected doubles are more likely to slip under the radar, as registers are typically protected with parity at best, if at all: a single flipped bit in a parity-protected register will be detected, a double won't.

It does seem that some of us are getting a little caught up in disks and their magnificence in what they write to the platter and read back, and overlooking the potential value of a simple (though potentially computationally expensive) circus trick, which might, just might, make your broken 1TB archive useful again...

I don't think it's a good idea for us to assume that it's OK to leave out potential goodness for the masses who want to use ZFS in non-enterprise environments like laptops and home PCs, or who use commodity components in conjunction with the Big Stuff (like white-box PCs connected to an EMC or HDS box).

Anyhoo - I'm glad we have pretty much already done this work once before. It gives me hope that we'll see it make a comeback. ;)

(And I look forward to Jeff & Co developing a hyper-cool way of generating 128000000 checksums using all 64 threads of a Niagara 2, using the same source data in cache so we don't need to hit memory, so that it happens in the blink of an eye. Or two. OK - maybe three... ;) Maybe we could also use the SPUs as well... OK - so I'm possibly dreaming here, but hell, if I'm dreaming, why not dream big. :)

Nathan.
Bob Friesenhahn
2008-Mar-04 00:05 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Tue, 4 Mar 2008, Nathan Kroenert wrote:

> It does seem that some of us are getting a little caught up in disks and
> their magnificence in what they write to the platter and read back, and
> overlooking the potential value of a simple (though potentially
> computationally expensive) circus trick, which might, just might, make
> your broken 1TB archive useful again...

The circus trick can be handled via a user-contributed utility. In fact, people can compete with their various repair utilities. There are only 1048576 1-bit permutations to try, and then the various two-bit permutations can be tried.

> I don't think it's a good idea for us to assume that it's OK to 'leave
> out' potential goodness for the masses that want to use ZFS in
> non-enterprise environments like laptops / home PC's, or use commodity
> components in conjunction with the Big Stuff... (Like white box PC's
> connected to an EMC or HDS box... )

It seems that "goodness for the masses" has not been left out. The forthcoming ability to request duplicate ZFS blocks is very good news indeed. We are entering an age where the entry-level SATA disk is 1TB and users have more space than they know what to do with. A little replication gives these users something useful to do with their new disk while avoiding the need for unreliable "circus tricks" to recover data. ZFS goes far beyond MS-DOS's "recover" command (which should have been called "destroy").

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Nathan Kroenert
2008-Mar-04 00:25 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Bob Friesenhahn wrote:
> On Tue, 4 Mar 2008, Nathan Kroenert wrote:
>> It does seem that some of us are getting a little caught up in disks
>> and their magnificence... [snip]
>
> The circus trick can be handled via a user-contributed utility. In
> fact, people can compete with their various repair utilities. There are
> only 1048576 1-bit permutations to try, and then the various two-bit
> permutations can be tried.

That does not sound 'easy', and I consider that ZFS should be... :) And IMO it's something that should really be built in, not attacked with an add-on.

I had (as did Jeff in his initial response) considered that we only need to actually try flipping 128KB worth of bits once... That many flips means we are in a way 'processing' some 128GB in the worst case when regenerating checksums. Internal to a CPU, depending on cache aliasing, competing workloads, threadedness, etc., this could be dramatically variable... something I guess the ZFS team would want to keep out of 'standard' filesystem operation... hm. :\

>> I don't think it's a good idea for us to assume that it's OK to 'leave
>> out' potential goodness for the masses... [snip]
>
> It seems that "goodness for the masses" has not been left out. The
> forthcoming ability to request duplicate ZFS blocks is very good news
> indeed. [snip]

I never have enough space on my laptop... I guess I'm a freak. But I am sure that we are *both* right for some subsets of ZFS users, and that the more choice we have built into the filesystem, the better.

Thanks again for the comments!

Nathan.
Bob Friesenhahn
2008-Mar-04 00:42 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Tue, 4 Mar 2008, Nathan Kroenert wrote:

>> The circus trick can be handled via a user-contributed utility. [snip]
>
> That does not sound 'easy', and I consider that ZFS should be... :) and
> IMO it's something that should really be built in, not attacked with an
> add-on.

There are several reasons why this sort of thing should not be in ZFS itself. A big reason is that if it is in ZFS itself, it can only be updated via an OS patch or upgrade, along with a required reboot. If it is in a utility, it can be downloaded and used as the user sees fit without any additional disruption to the system. While some errors are random, others follow well-defined patterns, so it may be that one utility is better than another, or that user-provided options can help achieve success faster.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Boyd Adamson
2008-Mar-04 01:38 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Nathan Kroenert <Nathan.Kroenert at Sun.COM> writes:
> [previous message quoted]

Maybe an option to scrub... something that says "work on bit flips for bad blocks", or "work on bit flips for bad blocks in this file".

Boyd
Nathan Kroenert
2008-Mar-04 04:05 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
Hey, Bob -

My perspective on the big reasons for it *to* be integrated would be:

- It's tested - by the folks charged with making ZFS good
- It's kept in sync with the differing zpool versions
- It's documented
- When the system *is* patched, any changes the patch brings are synced with the recovery mechanism
- Being integrated, it has options that can be persistently set if required
- It's there when you actually need it
- It could be integrated with Solaris FMA to take some funky actions based on the nature of the failure, including cool messages telling you what you need to run to attempt a repair, etc.
- It's integrated (recursive, self-fulfilling benefit... ;)

As for separate utilities for different failure modes, I agree, *development* of these might be faster if everyone chases their own pet failure mode and contributes it, but I still think getting them integrated, either as optional actions on error or as part of zdb or another tool, would be far better than having to go looking for the utility and 'give it a whirl'. But I'm sure that's a personal preference, and I'm sure there are those who would love the opportunity to roll their own.

OK - I'm going to shut up now. I think I have done this to death, and I don't want to end up in everyone's kill filter.

Cheers!

Nathan.
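P.S. On the FMA point, some of the plumbing is already visible today. I believe the checksum ereports land in the FMA error log, so something like

# fmdump -eV | grep -i checksum

should show the raw ereport.fs.zfs.checksum events, though the exact class names may vary by release.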
Mario Goebbels (Webmail)
2008-Mar-04 10:56 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
> Maybe an option to scrub... something that says "work on bitflips for
> bad blocks", or "work on bitflips for bad blocks in this file"

I've suggested this too. But in retrospect, there's no way to detect whether a bad block is in fact due to a bit flip or not, so ZFS might spend several hours on each checksum error. You could work some idiot detection into ZFS by having it sum the values of each byte in the filesystem block and store the result in a 32-bit value; when scrubbing with bit-flip correction, it would check whether the difference in sums deviates by more than about +/-128, since that's the most a single flipped bit can change the sum. But that would require more computing power on writes, and the availability of a 32-bit field in the metadata block referring to the FS block.
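A sketch of the idea, purely illustrative (nothing like this exists in ZFS, and the names are made up):

    #include <stddef.h>
    #include <stdint.h>

    /* 32-bit sum of all byte values in the block, stored at write time. */
    uint32_t
    byte_sum(const uint8_t *buf, size_t len)
    {
            uint32_t sum = 0;

            for (size_t i = 0; i < len; i++)
                    sum += buf[i];
            return (sum);
    }

    /*
     * A single bit flip changes one byte by +/- 2^k for some k in
     * 0..7, so it can move the sum by at most 128.  Any larger
     * deviation means brute-force single-bit repair is pointless.
     */
    int
    maybe_single_bit_flip(uint32_t stored_sum, uint32_t current_sum)
    {
            int64_t delta = (int64_t)current_sum - (int64_t)stored_sum;

            if (delta < 0)
                    delta = -delta;
            return (delta <= 128);
    }

-mg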
Richard Elling
2008-Mar-04 18:30 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
[slightly different angle below...]

Nathan Kroenert wrote:
> I'm also keenly aware that we have some customers running HDS / EMC
> boxes who disable the ZFS checksum by default because they 'don't want
> to have files break due to a single bit flip...' [snip]

If you look at the zfs-discuss archives, you will find anecdotes of failing RAID arrays (yes, even expensive ones) and SAN switches causing corruption which was detected by ZFS. A telltale sign of borken hardware is someone complaining that ZFS checksums are borken, only to find out their hardware is at fault.

As for Oracle, modern releases of the Oracle database also have checksumming enabled by default, so there is some merit to the argument that ZFS checksums are redundant. IMNSHO, ZFS is not being designed to replace ASM.

> Considering that until very recently (and quite likely even now to a
> reasonable extent), most CPUs did not have data protection in *every*
> place data transits through, single bit flips are still a very real
> possibility, and becoming more likely as process shrinks continue. [snip]

It depends on the processor. Most of the modern SPARC processors have extensive error detection and correction inside. But processors are still different from memories, in that the time a datum resides in a single location is quite short. We worry more about random data losses when the datum is stored in one place for a long time, which is why you see different sorts of data protection at the different layers of a system design. To put this in more mathematical terms: there is a failure rate for each failure mode, but your exposure to the failure mode is time-bounded.
> [remainder of Nathan's message snipped]

I sense that the requested behaviour here is to be able to get at the corrupted contents of a file, even if we know it is corrupted. I think this is a good idea because:

1. The block is what is corrupted, not necessarily my file. A single block may contain several files which are grouped together, checksummed, and written to disk.

2. The current behaviour of returning EIO when read()ing a file up to the (possible) corruption point is rather irritating, but probably the right thing to do.

Since we know the files affected, we could write a savior, providing we can get some reasonable response other than EIO. As Jeff points out, I'm not sure that automatic repair is the right answer, but a manual savior might work better than a restore from backup. Note: some apps can handle partially missing files. Others do things like zip everything together (e.g. StarOffice), which makes manual recovery difficult.

Also note: the checksums don't have enough information to recreate the data for very many bit changes. Hashes might, but I don't know anyone using sha256.

now, where was that intern hiding? ... :-)
-- richard
Bob Friesenhahn
2008-Mar-04 19:00 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Tue, 4 Mar 2008, Richard Elling wrote:

> Also note: the checksums don't have enough information to
> recreate the data for very many bit changes. Hashes might,
> but I don't know anyone using sha256.

It is indeed important to recognize that the checksums are a way to detect that the data is incorrect, rather than a way to tell that the data is correct. There may be several permutations of wrong data which can result in the same checksum, but the probability of encountering those permutations due to natural causes is quite small.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Mario Goebbels (Webmail)
2008-Mar-04 21:58 UTC
[zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
> Also note: the checksums don't have enough information to
> recreate the data for very many bit changes. Hashes might,
> but I don't know anyone using sha256.

My ~/Documents uses sha256 checksums, but then again, it also uses copies=2 :)

-mg