Hello all,

I have a new "crazy idea" of the day ;)

Some years ago an idea was proposed in one of the ZFS developers' blogs (maybe Jeff's? sorry, I can't find a link now) that went roughly like this: modern disks keep ECC/CRC codes for each sector and use them to verify read-in data. If the disk fails to produce a sector correctly, it tries harder to read it and, if possible, reallocates the LBA to a spare-sector region. This causes extra random IO for linearly-numbered LBA sectors, and wastes platter space on spare sectors and checksums - at least compared to the better error detection and redundancy of ZFS checksums. Besides, attempts to re-read a faulty sector may succeed, may produce undetected garbage, and may take some time (seconds, perhaps) if the retries fail consistently. Then the block is marked bad and the data is lost. The article went on to suggest: "let's get an OEM vendor to sell us the same disks without these kludges, and we'll get (20%?) more platter speed and volume, better used by ZFS error-detection and repair mechanisms."

I've recently had a sort of opposite thought: yes, ZFS redundancy is good - but it is also expensive in raw disk space. This is especially bad for space-constrained hardware like laptops and home NASes, where doubling the number of HDDs (for mirrors) or adding tens of percent of storage for raidz is often impractical.

Current ZFS checksums let us detect errors, but for recovery to actually work there must be a redundant copy and/or parity block available and valid.

Hence the question: why not put ECC info into ZFS blocks?

IMHO, pluggable ECC (like pluggable compression or the choice of checksum algorithm - in this case ECC algorithms allowing recovery of 1 or 2 bits, for example) would be cheaper in disk space than redundancy (a few % instead of 25-50%), and would still allow recovery from certain errors, such as on-disk or on-wire bit rot, even in single-disk ZFS pools. This could be an inheritable per-dataset attribute like compression, encryption, dedup or checksum algorithms.

Relocation of recovered "faulted" blocks into currently free space is already part of ZFS, except that it might now have to track a notion of "permanently bad block lists" and decrease the space available for addressing on each leaf VDEV. There should also be a mechanism to retest and clear such blocks, e.g. when a faulty drive or LUN is replaced by a new one (perhaps by DD'ing the old drive onto the new one and swapping them while the pool is offline) - probably as a special scrub-like zpool command, also invoked during scrub.

This might be combined with the wish for OEM disks that drop hardware ECC/spare sectors in return for more performance; although I'm not sure how well that would work in practice - the hardware maker's in-depth knowledge of how to retry reading initially "faulty" blocks, e.g. by changing voltage or platter speed or whatever, may be invaluable and not replaceable by software.

What do you think? Doable? Useful? Why not, if not? ;)

Thanks,
//Jim Klimov
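P.S. To make the "pluggable ECC" idea a bit more concrete, here is a toy sketch (my own illustration only - none of this exists in ZFS, and a real implementation would need a much stronger code, e.g. Reed-Solomon, interleaved over whole 512B-128KB blocks). It shows the flavor of single-bit repair a hypothetical ecc= property might provide, using the textbook Hamming(7,4) code:

    /*
     * Toy Hamming(7,4) codec: 4 data bits protected by 3 parity bits,
     * any single flipped bit in the 7-bit codeword is repairable.
     */
    #include <stdio.h>
    #include <stdint.h>

    /* Encode 4 data bits into a 7-bit codeword (positions 1..7). */
    static uint8_t hamming74_encode(uint8_t d)
    {
        uint8_t d1 = (d >> 0) & 1, d2 = (d >> 1) & 1;
        uint8_t d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
        uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
        uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */
        /* codeword positions: 1=p1 2=p2 3=d1 4=p3 5=d2 6=d3 7=d4 */
        return (uint8_t)(p1 << 0 | p2 << 1 | d1 << 2 |
                         p3 << 3 | d2 << 4 | d3 << 5 | d4 << 6);
    }

    /* Repair any single-bit error; return the 4 decoded data bits. */
    static uint8_t hamming74_decode(uint8_t cw)
    {
        uint8_t b[8];
        for (int i = 1; i <= 7; i++)
            b[i] = (cw >> (i - 1)) & 1;
        /* syndrome = binary index of the flipped position, 0 if clean */
        int s = (b[1]^b[3]^b[5]^b[7])
              | (b[2]^b[3]^b[6]^b[7]) << 1
              | (b[4]^b[5]^b[6]^b[7]) << 2;
        if (s != 0)
            b[s] ^= 1;               /* repair the single bad bit */
        return (uint8_t)(b[3] | b[5] << 1 | b[6] << 2 | b[7] << 3);
    }

    int main(void)
    {
        uint8_t cw = hamming74_encode(0xB);
        cw ^= 1 << 4;                /* simulate one-bit rot on disk */
        printf("recovered 0x%X from damaged codeword\n",
            hamming74_decode(cw));   /* prints 0xB */
        return 0;
    }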
On Wed, Jan 11, 2012 at 9:16 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> I've recently had a sort of opposite thought: yes, ZFS redundancy
> is good - but it is also expensive in raw disk space. This is
> especially bad for space-constrained hardware like laptops and
> home NASes, where doubling the number of HDDs (for mirrors) or
> adding tens of percent of storage for raidz is often impractical.

Redundancy through RAID-Z and mirroring is expensive for home systems and laptops, but mostly due to the cost of SATA/SAS ports, not the cost of the drives. The drives are cheap, but getting an extra disk into a laptop is either impossible or expensive. That doesn't mean you can't mirror slices or use ditto blocks, though. For laptops, just use ditto blocks and either zfs send or an external mirror that you attach/detach.

> Current ZFS checksums let us detect errors, but for recovery to
> actually work there must be a redundant copy and/or parity block
> available and valid.
>
> Hence the question: why not put ECC info into ZFS blocks?

RAID-Zn *is* an error correction system. But what you are asking for is a same-device error correction method that costs less than ditto blocks, with the error correction data baked into the blkptr_t. Are there enough free bits left in the block pointer for error correction codes for large blocks? (128KB blocks today, but eventually ZFS needs to support even larger blocks, so keep that in mind.) My guess is: no. Error correction data might have to be stored elsewhere.

I don't find this terribly attractive, but maybe I'm just not looking at it the right way. Perhaps there is a killer enterprise feature for ECC here: stretching MTTDL in the face of a device failure in a mirror or raid-z configuration (but if failures are typically of whole drives rather than individual blocks, this wouldn't help). Without a good answer for where to store the ECC for the largest blocks, though, I don't see this happening.

Nico
--
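P.S. A back-of-the-envelope on the bit-count question (my own arithmetic, not anything in ZFS): by the Hamming bound, 2^r >= m + r + 1, a single-correct/double-detect (SECDED) code needs only ~20 bits even over a 128KB block, so the raw bit count is not the obstacle:

    #include <stdio.h>

    /* Minimum SECDED parity bits per whole block, per 2^r >= m+r+1. */
    int main(void)
    {
        for (unsigned long bytes = 512; bytes <= 131072; bytes *= 2) {
            unsigned long m = bytes * 8;    /* data bits */
            int r = 1;
            while ((1UL << r) < m + r + 1)
                r++;
            /* +1 parity bit upgrades single-correct to SECDED */
            printf("%7lu-byte block: %2d SECDED parity bits\n",
                bytes, r + 1);
        }
        return 0;
    }

The catch is that one SECDED word over 128KB corrects exactly one flipped bit in the whole block, while a failing sector takes out thousands of contiguous bits; any practical scheme would interleave many codewords (or use Reed-Solomon), which is where the few-percent overhead Jim mentioned comes from.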
2012-01-11 20:40, Nico Williams wrote:
> Redundancy through RAID-Z and mirroring is expensive for home systems
> and laptops, but mostly due to the cost of SATA/SAS ports, not the
> cost of the drives. ... For laptops, just use ditto blocks and either
> zfs send or an external mirror that you attach/detach.

Yes, basically that's what we do now, and it halves the available disk space and increases latency (extra seeks) ;)

I get (and share) your concern about ECC entry size for larger blocks. NOTE: I don't know the ECC algorithms deeply enough to speculate about space requirements, except that, as used in ECC RAM, a SECDED code for 64 bits of user data is 8 bits long.

I'm reading the "ZFS On-Disk Format" PDF (dated 2006 - are there newer releases?), and on page 15 the blkptr_t structure has 192 bits of padding before the TXG field. Couldn't that be used for a reasonably large ECC code?

Besides, I see that blkptr_t is 128 bytes in size. This leaves some slack space in a physical sector, which could be "abused" at no extra cost - (512-128) or (4096-128) bytes worth of {ECC} data. Perhaps the padding space (near the TXG entry) could specify that the blkptr_t bytes are immediately followed by ECC bytes (and their size, probably dependent on data block length), so that larger on-disk block pointer blocks could be used on legacy systems as well (spanning several contiguous 512-byte sectors). After a successful read from disk, this ECC data could be discarded to save space in the ARC/L2ARC allocation (especially if every byte of memory is ECC-protected anyway).

Even if the ideas/storage above are not practical, perhaps ECC codes could be used for smaller blocks (i.e. {indirect} block pointer contents and metadata might be "guaranteed" to be small enough). If nothing else, this could save mechanical seek time: when a CKSUM error is detected, as is normal for ZFS reads, but the built-in/referring block's ECC code information is enough to repair the block, we don't need to re-request the data from another disk... and we gain some error resiliency beyond ditto blocks (already enforced for metadata) or raidz/mirrors. While it is (barely) possible that all ditto replicas are broken, there's a non-zero chance that at least one is recoverable :)

> > Current ZFS checksums let us detect errors ...
> > Hence the question: why not put ECC info into ZFS blocks?
>
> RAID-Zn *is* an error correction system. But what you are asking for
> is a same-device error correction method that costs less than ditto
> blocks, with the error correction data baked into the blkptr_t. Are
> there enough free bits left in the block pointer for error correction
> codes for large blocks? (128KB blocks today, but eventually ZFS needs
> to support even larger blocks, so keep that in mind.) My guess is:
> no. Error correction data might have to be stored elsewhere.
>
> I don't find this terribly attractive, but maybe I'm just not looking
> at it the right way. Perhaps there is a killer enterprise feature for
> ECC here: stretching MTTDL in the face of a device failure in a mirror
> or raid-z configuration (but if failures are typically of whole drives
> rather than individual blocks, this wouldn't help). Without a good
> answer for where to store the ECC for the largest blocks, though, I
> don't see this happening.

Well, it is often mentioned that (by Murphy's Law if nothing else) device failures in RAID are frequently not single-device failures. Traditional RAID5 sets tended to die while rebuilding onto a spare, upon hitting an error on a surviving, now-unreplicated disk. Per-block ECC could recover from such bit-rot errors on the remaining live disks when RAID-Zn or mirror redundancy can't help, decreasing the chance that tape backup is the only recovery option left...

//Jim Klimov
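P.S. To make the padding/slack idea above concrete, a purely hypothetical sketch - the field names and the split of the pad words are my own illustration, not the real on-disk format; only the overall 128-byte size and field order follow the 2006 spec as I read it:

    #include <stdint.h>

    /* HYPOTHETICAL layout, not actual ZFS: the 192 pad bits inside
     * the 128-byte block pointer describe a trailing ECC area that
     * fills the sector slack after the blkptr itself. */
    typedef struct dva {
        uint64_t dva_word[2];       /* vdev, offset, asize, ...    */
    } dva_t;

    typedef struct blkptr_ecc {
        dva_t    blk_dva[3];        /* up to three ditto copies    */
        uint64_t blk_prop;          /* lsize/psize/comp/cksum ids  */
        uint64_t blk_ecc_algo;      /* was padding: ECC algorithm  */
        uint64_t blk_ecc_len;       /* was padding: ECC byte count */
        uint64_t blk_pad;           /* remaining padding           */
        uint64_t blk_birth;         /* TXG of birth                */
        uint64_t blk_fill;          /* fill count                  */
        uint64_t blk_cksum[4];      /* 256-bit checksum            */
        /* blk_ecc_len bytes of ECC would follow here, in the
         * (512-128) or (4096-128) bytes of sector slack.          */
    } blkptr_ecc_t;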
On Wed, January 11, 2012 11:40, Nico Williams wrote:
> I don't find this terribly attractive, but maybe I'm just not looking
> at it the right way. Perhaps there is a killer enterprise feature for
> ECC here: stretching MTTDL in the face of a device failure in a mirror
> or raid-z configuration (but if failures are typically of whole drives
> rather than individual blocks, this wouldn't help). Without a good
> answer for where to store the ECC for the largest blocks, though, I
> don't see this happening.

Not so much for blocks, but speaking of sectors, there's the T10 (SCSI) Data Integrity Field (DIF):

http://www.usenix.org/event/lsf07/tech/petersen.pdf

DIF is a controller-to-drive specification. For host-to-controller communication, the Data Integrity Extensions (DIX) have been defined:

http://oss.oracle.com/~mkp/docs/ols2008-petersen.pdf

It's a pity that the field is only eight bytes; if it were larger, a useful cryptographic [HCUG]MAC could be stored there by disk encryption software. Perhaps with 4K-sector "Advanced Format" drives a similar, larger field will be defined.
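For reference, the eight bytes in question are three tags appended to each 512-byte sector - a sketch of the Type-1 DIF tuple as I understand the spec:

    #include <stdint.h>

    /* The 8-byte T10 DIF tuple appended to each 512-byte sector. */
    typedef struct t10_dif_tuple {
        uint16_t guard_tag;     /* CRC-16 of the 512 data bytes        */
        uint16_t app_tag;       /* application-defined; opaque to disk */
        uint32_t ref_tag;       /* low 32 bits of the target LBA --    */
                                /* catches misdirected writes          */
    } t10_dif_tuple_t;          /* fields are big-endian on the wire   */

Note that only the guard tag detects corruption of the payload; the reference tag exists to catch a different failure mode (a correct sector written to the wrong LBA), which ZFS also catches for free because the checksum lives in the parent pointer.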
I guess I have another practical rationale for a second checksum, be it ECC or not: my scrubbing pool found some "unrecoverable errors". Luckily, for those files I still have external originals, so I rsynced them over. Still, there is one file whose broken prehistory is referenced in snapshots, and properly fixing that would probably require me to resend the whole stack of snapshots. That's uncool, but a subject for another thread.

This thread is about checksums - namely, what are our options when they mismatch the data? As reported in many blog posts exploring ZDB, there are cases where the checksums are broken (i.e. bit rot in the block pointers - or rather in RAM while the checksum was being calculated, so every ditto copy of the BP carries the error) but the file data is in fact intact (extracted from disk with ZDB or DD and compared to other copies). For these cases bloggers have asked (in vain): why is an admin not allowed to confirm the validity of the end-user data and have the system reconstruct (re-checksum) the metadata for it?.. IMHO, that's a valid RFE.

While the system was scrubbing, I was reading up on theory. I found a nice text, "Keeping Bits Safe: How Hard Can It Be?" by David Rosenthal [1], where I stumbled upon an interesting thought:

    The bits forming the digest are no different from the bits forming
    the data; neither is magically incorruptible. ...Applications need
    to know whether the digest has been changed.

In our case, where the original checksum in the block pointer could have been corrupted in the (non-ECC) RAM of my home NAS just before it was dittoed to disk, another checksum - a copy of the same one, or a differently calculated one - could give ZFS the means to determine whether the data or one of the checksums got corrupted (or all of them). Of course, this is not an absolute protection method, but it could reduce the cases where pools have to be "destroyed, recreated and recovered from tape".

It is my belief that using dedup contributed to my issue: there's a lot more updating of block pointers and their checksums, so it gradually becomes more likely that the metadata (checksum) blocks get broken (e.g. in non-ECC RAM), while the written-once user data remains intact...

--
[1] http://queue.acm.org/detail.cfm?id=1866298
While the text discusses what all ZFSers mostly know already - bit rot, MTTDL and such - it does so in great detail with many examples, and gave me a better understanding of it all even though I have dealt with this for several years now. A good read; I suggest it to others ;)

//Jim Klimov
2012-01-13 2:34, Jim Klimov wrote:
> I guess I have another practical rationale for a second
> checksum, be it ECC or not: my scrubbing pool found some
> "unrecoverable errors". ...
> ...Applications need to know whether the digest has
> been changed.

As Richard reminded me in another thread, both the metadata and the DDT can contain checksums, hopefully of the same data block. So for deduped data we may already have a means to test whether the data or the checksum is incorrect...

Incidentally, the problem also seems more critical for deduped data ;)

Just a thought...
//Jim
On Fri, Jan 13, 2012 at 04:48:44AM +0400, Jim Klimov wrote:
> As Richard reminded me in another thread, both the metadata
> and the DDT can contain checksums, hopefully of the same data
> block. So for deduped data we may already have a means to test
> whether the data or the checksum is incorrect...

It's the same checksum, calculated once - this is why turning dedup=on implies setting checksum=sha256.

> Incidentally, the problem also seems more critical for
> deduped data ;)

Yes. Add this to the list of reasons to use ECC, and add 'have ECC' to the list of constraints on the circumstances where using dedup is appropriate.

--
Dan.
On Jan 12, 2012, at 2:34 PM, Jim Klimov wrote:
> This thread is about checksums - namely, what are our options when
> they mismatch the data? As reported in many blog posts exploring
> ZDB, there are cases where the checksums are broken (i.e. bit rot
> in the block pointers - or rather in RAM while the checksum was
> being calculated, so every ditto copy of the BP carries the error)
> but the file data is in fact intact (extracted from disk with ZDB
> or DD and compared to other copies).

Metadata is at least doubly redundant and checksummed. Can you provide links to posts that describe this failure mode?

> For these cases bloggers have asked (in vain): why is an admin not
> allowed to confirm the validity of the end-user data and have the
> system reconstruct (re-checksum) the metadata for it?.. IMHO,
> that's a valid RFE.

Metadata is COW, too. Rewriting the data also rewrites the metadata.

> While the system was scrubbing, I was reading up on theory. ...
>     The bits forming the digest are no different from the bits
>     forming the data; neither is magically incorruptible.
>     ...Applications need to know whether the digest has been changed.

Hence for ZFS, the checksum (digest) is kept in the parent metadata. The condition described above can affect T10 DIF-style checksums, but not ZFS.

> In our case, where the original checksum in the block pointer could
> have been corrupted in the (non-ECC) RAM of my home NAS just before
> it was dittoed to disk, another checksum ... could give ZFS the
> means to determine whether the data or one of the checksums got
> corrupted (or all of them). ... it could reduce the cases where
> pools have to be "destroyed, recreated and recovered from tape".

Nope.

-- richard

--
ZFS and performance consulting
http://www.RichardElling.com
SCALE 10x, Los Angeles, Jan 20-22, 2012
On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote:
> > This thread is about checksums - namely, what are our options
> > when they mismatch the data? As reported in many blog posts
> > exploring ZDB, there are cases where the checksums are broken
> > (i.e. bit rot in the block pointers - or rather in RAM while the
> > checksum was being calculated, so every ditto copy of the BP
> > carries the error) but the file data is in fact intact.
>
> Metadata is at least doubly redundant and checksummed.

The implication is that the original calculation of the checksum went bad in RAM (undetected due to lack of ECC), and was then written out redundantly and fed as bad input to the rest of the Merkle construct. The data blocks on disk are correct, but they fail to verify against the bad metadata.

The complaint appears to be that ZFS makes this 'worse' because the (independently verified) valid data blocks are inaccessible. Worse than what? Corrupted file data that is then accurately checksummed and readable as valid? Accurate data that is read without any assertion of validity, as in a traditional filesystem? There's an inherent value judgement here that will vary by judge, but in each case it's as much a judgement on the value of ECC and reliable hardware, and of your data and your time enacting various kinds of recovery, as it is on the value of ZFS.

The same circumstance could, in principle, happen due to a bad CPU even with ECC. In either case, the value of ZFS includes that an error has been detected that you would otherwise have been unaware of, and you get a clue that you need to fix hardware and spend time.

--
Dan.
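P.S. In pseudo-C, the failure mode looks like this (a sketch of the logic only, not actual ZFS source):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { uint64_t word[4]; } cksum_t;

    extern cksum_t checksum(const void *buf, size_t len);

    void write_block(const void *data, size_t len, cksum_t ditto_cksum[2])
    {
        cksum_t c = checksum(data, len); /* a bit flips in RAM here... */
        ditto_cksum[0] = c;              /* ...and the bad value goes  */
        ditto_cksum[1] = c;              /* into EVERY ditto copy of   */
    }                                    /* the parent block pointer   */

    bool read_block(const void *data, size_t len,
        const cksum_t ditto_cksum[2])
    {
        cksum_t c = checksum(data, len); /* intact data, correct sum   */
        for (int i = 0; i < 2; i++)
            if (memcmp(&c, &ditto_cksum[i], sizeof (c)) == 0)
                return true;
        return false;                    /* every copy carries the     */
    }                                    /* same bad value: "fatal"    */

On-disk redundancy can't help here because the corruption happened before replication: all copies agree with each other and disagree with the (good) data.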
2012-01-13 5:30, Daniel Carosone wrote:
> The implication is that the original calculation of the checksum went
> bad in RAM (undetected due to lack of ECC), and was then written out
> redundantly and fed as bad input to the rest of the Merkle construct.
> The data blocks on disk are correct, but they fail to verify against
> the bad metadata.

Implication is correct; that was the outlined scenario :)

> The complaint appears to be that ZFS makes this 'worse' because the
> (independently verified) valid data blocks are inaccessible.

Also correct - a frequent "woe", generally raised in discussions about the lack of a ZFS fsck (though many of those discussions tend to descend into flame wars and/or detailed descriptions of how COW and the transaction engine keep {meta}data intact - right up until some fatal bit rot after which recreating the pool is the only "recovery" option).

> Worse than what?

Worse than not having a (relatively easy-to-use) ability to confirm to the system which part to trust - the data or the checksum (which returns us to the subject of automating this with ECC and/or other checksums). My data, my checks on it - my word should be final in case of dispute ;)

> Corrupted file data that is then accurately checksummed and readable
> as valid? Accurate data that is read without any assertion of
> validity, as in a traditional filesystem?

If done by the ZFS automaton itself - without my ability to intervene - then probably not. That would make ZFS no better than the others.

> There's an inherent value judgement here that will vary by judge, but
> in each case it's as much a judgement on the value of ECC and reliable
> hardware, and of your data and your time enacting various kinds of
> recovery, as it is on the value of ZFS.

Perhaps so. I might read through a text file to see whether it is garbage or text. I might parse or display image files and many other formats. I might compare to another copy, if one is available. I just don't have a mechanism to do any of that with ZFS. A view into the data "as it seems to be", without checksum enforcement, would speed up data comparison, eye-reading and other validation methods. People do this with LOST+FOUND and similar directories on other filesystems, but usually after an irreversible recovery attempt, correct or not... Heck, with ZFS I could have a snapshot-like view of my recovery options (accessible to programs like image viewers) without changing the on-disk data until I pick a variant.

Yes, okay, ZFS did inform me of some inconsistency (and even then it is not necessarily the data that is bad) and prompted me to fix the hardware and find other copies of the data. Kudos to the team, really! But then it stops there, without giving me options to recover whatever is on disk (at my own risk).

As a Solaris example, admins are allowed to confirm which side of a broken UFS+SVM mirror to trust, even if there is no quorum of metadb replicas. This trust in the human is common in the industry, and allows accounting for whatever could not be done in software as a one-size-fits-all solution. It is also the user's final choice to kill or save the data - not the programmer's, whatever cryptic intentions he had.

> The same circumstance could, in principle, happen due to a bad CPU
> even with ECC. In either case, the value of ZFS includes that an
> error has been detected that you would otherwise have been unaware
> of, and you get a clue that you need to fix hardware and spend time.

True, whenever that is possible. Hardware will always be faulty to some extent; we can only reduce that extent. Not all implementation options (see laptops and ECC RAM) or budgets can bring it down to "reasonable" levels, though. Software must be the more resilient part, I guess - as long as its error-detection algorithm can execute on that CPU... :)

//Jim
2012-01-13 5:01, Richard Elling wrote:
> Metadata is at least doubly redundant and checksummed.

True, and this helps if it was valid in the first place (in RAM).

>> As reported in many blog posts exploring ZDB, there are cases where
>> the checksums are broken ... but the file data is in fact intact
>
> Can you provide links to posts that describe this failure mode?

I'll try in another message; that would take some googling time ;) The most apparent ones are the ZDB tutorials whose authors poisoned their VDEVs in the sectors holding metadata (all copies), so that the file data was factually intact but inaccessible due to mismatching checksums along the metadata path. Right now I can't think of other posts like that, but nature can produce the same phenomena, and I think it has been discussed online. I've read too much during the past weeks :(

>> For these cases bloggers have asked (in vain): why is an admin not
>> allowed to confirm the validity of the end-user data and have the
>> system reconstruct (re-checksum) the metadata for it?.. IMHO,
>> that's a valid RFE.
>
> Metadata is COW, too. Rewriting the data also rewrites the metadata.

COW does not help much against mis-targeted hardware writes, bit rot, solar storms, etc. that break existing on-disk data. Random bit errors can happen anywhere - RAM buffers and committed disks alike. It is a fact (since the first blog posts about ZDB and ZFS internals by Marcelo Leal, Max Bruning, Ben Rockwood and countless other kind samaritans) that inquisitive users - or those repairing their systems - can determine the DVA and ultimately the LBA addresses of their data, extract the userdata blocks, and (sometimes) confirm that their data is intact and the problem is in the metadata path.

>>     The bits forming the digest are no different from the bits
>>     forming the data; neither is magically incorruptible.
>>     ...Applications need to know whether the digest has been changed.
>
> Hence for ZFS, the checksum (digest) is kept in the parent metadata.

But it can still rot. And for a while the checksum and the data sit in the same RAM, which might lie. Probably the one good effect is that the checksum is stored away from the data, so *likely* a head crash won't scratch both at once ;) Unless they were coalesced to nearby storage... Hm...

So: if the checksum in the metadata has bit-rotted on disk, that metadata block would first fail to match its own parent (it is the parent's checksummed data), triggering a re-read of a ditto copy. But if the checksum got broken in RAM just before the write, so that both ditto blocks carry the bad checksum value - yet match their metadata parents - then the data is considered bad :(

Granted, data is larger, so there is seemingly a higher chance of it catching a one-bit error; but as I wrote, metadata blocks are rewritten more often, so in fact they could suffer errors more frequently. Does your practice or theory prove this statement of mine fundamentally wrong?

>> In our case, where the original checksum in the block pointer could
>> have been corrupted in (non-ECC) RAM ... another checksum - a copy
>> of the same one, or a differently calculated one - could give ZFS
>> the means to determine whether the data or one of the checksums got
>> corrupted (or all of them). ...
>
> Nope.

Maybe so... As I elaborate below, there are indeed scenarios with several checksums of the data where we cannot unambiguously determine the correctness of either.

Say we have a data block D in RAM, which can fail at any time (more probably without ECC, as on consumer devices like laptops or home NASes). We produce two checksums D' and then D" with different algorithms while preparing to write (these checksum values would go into all ditto blocks). During this time a bit flips, or whatever undetected (non-ECC) RAM failure happens, at least once. Variants:

1) Block D got broken before both checksum calculations - we're out of luck: the checksums will probably match, but the data is still wrong.

2) Block D got broken between the checksum calculations - one checksum (always D") matches the data, the other (always D') doesn't.

3) Block D is okay, but one of the checksums broke - one checksum matches the data, the other doesn't. About 50% similarity to case (2).

4) Block D is okay, but both checksums broke - the block is considered broken even though it isn't...

The idea needs rethinking, indeed ;) Perhaps we could checksum or ECC the checksums, or keep a digest of the (primary) checksum and the data? Maybe we could presume that bit flips produce small differences (one to a few bits at random locations, 0xdeadbeef -> 0xdeafbeef), so that with fuzzy logic the data would still "likely match" the checksum? I refuse to believe there is no solution, no hope! ;)

//Jim
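P.S. The disambiguation logic the variants above imply, sketched in C (my own illustration, not ZFS code) - two independent checksums can at best vote, and the one-matches cases stay genuinely ambiguous:

    /* Mapping of the four variants above onto read-time verdicts. */
    typedef enum {
        DATA_GOOD,      /* both match: trust the data -- or the
                           unlucky variant (1) where D broke early  */
        DATA_AMBIGUOUS, /* one matches: either D changed between
                           the two calculations (2) or one checksum
                           rotted (3); cannot tell which            */
        DATA_BAD        /* neither matches: data is bad, or both
                           checksums rotted (4); indistinguishable  */
    } verdict_t;

    verdict_t judge(int primary_matches, int secondary_matches)
    {
        if (primary_matches && secondary_matches)
            return DATA_GOOD;
        if (primary_matches || secondary_matches)
            return DATA_AMBIGUOUS;
        return DATA_BAD;
    }

A third independent checksum would turn the ambiguous cases into a majority vote, at yet more metadata cost - the same trade-off as everywhere else in this thread.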
2012-01-13 5:30, Daniel Carosone wrote:
> Corrupted file data that is then accurately checksummed and readable
> as valid?

Speaking of which, is there currently any simple way to disable checksum validation during data reads (without causing a kernel panic when reading garbage in the guise of metadata)?

Some posts suggested setting checksum=off on the dataset. It doesn't work: reads of files whose blocks mismatch their checksums still return IO errors ;)

Thanks,
//Jim