As a heads up, the Fletcher class of checksum algorithms is reasonably strong because it is specified to perform 1's complement sums (i.e. end-around carries: (a + b) mod (2^n - 1)), so that sums don't overflow into oblivion and thereby lose critical information. Unsigned additions in C are not 1's complement additions. As presently implemented (and arguably not even Fletcher checksums), they are incredibly weak, and not even capable of generating unique checksums across, for example, large blocks of zero-valued data containing single bit errors, which are arguably not uncommon.

If one carefully reviews correct Fletcher checksum implementations, it becomes apparent that they either use add-with-carry instructions when implemented in assembly language, or compute the sum in larger accumulators (so that upper carries are not lost) and then iteratively add the upper portion of the accumulator to the lower portion of the sum until no further carries are generated, thereby effectively performing a 1's complement modular addition.

Please consider fixing the implementation, as it is presently deceptively weak while perceived to be otherwise, riding on the coattails of the relative strength of Fletcher checksums, which the implementation is not.

[As an aside: since I believe that most otherwise-undetected checksum mismatches are most likely caused by single bit errors introduced during the transport of data to or from storage, a stronger checksum implementation with a hamming distance of at least 2 (although more ideally 3) provides the opportunity to potentially correct single bit errors with reasonable probability if all else fails, in lieu of relying on the data hopefully being archived somewhere else, or a catastrophic result otherwise.]

-- This message posted from opensolaris.org
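The end-around-carry behavior described above can be shown in a few lines of C. This is an illustrative sketch of a 1's complement sum (16-bit for brevity, with a hypothetical function name), not the ZFS code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1's complement (end-around-carry) 16-bit sum: carries out of the top
 * bit are folded back into the low end, so they are never lost.  A wider
 * accumulator collects the carries, and the fold loop performs the
 * end-around addition, i.e. a sum mod (2^16 - 1). */
uint16_t ones_complement_sum16(const uint16_t *data, size_t n)
{
    uint32_t sum = 0;               /* wider accumulator catches the carries */

    for (size_t i = 0; i < n; i++)
        sum += data[i];
    while (sum >> 16)               /* fold carries back in until none remain */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

For example, summing {0xffff, 0x0001} or {0x8000, 0x8000} with a plain truncating 16-bit add yields 0 (the carry is lost), while the fold preserves the carry and yields 1.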
paul
2008-Aug-18 23:41 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed (correction)
CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators, the first two checksum terms (a and b) will not overflow for data block sizes as large as 128KB, and it should therefore be considered at least as strong as a 32/64bit Fletcher checksum (although the remaining terms, c and d, may overflow). It thereby has a worst-case hamming distance of at least 3 for zfs's maximum 128KB block size, which is a good thing. (Sorry for my earlier misdiagnosis.)

fletcher2 (zfs's default), however, remains arguably flawed, but may easily be strengthened by correspondingly restricting it to summing 32bit data using 64bit accumulators, or possibly alternatively summing 64bit data using 128bit accumulators, if that is viewed as reasonably supportable by the target architectures/compilers likely to host zfs.
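The overflow claim above can be sanity-checked from the closed-form worst cases of the four accumulators: with n all-ones 32-bit words and m = 2^32-1, the maxima are a = n*m, b = n(n+1)/2 * m, c = n(n+1)(n+2)/6 * m, d = n(n+1)(n+2)(n+3)/24 * m. A hypothetical helper (not ZFS code; relies on the unsigned __int128 GCC/Clang extension):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the given fletcher4 term ('a'..'d') can exceed a 64-bit
 * accumulator after summing n all-ones 32-bit words, 0 if it cannot.
 * Illustrative sketch using the closed-form coefficient sums. */
int fletcher4_term_can_overflow(uint64_t n, char term)
{
    unsigned __int128 m = 0xffffffffu;   /* worst-case 32-bit datum */
    unsigned __int128 N = n, max = 0;

    switch (term) {
    case 'a': max = N * m; break;
    case 'b': max = N * (N + 1) / 2 * m; break;
    case 'c': max = N * (N + 1) * (N + 2) / 6 * m; break;
    case 'd': max = N * (N + 1) * (N + 2) * (N + 3) / 24 * m; break;
    }
    return max > (unsigned __int128)UINT64_MAX;
}
```

With n = 32768 (a 128KB block of 32-bit words), a and b stay within 64 bits while c and d can exceed them, matching the correction above.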
Jonathan Adams
2008-Aug-22 21:24 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed (correction)
On Mon, Aug 18, 2008 at 04:41:35PM -0700, paul wrote:
> CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators,
> the first two checksum terms (a and b) will not overflow for data block sizes as large as
> 128KB, and therefore should be considered at least as strong as that expected of a 32/64bit
> Fletcher checksum (although the remaining terms (c and d) may overflow); and thereby have
> a worst case hamming distance of at least 3 for zfs's maximum 128KB block size; which is a
> good thing. (sorry for my earlier misdiagnosis)
>
> fletcher2 (zfs's default) however remains arguably flawed, however may be easily strengthened
> by correspondingly being restricted to summing 32bit data using 64bit accumulators, or possibly
> alternatively summing 64bit data using 128bit accumulators if it's viewed reasonably supportable
> by target architectures/compilers likely to host zfs.

Thanks for the report. I've filed:

	6740597 zfs fletcher-2 is losing its carries

to track this.

Cheers,
- jonathan
Mark Butler
2009-Mar-28 03:42 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
As I understand it, there is no way to fix a problem with the algorithm of a defined checksum without invalidating existing zfs filesystems. Any fix to fletcher2 will have to be given a new name. Given how incredibly weak the current fletcher2 is, perhaps the first thing that should be done is to change the default to fletcher4.

The flawed fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum algorithm, i.e. it appears to have only a 50% chance of catching a single bit error, or any series of single bit errors, in the most significant bit of any 64 bit word in a disk block. For those bits, it is equivalent to a *1 bit* checksum.
Richard Elling
2009-Mar-28 04:02 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
Mark Butler wrote:
> As I understand it, there is no way to fix a problem with the algorithm of a defined checksum without invalidating existing zfs filesystems. Any fix to fletcher2 will have to be given a new name.

Other than the name, fletcher, is there actually a problem here?

> Given how incredibly weak the current fletcher2 is, perhaps the first thing that should be done is to change the default to fletcher4. The flawed fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum algorithm, i.e. it appears to only have a 50% chance of catching a single bit error or any series of single bit errors in the most significant bit of any 64 bit word in a disk block. For those bits, it is equivalent to a *1 bit* checksum.

The way I see it, the current "fletcher2" is better than a simple xor, but has the same computational cost. Anecdotal evidence suggests that the current fletcher2 catches a large number of faults. This makes some sense for magnetic disk drives, which already have significant single bit correction. If you know of a formal study which shows the distribution of errors in the current population, I'd be very interested in seeing it.

NB, there are already some RFEs for improving the reporting of checksum mismatches, but there may be more work we can do there, too.

-- richard
Mark Butler
2009-Mar-28 05:41 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
The problem is that for 1 out of 64 bits, "fletcher2" is *not* better than a simple XOR: a 50% probability of a false negative for any number of bit errors in those bits. If you have a phantom write to a previously used block that differs only in bits that occupy that bit position, you have only a 50% chance of catching it. That is pathetic.

By way of suggestion, considering that hardware support is on its way in next-generation Intel x86 processors, CRC32-C checksum support (as used in iSCSI and SCTP) would be an excellent addition. There are decent software implementations as well that process 8 bits at a time using an 8KB lookup table.
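For reference, CRC32-C differs from the common CRC-32 only in its polynomial (0x1EDC6F41, reflected form 0x82F63B78). A minimal byte-at-a-time software sketch with a 256-entry (1KB) table, smaller and slower than the 8-bits-per-table slicing variant mentioned above, checked against the standard "123456789" test vector:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC32-C (Castagnoli), reflected polynomial 0x82F63B78, as used by
 * iSCSI and SCTP.  Byte-at-a-time table-driven sketch. */
static uint32_t crc32c_table[256];

static void crc32c_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
        crc32c_table[i] = c;
    }
}

static uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xffffffffu;        /* standard initial value */

    while (len--)
        crc = crc32c_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc ^ 0xffffffffu;          /* standard final xor */
}
```

The well-known check value for this polynomial is crc32c("123456789") == 0xE3069283.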
Mark Butler
2009-Mar-29 22:32 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
Here is a better example. Suppose you are checksumming an 8KB block that consists of all one bits. fletcher2 runs two parallel quasi-fletcher checksums on alternating 64 bit words, such that the linear term of each checksum over an 8KB block is the sum of 512 additions. The highest linear term you can get for an 8KB block this way is 512 * (2^64-1), which is somewhat less than 2^73. However, fletcher2 uses a 64bit accumulator, so those extra nine bits are effectively discarded. The equivalent bits are discarded even more quickly from the non-linear term.

Because of that weakness, with fletcher2, errors in any of the *eight* most significant bits of any of the 64bit words in the first half of an 8KB block consisting of all one bits will be completely ignored. In addition, errors in the seven most significant bits of words in the next quarter of the block will be ignored, then errors in the six most significant bits of the next eighth of the block, and so on until you reach the end of the block. All told, errors in just under 8192 bit positions of an 8KB block consisting of all ones will be completely ignored due to carry overflow. With larger blocks, the problem is naturally worse.

I therefore propose an alternative to fletcher2 that I call "fletcher8" that doesn't have fletcher2's weakness in detecting problems near the MSB of each 64 bit word. The only difference here is that the high 32 bits of each 64 bit word are added twice, first in their normal position, and then shifted downward 32 bits. On an Intel Core 2 Q6600 (2.4 Ghz) in 32 bit mode with gcc 4.1.2, "fletcher8" runs 3.3x faster than fletcher4, and at about half the speed of the flawed fletcher2. Of course, if you don't care very much about detecting errors near the MSB of each 64 bit word, the flawed fletcher2 is much preferable.
I ran each checksum 500,000 times on an 8K block of quasi-random data and measured the following:

fletcher2: 4137 MB/s
fletcher4: 612 MB/s
fletcher8 (proposed): 2038 MB/s

---

void
fletcher_8_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
{
	const uint64_t *ip = buf;
	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
	uint64_t a0, b0, a1, b1;

	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
		a0 += ip[0] + (ip[0] >> 32);
		a1 += ip[1] + (ip[1] >> 32);
		b0 += a0;
		b1 += a1;
	}

	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
}
Mark Butler
2009-Mar-29 22:54 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
The code didn't come out right (presumably due to angle bracket handling), so I have included it as an attachment here.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prop_fletcher8.c
Type: application/octet-stream
Size: 371 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090329/36e03cc3/attachment.obj>
Jeff Bonwick
2009-Mar-29 23:13 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
On Sun, Mar 29, 2009 at 03:32:20PM -0700, Mark Butler wrote:
> a0 += ip[0] + (ip[0] >> 32);

That has certain weaknesses too. What we really want is a 128-bit add across two 64-bit registers. If you write the code like this:

	value = ip[0];
	a0 += value;
	if (a0 < value)		/* 64-bit overflow implies need to carry */
		a1++;
	b0 += a0;
	if (b0 < a0)
		b1++;

then you get the desired effect. The pair a1:a0 is the 128-bit sum of the 64-bit ip[] values, and the pair b1:b0 is the 128-bit sum of the a1:a0 values. Best of all, the compiler (at least, our compiler) is smart enough to detect the carry-detection construct and turn it into branchless add-with-carry instructions. Very efficient.

We've been meaning to introduce this "fletcher2c" for some time -- it just got lost in the sea of things to do. Thanks for the reminder.

Jeff
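Jeff's construct, assembled into a self-contained sketch (the function name and output array are illustrative stand-ins for the real zio_checksum interface; the final b1 += a1 is an added step, implied by "the pair b1:b0 is the 128-bit sum of the a1:a0 values", that completes the 128-bit addition of A into B):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the "fletcher2c" idea: a1:a0 holds the 128-bit sum of the
 * 64-bit input words, and b1:b0 holds the 128-bit sum of the successive
 * a1:a0 values.  The (x < y) test after an unsigned add detects 64-bit
 * overflow, which compilers can turn into add-with-carry instructions. */
void fletcher2c(const uint64_t *buf, size_t nwords, uint64_t out[4])
{
    uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t value = buf[i];

        a0 += value;
        if (a0 < value)     /* 64-bit overflow implies need to carry */
            a1++;
        b0 += a0;
        if (b0 < a0)        /* carry into the high word of B */
            b1++;
        b1 += a1;           /* high words of the 128-bit B += A */
    }
    out[0] = a0; out[1] = a1;
    out[2] = b0; out[3] = b1;
}
```

For two all-ones words, A = 2*(2^64-1) and B = 3*(2^64-1), with no carries lost.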
Mark Butler
2009-Mar-30 01:37 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
The first thing I tried was 128 bit additions of 64 bit quantities. The problem is that it doesn''t run any faster than fletcher4 on a 32 bit machine, and fletcher4 on 32bit machines is almost 7 times slower than fletcher2. I did some further testing, and fletcher2 does not look as bad as in my last estimate - no single bit error detection failures in my tests. I have confirmed failures where two 64 bit MSBs are flipped. I expect an error detection failure when any even number of 64 bit MSBs in the same vertical stripe are flipped from what they should be. With "fletcher8", I have yet to discover any dual bit failures and I don''t expect any. -- This message posted from opensolaris.org
Mark Butler
2009-Mar-31 07:36 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
I ran tests of every dual bit error case on an 8KB block containing the first 8K of a typical ELF executable with six checksum algorithms. I included two trivial checksums for comparison purposes. The undetected error counts are as follows, out of a total of 214783648 cases:

xor128: 16744448 (~3 x 10^-3 false negative rate)
add64: 15763070 (~3 x 10^-3 false negative rate)
fletcher2: 187606 (~4 x 10^-5 false negative rate)
fletcher4: 0
crc32c: 0
"fletcher8": 0

The comparative single-threaded bandwidth for these six checksums on a recent 32 bit x86 with gcc 4.1.2 w/ -O2 is as follows:

add64: 8096 MB/s
xor128: 5178 MB/s
fletcher2: 4137 MB/s
"fletcher8": 2038 MB/s
crc32c: 1228 MB/s
fletcher4: 612 MB/s

Notes:
1. xor128 is a 128 bit exclusive-or sum.
2. add64 is a 64 bit two's complement sum.
3. The crc32c implementation was a software implementation using an 8KB lookup table - no special instructions.
4. The processor was an Intel Q6600 running at 2.4 Ghz in 32 bit mode.
5. 130560 of those fletcher2 false negatives appear to be data independent error cases that occur due to the flipped parity of two 64 bit MSBs. You get exactly that many on a block consisting of all ones or all zeroes.
6. Despite its flaws, fletcher2 is two orders of magnitude better than weaker checksums of comparable speed.
Jeff Bonwick
2009-Mar-31 08:45 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
Very cool, Mark -- thanks for posting this!

I was discussing this general issue with Jonathan Adams this afternoon. He's currently coding and measuring the performance and effectiveness of a number of fletcher2 variants, which I'll let him describe in exquisite detail.

Something we observed along the way: all orders of fletcher checksums are linear combinations of the data words, so it shouldn't matter what initial data block you use when checking for two-bit error detection. Also, note that the fletcher checksum of an all-zero block is zero. Therefore, you can find two-bit vulnerabilities by setting all possible pairs of bits in an otherwise all-zero block, and checking whether any of the resulting fletcher checksums are zero.

Jeff

On Tue, Mar 31, 2009 at 12:36:48AM -0700, Mark Butler wrote:
> I ran tests of every dual bit error case on an 8KB block containing the first 8K of a typical ELF executable with six checksum algorithms. I included two trivial checksums for comparison purposes. The undetected error counts are as follows, out of a total of 214783648 cases:
>
> xor128: 16744448 (~3 x 10^-3 false negative rate)
> add64: 15763070 (~3 x 10^-3 false negative rate)
> fletcher2: 187606 (~4 x 10^-5 false negative rate)
> fletcher4: 0
> crc32c: 0
> "fletcher8": 0
>
> The comparative single threaded bandwidth for these six checksums on a recent 32 bit x86 and gcc 4.1.2 w/ -O2 is as follows:
>
> add64: 8096 MB/s
> xor128: 5178 MB/s
> fletcher2: 4137 MB/s
> "fletcher8": 2038 MB/s
> crc32c: 1228 MB/s
> fletcher4: 612 MB/s
>
> Notes:
> 1. xor128 is a 128 bit exclusive or sum.
> 2. Add64 is a 64 bit two's complement sum.
> 3. The crc32c implementation was a software implementation using an 8KB lookup table - no special instructions.
> 4. The processor was an Intel Q6600 running at 2.4 Ghz in 32 bit mode.
> 5. 130560 of those fletcher2 false negatives appear to be data independent error cases that occur due to the flipped parity of two 64 bit MSBs. You get exactly that many on a block consisting of all ones or all zeroes.
> 6. Despite its flaws, fletcher2 is two orders of magnitude better than weaker checksums of comparable speed.
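Jeff's zero-block search is easy to sketch: a toy brute force over all bit pairs of an 8-word zero block, using the carry-losing fletcher2 recurrence (function names are illustrative, not ZFS code). By the parity argument in Mark's note 5, the undetected pairs are exactly the MSB pairs within a lane whose remaining-iteration coefficients sum to an even number; with 4 iterations per lane that is 2 pairs per lane, 4 in total.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 8  /* toy block: 8 x 64-bit words */

/* The carry-losing fletcher2 recurrence over two interleaved lanes,
 * mirroring the current implementation (a sketch, not the ZFS source). */
static void fletcher_2(const uint64_t *buf, uint64_t cks[4])
{
    uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

    for (int i = 0; i < NWORDS; i += 2) {
        a0 += buf[i];
        a1 += buf[i + 1];
        b0 += a0;
        b1 += a1;
    }
    cks[0] = a0; cks[1] = a1; cks[2] = b0; cks[3] = b1;
}

/* Set every pair of bits in an otherwise all-zero block; an all-zero
 * checksum means the corruption is indistinguishable from clean zeros. */
int count_two_bit_collisions(void)
{
    int nbits = NWORDS * 64, count = 0;

    for (int i = 0; i < nbits; i++) {
        for (int j = i + 1; j < nbits; j++) {
            uint64_t buf[NWORDS] = { 0 }, cks[4];

            buf[i / 64] ^= 1ULL << (i % 64);
            buf[j / 64] ^= 1ULL << (j % 64);
            fletcher_2(buf, cks);
            if ((cks[0] | cks[1] | cks[2] | cks[3]) == 0)
                count++;
        }
    }
    return count;
}
```

For this 8-word block the undetected pairs are the MSBs of words {0,4}, {2,6}, {1,5}, and {3,7}; the same per-lane counting gives the 130560 figure Mark reports for a full 8KB block.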
paul
2009-Mar-31 16:07 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed

Nice.

As a minor thought with respect to naming conventions: as most aren't likely to bother attempting to understand the relative merits of the available checksum implementations, I can't help but wonder if the naming should be ordered relative to their performance/goodness. For example:

fletcher1 (future, 64b add-w/c or 128b sums, 64b data, fast/robust)*
fletcher2 (present, using 64b adds, 64b data, fast/less-robust)
fletcher3 (proposed, using 64b adds, 32b data, slower/robust)
fletcher4 (present, using 64b adds, 32b data, very-slow/robust)

*where a future fletcher1 may possibly use either 128b sums or a 64b add-with-carry implementation (coded in assembly, or leveraging compiler intrinsics if/when available), and thereby be as fast as the current fletcher2 and as robust as the proposed fletcher3 (fletcher8).

(Lastly, it seems reasonable to allow existing checksums to be checked/converted to another if later deemed more desirable.)
Jonathan Adams
2009-Apr-13 23:18 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
On Tue, Mar 31, 2009 at 09:07:31AM -0700, paul wrote:
> Nice.
>
> As a minor thought with respect to naming conventions, as most
> aren't likely to bother attempting to understand the relative merits
> of available checksum implementations, I can't help but wonder
> if the naming conventions should be ordered relative to their
> performance/goodness.
>
> For example:
>
> fletcher1 (future, 64b add-w/c or 128b sums, 64b data, fast/robust)*
> fletcher2 (present, using 64b adds, 64b data, fast/less-robust)
> fletcher3 (proposed, using 64b adds, 32b data, slower/robust)
> fletcher4 (present, using 64b adds, 32b data, very-slow/robust)
>
> *where a future fletcher1 may possibly use either 128b sums or 64b
> add-with-carry implementation (coded in assembly or leverage compiler
> intrinsics if/when available), and thereby be as fast as the current
> fletcher2 and as robust as the proposed fletcher3 (fletcher8).

I've been investigating this. My main performance results are that the checksum computation is overwhelmingly dominated by the data motion; from a performance standpoint, fletcher2 and fletcher4 are completely equivalent, despite the fact that on cached data, fletcher4 is ~4x slower than fletcher2. (checksum=none is faster, since it doesn't touch the data at all, and checksum=sha256 is slower, since it does significant amounts of computation per word.)

fletcher2 has two issues:

1. Since it is not pulling around its carries, it is vulnerable to 2-bit errors in the higher bits. It is (as is any 2nd-order fletcher) vulnerable to three-bit errors in a variety of spacings.

2. Since there is no performance boost associated with using fletcher2, there is no motivation to continue using it, given its various vulnerabilities.
Fletcher-4 is still somewhat vulnerable; here's part of a block-quoted comment I've written:

 * Fletcher-4 is defined by the recurrence relation:
 *	a_i = a_(i-1) + f_(i-1)
 *	b_i = b_(i-1) + a_i
 *	c_i = c_(i-1) + b_i
 *	d_i = d_(i-1) + c_i
 *
 * Where:
 *	a_0 = b_0 = c_0 = d_0 = 0
 * and
 *	f_0 .. f_(n-1) are the input data.
 *
 * Using standard series transformations, these become the following sums:
 *
 *	     __n_                        __n_
 *	     \   |                       \   |
 *	a  =  >     f             b  =    >     i * f
 *	 n   /___|   n - i         n     /___|       n - i
 *	     i = 1                       i = 1
 *
 *	     __n_                        __n_
 *	     \   | i*(i+1)               \   | i*(i+1)*(i+2)
 *	c  =  >    ------- f      d  =    >    ------------- f
 *	 n   /___|    2     n - i  n     /___|       6         n - i
 *	     i = 1                       i = 1
 *
 * Now, f_i are 32-bit values, and [abcd]_n are 64-bit accumulators. In
 * order to not lose data, we need to make sure that we don't overflow.
 * The worst-case for this is if f_i = 0xffffffff for all i. We can
 * conservatively estimate how big n can get before we overflow by running:
 *
 * % bc
 * a=2^32-1; for (i=1; a < 2^64; i++) { a += i*(i+1)*(i+2)/6*(2^32-1) }; i-1
 * 566
 * quit

566 * 4 bytes/datum is slightly above 2K of data, so a 128K block is ~64 times longer than the earliest truncation. There are two ways of dealing with this:

1. Leave fletcher4 alone. Our largest buffer is 128K; that's 32k * sizeof (uint32_t). That means that n = 32768, and we want to look at the factorization of:

	D_i = (i*(i+1)*(i+2))/6		i = 1..n

Writing it out as:

	D_i = (2^(E_i)) * (F_i)

where F_i is odd, we can ask about the distribution of E_i. Some scripting later, we get, for the Ds:

COUNT	2-FACTOR
8192	2^0
4096	2^1
10240	2^2
5120	2^3
2560	2^4
1280	2^5
640	2^6
320	2^7
160	2^8
80	2^9
40	2^10
20	2^11
10	2^12
5	2^13
3	2^14
2	2^15

And the Cs:

16384	2^0
8192	2^1
4096	2^2
2048	2^3
1024	2^4
512	2^5
256	2^6
128	2^7
64	2^8
32	2^9
16	2^10
8	2^11
4	2^12
2	2^13
1	2^14

This means that even in the worst case, f_i will be added at 2^15 * (32-bit quantity), which fits in a 64-bit accumulator. This does mean that early-on words in the buffer have *less* of an effect on the result.
Of course, without more math, I don't have a compelling case for this being effective.

2. Invent a fletcher4c, which does something about the overflow. My suggestion would be to add a cleanup pass, run every 512 iterations and at the end, which does:

	c_a += (a >> 32);
	a = (a & 0xffffffff) + (a >> 32);
	if ((a >> 32) != 0) {
		a = (a & 0xffffffff) + 1;
		c_a++;
	}

(and similarly for b, c, and d), then at the end, do:

	while ((c_a >> 32) != 0)
		c_a = (c_a & 0xffffffff) + (c_a >> 32);
	a |= c_a << 32;

This computes a, b, c, and d (mod 2^32 - 1), then puts the carry count in the top bits. The advantage of putting the carry count there is that it counteracts a weakness of computing (mod 2^32 - 1), which is that 0 == 2^32 - 1 (mod 2^32 - 1). But 0 won't add a carry, and 2^32 - 1 will, so the final counts will end up different.

The main complication in introducing fletcher4c is that there are several places which hardcode the use of fletcher4 (the send/receive code in particular), and so versioning could be an issue.

Thoughts?

Cheers,
- jonathan
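The cleanup pass above can be factored into a small helper for one accumulator (a sketch; the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Fold the top 32 bits of a 64-bit accumulator back into the bottom,
 * reducing it mod (2^32 - 1), while counting the folded carries in
 * *c_a, as in the proposed fletcher4c cleanup pass. */
uint64_t fletcher4c_fold(uint64_t a, uint64_t *c_a)
{
    *c_a += a >> 32;                      /* record the carries */
    a = (a & 0xffffffffULL) + (a >> 32);  /* end-around fold */
    if ((a >> 32) != 0) {                 /* the fold itself may carry once */
        a = (a & 0xffffffffULL) + 1;
        (*c_a)++;
    }
    return a;
}
```

Note how 2^64 - 1 folds to 0xffffffff with carries recorded, while 0 folds to 0 with none; the two residues that coincide mod 2^32 - 1 stay distinguishable through the carry count, which is the property Jonathan points out.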
paul
2009-Apr-14 01:34 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed

Nice analysis. I understand you've found, on the processor/memory-subsystem architecture you've experimented with, that if the data being summed isn't already cached, its access will likely dominate its use in calculations:

- Although this makes sense (presuming no mechanism is used to pre-cache data that is only subsequently check-summed once cache-resident), processor/memory system architectures are still evolving, so it may be prudent to account for that possibility; for example, future streaming vector/memory units might sum multiple data streams in parallel, and/or checksumming might be integrated into the I/O channel such that while data is being retrieved its checksum may be computed without processor intervention.

- With respect to fletcher4: as the first two running sums (representing a traditional Fletcher checksum implementation) are sufficient to yield a hamming distance of at least 3 (meaning all 1 and 2 bit errors are detectable, and 1 bit errors thereby also potentially correctable if ever desired) for block sizes somewhat larger than 256KB, larger than ZFS requires, I can't help but wonder if this may be sufficient, rather than worrying about maintaining two more running sums, each dependent on the previous, without potential overflow. (There are also potential implementation issues which may impede performance in the future: maintaining 4 sums, with the latter 3 each dependent on its predecessor's sum, means scheduling three such dependencies rather than just 1 if only 2 sums are maintained, presuming support for potentially more efficient data access on future platform implementations.)
(As a caveat, it's not clear to me how much more resilient the checksum will be by maintaining 2 more sums than traditionally used; when I previously observed that fletcher4 was fine, that was based on its first two terms not overflowing for the data sizes required, without regard to the possibility that the upper two terms may overflow, as I didn't consider them significant, or rather didn't rely on their significance.)

- In conclusion: although, without pre-fetching, streaming, and/or channel integration, a fletcher2 corrected to use 32b data with 64b sums may be no faster than the current implementation of fletcher4 (which may not be significantly more resilient than a corrected fletcher2, but which may be refined to guarantee that the upper two terms also do not overflow, improving its resilience to some degree); I personally suspect it's best to BOTH refine fletcher4 to guarantee the upper two sums do not overflow, by wrapping the upper bits of the upper two sums every N (pick your N) iterations; AND fix fletcher2, because when fixed it has the potential to be significantly faster than the fixed fletcher4 on future or other platforms leveraging mechanisms to pre-fetch data and/or synchronize computations with access. (Both being relatively easy fixes, so why not?)

Again, merely IMHO.
Jonathan Adams
2009-Apr-14 19:41 UTC
[zfs-code] fletcher2/4 implementations fundamentally flawed
On Mon, Apr 13, 2009 at 06:34:36PM -0700, paul wrote:
> Nice analysis. I understand you've found on the
> processor/memory-subsystem architecture you've experimented with, that
> if the data being summed isn't already cached, its access will likely
> dominate its use in calculations:
>
> - Although this makes sense (presuming no mechanism may be used to
> pre-cache data to only then be subsequently check-summed when cache
> resident in the future), as processor/memory system architectures are
> still evolving, it may be prudent to account for that possibility; for
> example possible future use of streaming vector/memory units, where
> multiple data streams may be potentially summed in parallel and/or
> possibly integrated into the I/O channel such that while data is being
> retrieved its checksum may be computed without processor intervention
> may be desirable.

True; my experience with the performance was that any fix to fletcher2 to make it stronger slowed things down by ~2-2.5x *when the data was in-cache*, which is most of the way to fletcher4 (~3-3.5x slower). And that was with hand-optimized assembly, because the compiler's inline assembly has bugs.

(That said, the bugs are being fixed; even with them fixed, the algorithms will be slower by a significant amount.)

> - With respect to Fletcher4, as the first two running sums
> (representing a traditional Fletcher checksum implementation) are
> sufficient to yield a hamming distance of at least 3 (meaning all
> 1 and 2 bit errors are detectable, and thereby 1 bit errors also
> potentially correctable if ever desired) for block sizes somewhat
> larger than 256KB, being larger than required by ZFS; I can't help but
> wonder if this may be sufficient rather than worrying about trying to
> maintain two more running sums each dependent on the previous without
> potential overflow.

I've verified by exhaustive search that fletcher4 has a hamming distance of at least 6 for 128-byte buffers, and my intuition is that it is much higher. The i*(i+1)/2 and i*(i+1)*(i+2)/6 terms in the third and fourth accumulators make it much harder for cancellations to line up correctly, compared to the first two.

As an aside, the way to model undetectable errors is to split the errors into bit clears and bit sets, and make two mostly-zero data buffers: f_0..(n-1), with the "bit clears" bits set, and g_0..(n-1), with the "bit sets" bits set. Since the fletcher checksums are linear in the data being checksummed, if the two checksums are equal, then the error will be undetected. You can further refine this by noting:

1. Since all words before the first error word are zero, they will have a checksum of zero and no effect on the result.

2. If the error is undetected, the two checksums will be equal after the final error word is processed, and will remain equal through the end of the buffer. If the error is detected, the checksums will not be equal after the final error word is processed, and will remain unequal throughout the remaining processing.

Since we only care whether or not the checksums will be equal, we can shorten the buffers to just big enough to cover the errors without any loss of generality.

For fletcher2, it's fairly easy to show that at least three different error words are required, along with the precise position and value constraints. For fletcher4, you need at least five different error words, and the math quickly overwhelms my ability to pen-and-paper it. Without any access to Maple/Mathematica/etc., I'm not going to make much more progress.

> (As a caveat, it's not clear to me how much more resilient the
> checksum will be by maintaining 2 more sums than traditionally used,
> and thereby when I previously observed that fletcher4 was fine, it
> was based on its first two terms not overflowing for data sizes
> required, without regard to the possibility that the upper two terms
> may overflow, as I didn't consider them significant, or rather didn't
> rely on their significance.)

Understood.

> - However in conclusion, although without pre-fetching, streaming,
> and/or channel integration, fletcher2 (corrected to use 32b data
> with 64b sums) may be no faster than the current implementation of
> fletcher4; I personally suspect it's best to BOTH refine fletcher4
> to warrant the upper two sums do not overflow by wrapping the upper
> bits of the upper two sums every N (pick your N) iterations; AND fix
> fletcher2, because when fixed it has the potential to be significantly
> faster than the fixed fletcher4 on future or other platforms. (Both
> being relatively easy fixes, so why not?)

It's mainly a question of defaults and not giving dangerous options. If fletcher4 is around the same speed as our fixed fletcher2, and provides better fault detection (and possibly correction), why offer them both?

Perhaps what we should do is:

	rename fletcher2 and fletcher4 to fletcher2-old and fletcher4-old.
	introduce new "fletcher2" and "fletcher4" algorithms, fixed as above.
	repoint "checksum=on" to "fletcher4".

So old data will continue to be checked using the old algorithms, new data will use the new fletcher2 and 4, and we'll default to fletcher4.

Thoughts?

- jonathan