Peter Taps
2011-Jan-06 19:44 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Folks,

I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an overkill.

Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But (Fletcher+Verification) would work 100% of the time. Which one of the two is a better deduplication strategy?

If we do not use verification with Sha256, what is the worst case scenario? Is it just more disk space occupied (because of failure to detect duplicate blocks), or is there a chance of actual data corruption (because two blocks were assumed to be duplicate although they are not)?

Or, if I go with (Sha256+Verification), how much is the overhead of verification on the overall process?

If I do go with verification, it seems (Fletcher+Verification) is more efficient than (Sha256+Verification). And both are 100% accurate in detecting duplicate blocks.

Thank you in advance for your help.

Peter
--
This message posted from opensolaris.org
David Magda
2011-Jan-06 20:32 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Thu, January 6, 2011 14:44, Peter Taps wrote:
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.
>
> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But
> (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?

The ZFS default is what you should be using unless you can explain (technically, and preferably mathematically) why you should use something else.

I'm guessing you're using "99.999999%" as a 'literary gesture', and haven't done the math. The above means that you have a 0.000001% or 10^-8 chance of having a collision. The reality is that the odds are actually 10^-77 (~ 2^-256; see [1] though):

http://blogs.sun.com/bonwick/entry/zfs_dedup

As a form of comparison, the odds of having a non-recoverable bit error from a hard disk are about 10^-15 for SAS disks and 10^-14 for SATA disks. So you're about sixty orders of magnitude more likely to have a disk read error than to get a collision from SHA-256. If you're not worried about disk read errors (and/or are not experiencing them), then you shouldn't be worried about hash collisions.

TL;DR: set "dedup=on" and forget about it.

Some more discussion as it relates to some backup dedupe appliances (the principles are the same):

http://tinyurl.com/36369pb
http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/145-de-dupe-hash-collisions.html

[1] It may actually be 10^-38 (2^-128) or so because of the birthday paradox, but we're still talking unlikely. You have a better chance of dying from lightning or being attacked by a mountain lion: http://www.blog.joelx.com/odds-chances-of-dying/877/
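[Editorial note: a quick back-of-the-envelope check of the comparison above, using the standard birthday approximation p ~ n^2/(2d). The 128 TB pool of 4 KB blocks and the 10^-15 bit error rate are illustrative assumptions taken from later in this thread, not a measurement.]

# Back-of-the-envelope: SHA-256 collision odds vs. unrecoverable read errors.
# Pool size and URE rate are illustrative assumptions, not measured values.
n_blocks     = 2**35          # ~128 TB of 4 KB blocks (assumption)
digest_space = 2**256         # possible SHA-256 values
ure_per_bit  = 1e-15          # quoted SAS unrecoverable bit error rate

# Birthday approximation: P(any two blocks share a digest) ~ n^2 / (2d)
p_collision = n_blocks**2 / (2 * digest_space)

# Expected unrecoverable errors when reading the whole pool once
bits_read    = n_blocks * 4096 * 8
expected_ure = bits_read * ure_per_bit

print(f"collision probability     ~ {p_collision:.1e}")   # ~ 5.1e-57
print(f"expected UREs per full read ~ {expected_ure:.1f}")  # ~ 1.1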
Robert Milkowski
2011-Jan-06 20:49 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/ 6/11 07:44 PM, Peter Taps wrote:
> Folks,
>
> I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an overkill.
>
> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?
>
> If we do not use verification with Sha256, what is the worst case scenario? Is it just more disk space occupied (because of failure to detect duplicate blocks) or there is a chance of actual data corruption (because two blocks were assumed to be duplicate although they are not)?

Yes, there is a possibility of data corruption.

> Or, if I go with (Sha256+Verification), how much is the overhead of verification on the overall process?

It really depends on your specific workload. If your application is mostly reading data, it may well be that you won't even notice verify. Sha256 is supposed to be almost bullet proof, but... at the end of the day it is all about how much you value your data.

As I wrote before, try with verify and see if the performance is acceptable. It may well be the case. You can always disable verify at any time.

> If I do go with verification, it seems (Fletcher+Verification) is more efficient than (Sha256+Verification). And both are 100% accurate in detecting duplicate blocks.

I don't believe that fletcher is still allowed for dedup - right now it is only sha256.

--
Robert Milkowski
http://milek.blogspot.com
Richard Elling
2011-Jan-06 20:57 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Jan 6, 2011, at 11:44 AM, Peter Taps wrote:
> Folks,
>
> I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an overkill.

I disagree. I do not believe you can uniquely identify all possible permutations of 1 million bits using only 256 bits.

> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?

If you love your data, always use verify=on.

> If we do not use verification with Sha256, what is the worst case scenario? Is it just more disk space occupied (because of failure to detect duplicate blocks) or there is a chance of actual data corruption (because two blocks were assumed to be duplicate although they are not)?

If you do not use verify=on, you risk repeatable data corruption.

In some postings you will find claims of the odds being "1 in 2^256 +/-" for a collision. This is correct. However, they will then compare this to the odds of a disk read error. There is an important difference, however -- the disk error is likely to be noticed, but a collision is completely silent without the verify option. This is why it is a repeatable problem, different than hardware failures which are not repeatable. Accepting repeatable and silent data corruption is a very bad tradeoff, IMNSHO.

> Or, if I go with (Sha256+Verification), how much is the overhead of verification on the overall process?

In my experience, I see little chance that a verification will be used. As above, you might run into a collision, but it will be rare.

> If I do go with verification, it seems (Fletcher+Verification) is more efficient than (Sha256+Verification). And both are 100% accurate in detecting duplicate blocks.

Yes. Fletcher with verification will be more performant than sha-256. However, that option is not available in the Solaris releases.
-- richard
Nicolas Williams
2011-Jan-06 20:57 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Thu, Jan 06, 2011 at 11:44:31AM -0800, Peter Taps wrote:
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique.

All hash functions are guaranteed to have collisions [for inputs larger than their output, anyways].

> In fact, if Sha256 fails in some case, we
> have a bigger problem such as memory corruption, etc. Essentially,
> adding verification to sha256 is an overkill.

What makes a hash function cryptographically secure is not impossibility of collisions, but computational difficulty of finding arbitrary colliding input pairs, collisions for known inputs, second pre-images, and first pre-images. Just because you can't easily find collisions on purpose doesn't mean that you can't accidentally find collisions.

That said, if the distribution of SHA-256 is even enough then your chances of finding a collision by accident are so remote (one in 2^128) that you could reasonably decide that you don't care.

> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But
> (Fletcher+Verification) would work 100% of the time.

Fletcher is faster than SHA-256, so I think that must be what you're asking about: "can Fletcher+Verification be faster than Sha256+NoVerification?" Or do you have some other goal?

Assuming I guessed correctly... The speed of the hash function isn't significant compared to the cost of the verification I/O, period, end of story. So SHA-256 w/o verification will be faster than Fletcher + verification -- lots faster if you have particularly deduplicatious data to write. Moreover, SHA-256 + verification will likely be somewhat faster than Fletcher + verification, because SHA-256 will likely have fewer collisions than Fletcher, and the cost of I/O dominates the cost of the hash functions.

> Which one of the two is a better deduplication strategy?
>
> If we do not use verification with Sha256, what is the worst case
> scenario? Is it just more disk space occupied (because of failure to
> detect duplicate blocks) or there is a chance of actual data
> corruption (because two blocks were assumed to be duplicate although
> they are not)?

If you don't verify then you run the risk of corruption on collision, NOT the risk of using too much disk space.

> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?
>
> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification). And both are 100% accurate in
> detecting duplicate blocks.

You're confused. Fletcher may be faster to compute than SHA-256, but the run-time of both is as nothing compared to the latency of the disk I/O needed for verification, which means that the hash function's rate of collisions is more important than its computational cost.

(Now, Fletcher is thought to not be a cryptographically secure hash function, while SHA-256 is, for now, considered cryptographically secure. That probably means that the distribution of Fletcher's outputs over random inputs is not as even as that of SHA-256, which probably means you can expect more collisions with Fletcher than with SHA-256. Note that I made no absolute statements in the previous sentence -- that's because I've not read any studies of Fletcher's performance relative to SHA-256, thus I'm not certain of anything stated in the previous sentence.)

David Magda's advice is spot on.

Nico
--
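[Editorial note: to put rough numbers on the "hash cost vs. I/O cost" point, here is a small timing sketch. The ~5 ms figure for a random disk read is an assumed ballpark for a 7200 RPM drive, not a measurement from this thread.]

# Rough comparison of SHA-256 compute time per 128 KiB block (the default
# maximum ZFS record size) vs. an assumed ~5 ms random-read latency.
import hashlib, os, time

block = os.urandom(128 * 1024)
iterations = 1000

start = time.perf_counter()
for _ in range(iterations):
    hashlib.sha256(block).digest()
elapsed = time.perf_counter() - start

per_block_ms = elapsed / iterations * 1000
assumed_seek_ms = 5.0                    # typical 7200 RPM random read (assumption)
print(f"SHA-256 per 128 KiB block: {per_block_ms:.3f} ms")
print(f"assumed random read:       {assumed_seek_ms:.1f} ms "
      f"(~{assumed_seek_ms / per_block_ms:.0f}x the hash cost)")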
David Magda
2011-Jan-06 23:07 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Jan 6, 2011, at 15:57, Nicolas Williams wrote:
> Fletcher is faster than SHA-256, so I think that must be what you're
> asking about: "can Fletcher+Verification be faster than
> Sha256+NoVerification?" Or do you have some other goal?

Would running on recent T-series servers, which have on-die crypto units, help any in this regard?
Nicolas Williams
2011-Jan-06 23:20 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Thu, Jan 06, 2011 at 06:07:47PM -0500, David Magda wrote:
> On Jan 6, 2011, at 15:57, Nicolas Williams wrote:
>
> > Fletcher is faster than SHA-256, so I think that must be what you're
> > asking about: "can Fletcher+Verification be faster than
> > Sha256+NoVerification?" Or do you have some other goal?
>
> Would running on recent T-series servers, which have on-die
> crypto units, help any in this regard?

Yes, particularly for larger blocks.

Hash collisions don't matter as long as ZFS verifies dups, so the real question is: what is the false positive dup rate (i.e., the accidental collision rate)? But that's going to vary a lot by {hash function, working data set}, thus it's not possible to make exact determinations, just estimates.

For me the biggest issue is that as good as Fletcher is for a CRC, I'd rather have a cryptographic hash function because I've seen incredibly odd CRC failures before. There's a famous case from within SWAN a few years ago where a switch flipped pairs of bits such that all too often the various CRCs that applied to the moving packets failed to detect the bit flips, and we discovered this when an SCCS file in a clone of the ON gate got corrupted. Such failures (collisions) wouldn't affect dedup, but they would mask corruption of non-deduped blocks.

Nico
--
Edward Ned Harvey
2011-Jan-07 06:05 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Peter Taps
>
> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But

Append 50 more 9's on there.
99.99999999999999999999999999999999999999999999999999999999%
See below.

> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.

Someone please correct me if I'm wrong. I assume ZFS dedup matches both the blocksize and the checksum, right? A simple checksum collision (which is astronomically unlikely) is still not sufficient to produce corrupted data. It's even more unlikely than that.

Using the above assumption, here's how you calculate the probability of corruption if you're not using verification:

Suppose every single block in your whole pool is precisely the same size (which is unrealistic in the real world, but I'm trying to calculate worst case). Suppose the block is 4K, which is again an unrealistic worst case. Suppose your dataset is purely random or sequential ... with no duplicated data ... which is unrealistic, because if your data is like that, then why in the world are you enabling dedupe? But again, assuming worst case scenario... At this point we'll throw in some evil clowns, spit on a voodoo priestess, and curse the heavens for some extra bad luck.

If you have astronomically infinite quantities of data, then your probability of corruption approaches 100%. With infinite data, eventually you're guaranteed to have a collision. So the probability of corruption is directly related to the total amount of data you have, and the new question is: for anything Earthly, how near are you to 0% probability of collision in reality?

Suppose you have 128TB of data. That is ... you have 2^35 unique 4k blocks of uniformly sized data. Then the probability you have any collision in your whole dataset is (sum(1 thru 2^35))*2^-256
Note: sum of integers from 1 to N is (N*(N+1))/2
Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
Note: (N*(N+1))/2 in this case = 2^69 + 2^34
So the probability of data corruption in this case is 2^-187 + 2^-222
~= 5.1E-57 + 1.5E-67
~= 5.1E-57

In other words, even in the absolute worst case, cursing the gods, running without verification, using data that's specifically formulated to try and cause errors, on a dataset that I bet is larger than what you're doing, ...

Before we go any further ... the total number of bits stored on all the storage in the whole planet is a lot smaller than the total number of molecules in the planet.

There are an estimated 8.87 * 10^49 molecules in planet Earth.

The probability of a collision in your worst-case unrealistic dataset as described is even millions of times less likely than randomly finding a single specific molecule in the whole planet Earth by pure luck.
Michael Sullivan
2011-Jan-07 06:22 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Ed, with all due respect to your math, I've seen rsync bomb due to an SHA256 collision, so I know it can and does happen. I respect my data, so even with checksumming and comparing the block size, I'll still do a comparison check if those two match.

You will end up with silent data corruption which could affect you in so many ways. Do you want to stake your career and reputation on that? With a client or employer's data? I sure don't.

"Those who walk on the razor's edge are destined to be cut to ribbons." Someone I used to work with said that, not me.

For my home media server, maybe, but even then I'd hate to lose any of my family photos or video due to a hash collision. I'll play it safe if I dedup.

Mike

---
Michael Sullivan
michael.p.sullivan at me.com
http://www.kamiogi.net/
Mobile: +1-662-202-7716
US Phone: +1-561-283-2034
JP Phone: +81-50-5806-6242

On 7 Jan 2011, at 00:05 , Edward Ned Harvey wrote:
> [...]
Michael DeMan
2011-Jan-07 06:42 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
At the end of the day this issue essentially is about mathematical improbability versus certainty?

To be quite honest, I too am skeptical about using de-dupe just based on SHA256. In prior posts it was asked that the potential adopter of the technology provide the mathematical reason to NOT use SHA-256 only. However, if Oracle believes that it is adequate to do that, would it be possible for somebody to provide:

(A) The theoretical documents and associated mathematics specific to say one simple use case?
(A1) Total data size is 1PB (lets say the zpool is 2PB to not worry about that part of it).
(A2) Daily, 10TB of data is updated, 1TB of data is deleted, and 1TB of data is 'new'.
(A3) Out of the dataset, 25% of the data is capable of being de-duplicated.
(A4) Between A2 and A3 above, the 25% rule from A3 also applies to everything in A2.

I think the above would be a pretty 'soft' case for justifying the case that SHA-256 works? I would presume some kind of simple scenario like this has already been run mathematically by somebody inside Oracle/Sun long ago when first proposing that ZFS be funded internally at all?

Then - there is the other side of things. The 'black swan' event. At some point, given percentages on a scenario like the example case above, one simply has to make the business justification case internally at their own company about whether to go SHA-256 only or Fletcher+Verification? Add Murphy's Law to the 'black swan event' and of course the only data that is lost is that .01% of your data that is the most critical?

Not trying to be aggressive or combative here at all against people's opinions and understandings of it all - I would just like to see some hard information about it all - it must exist somewhere already?

Thanks,

- Mike

On Jan 6, 2011, at 10:05 PM, Edward Ned Harvey wrote:
> [...]
Bakul Shah
2011-Jan-07 08:05 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Thu, 06 Jan 2011 22:42:15 PST Michael DeMan <solaris at deman.com> wrote:
> To be quite honest, I too am skeptical about using de-dupe just based
> on SHA256. In prior posts it was asked that the potential adopter of the
> technology provide the mathematical reason to NOT use SHA-256 only. However,
> if Oracle believes that it is adequate to do that, would it be possible for
> somebody to provide:
>
> (A) The theoretical documents and associated mathematics specific to say one
> simple use case?

See http://en.wikipedia.org/wiki/Birthday_problem -- in particular see section 5.1 and the probability table of section 3.4.

> On Jan 6, 2011, at 10:05 PM, Edward Ned Harvey wrote:
> >> I have been told that the checksum value returned by Sha256 is almost
> >> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> >> bigger problem such as memory corruption, etc. Essentially, adding
> >> verification to sha256 is an overkill.

Agreed.

> > Someone please correct me if I'm wrong.

OK :-)

> > Suppose you have 128TB of data. That is ... you have 2^35 unique 4k blocks
> > of uniformly sized data. Then the probability you have any collision in
> > your whole dataset is (sum(1 thru 2^35))*2^-256
> > Note: sum of integers from 1 to N is (N*(N+1))/2
> > Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
> > Note: (N*(N+1))/2 in this case = 2^69 + 2^34
> > So the probability of data corruption in this case is 2^-187 + 2^-222
> > ~= 5.1E-57 + 1.5E-67
> > ~= 5.1E-57

I believe this is wrong. See the wikipedia article referenced above.

p(n,d) = 1 - d!/(d^n*(d-n)!)

In your example n = 2^35, d = 2^256. If you extrapolate the 256 bits row of the probability table of section 3.1, it is somewhere between 10^-48 and 10^-51.

This may be easier to grasp: to get a 50% probability of a collision with sha256, you need 4*10^38 blocks. For a probability similar to disk error rates (10^-15), you need 1.5*10^31 blocks.
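[Editorial note: a quick numeric check of those last two figures, using the standard inversion of the birthday approximation, n ~ sqrt(2*d*ln(1/(1-p))). This is just the Wikipedia formula evaluated in Python, nothing ZFS-specific.]

# Number of 256-bit hashes needed before a collision reaches probability p,
# using the birthday-problem approximation n ~ sqrt(-2*d*ln(1-p)).
from math import sqrt, log1p

d = 2.0**256                      # size of the SHA-256 output space

def blocks_for_probability(p):
    """Approximate block count giving collision probability p."""
    return sqrt(-2.0 * d * log1p(-p))

print(f"50% collision probability:    {blocks_for_probability(0.5):.1e} blocks")
print(f"10^-15 collision probability: {blocks_for_probability(1e-15):.1e} blocks")
# Expected output: roughly 4.0e38 and 1.5e31 blocks, matching the figures above.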
Darren J Moffat
2011-Jan-07 09:26 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 06/01/2011 23:07, David Magda wrote:
> On Jan 6, 2011, at 15:57, Nicolas Williams wrote:
>
>> Fletcher is faster than SHA-256, so I think that must be what you're
>> asking about: "can Fletcher+Verification be faster than
>> Sha256+NoVerification?" Or do you have some other goal?
>
> Would running on recent T-series servers, which have on-die crypto units, help any in this regard?

The on chip SHA-256 implementation is not yet used; see:

http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

"Note that the fix I integrated only uses a software implementation of SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU hardware implementation of SHA256. The reason for this is to do with boot time availability of the Solaris Cryptographic Framework and the need to have ZFS as the root filesystem."

Not yet changed; it turns out to be quite complicated to fix due to very early boot issues.

--
Darren J Moffat
Sašo Kiselkov
2011-Jan-07 11:56 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/07/2011 10:26 AM, Darren J Moffat wrote:
> The on chip SHA-256 implementation is not yet used; see:
>
> http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via
>
> "Note that the fix I integrated only uses a software implementation of
> SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU
> hardware implementation of SHA256. The reason for this is to do with
> boot time availability of the Solaris Cryptographic Framework and the
> need to have ZFS as the root filesystem."
>
> Not yet changed; it turns out to be quite complicated to fix due to
> very early boot issues.

Would it be difficult to implement both methods and allow ZFS to switch to the hardware-accelerated crypto backend at runtime after it has been brought up and initialized? It seems like one heck of a feature (essentially removing most of the computational complexity of dedup).

--
Saso
Darren J Moffat
2011-Jan-07 12:15 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 07/01/2011 11:56, Sašo Kiselkov wrote:
> Would it be difficult to implement both methods and allow ZFS to switch
> to the hardware-accelerated crypto backend at runtime after it has been
> brought up and initialized? It seems like one heck of a feature

Whether it is difficult or not depends on your level of familiarity with ZFS, boot and the cryptographic framework ;-)

For me no, it wouldn't be difficult, but it still isn't completely trivial.

> (essentially removing most of the computational complexity of dedup).

Most of the data I've seen on the performance impact of dedup is not coming from the SHA256 computation; it is mostly about the additional IO to deal with the DDT. Though lowering the overhead that SHA256 does add is always a good thing.

--
Darren J Moffat
Sašo Kiselkov
2011-Jan-07 12:36 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/07/2011 01:15 PM, Darren J Moffat wrote:
> Whether it is difficult or not depends on your level of familiarity
> with ZFS, boot and the cryptographic framework ;-)
>
> For me no, it wouldn't be difficult, but it still isn't completely trivial.
>
> Most of the data I've seen on the performance impact of dedup is not
> coming from the SHA256 computation; it is mostly about the additional
> IO to deal with the DDT. Though lowering the overhead that SHA256
> does add is always a good thing.

Well, seeing as all mainline ZFS development is now happening behind closed doors, all I can really do is ask for features and hope Oracle implements them :-). Nevertheless, thanks for the clarification.

BR,
--
Saso
Edward Ned Harvey
2011-Jan-07 13:10 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Bakul Shah
>
> See http://en.wikipedia.org/wiki/Birthday_problem -- in
> particular see section 5.1 and the probability table of
> section 3.4.

They say "The expected number of n-bit hashes that can be generated before getting a collision is not 2^n, but rather only 2^(n/2)." While that is not necessarily incorrect, the operative word is "expected." They're looking for some probability threshold... I did not assign any particular threshold to call something "expected" or "not expected." I simply calculated the probability. And compared it to the number of molecules in Earth. ;-)

Two paragraphs earlier, they show a formula, "The generic [probability] can be derived using..."

p(n:d) ~= 1 - ((d-1)/d)^(n*(n-1)/2)

In the case that I looked at, d=2^256 and n=2^35.

p(n:d) ~= ... unfortunately Excel calculated this so small it resulted in "0"
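[Editorial note: the underflow is avoidable by working in log space; the small sketch below evaluates the same formula for n = 2^35, d = 2^256. This is standard floating-point technique, not anything from the thread itself.]

# Evaluate p(n:d) ~= 1 - ((d-1)/d)^(n*(n-1)/2) without underflowing to zero,
# by computing the logarithm of the power term instead of the term itself.
from math import log1p, expm1

n = 2.0**35        # blocks in the hypothetical 128 TB / 4 KB pool
d = 2.0**256       # possible SHA-256 digests

pairs = n * (n - 1) / 2.0
# log((d-1)/d) == log1p(-1/d); p = 1 - exp(pairs * log1p(-1/d)) = -expm1(...)
p = -expm1(pairs * log1p(-1.0 / d))

print(f"collision probability ~ {p:.2e}")
# ~ 5.1e-57, which agrees with the 5.1E-57 figure computed earlier in the thread.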
David Magda
2011-Jan-07 13:56 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, January 7, 2011 04:26, Darren J Moffat wrote:
> On 06/01/2011 23:07, David Magda wrote:
>
>> Would running on recent T-series servers, which have on-die crypto
>> units, help any in this regard?
>
> The on chip SHA-256 implementation is not yet used; see:
>
> http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via
>
> "Note that the fix I integrated only uses a software implementation of
> SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU
> hardware implementation of SHA256. The reason for this is to do with
> boot time availability of the Solaris Cryptographic Framework and the
> need to have ZFS as the root filesystem."
>
> Not yet changed; it turns out to be quite complicated to fix due to very
> early boot issues.

Out of curiosity, would a two-stage system be practical: use the software-only code for boot, and then switch to hardware assistance (if available) later on?

Generally a few extra CPU cycles wouldn't be a big deal on start up (not much else running), but once you get to multi-user mode, the hardware should be ready to go and the CPU offloading would be more pertinent.

It's necessary to start off in software-only mode to a certain extent anyway, since there are still plenty of machines out there without crypto (the M-series servers, as well as x86).
David Magda
2011-Jan-07 14:13 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, January 7, 2011 01:42, Michael DeMan wrote:
> Then - there is the other side of things. The 'black swan' event. At
> some point, given percentages on a scenario like the example case above,
> one simply has to make the business justification case internally at their
> own company about whether to go SHA-256 only or Fletcher+Verification?
> Add Murphy's Law to the 'black swan event' and of course the only data
> that is lost is that .01% of your data that is the most critical?

The other thing to note is that by default (with de-dupe disabled), ZFS uses Fletcher checksums to prevent data corruption. Add also the fact that all other file systems don't have any checksums, and simply rely on the fact that disks have a bit error rate of (at best) 10^-16.

Given the above: most people are content enough to trust Fletcher to not have data corruption, but are worried about SHA-256 giving 'data corruption' when it comes to de-dupe? The entire rest of the computing world is content to live with 10^-15 (for SAS disks), and yet one wouldn't be prepared to have 10^-30 (or better) for dedupe?

I certainly can understand caution, but given the numbers involved, you're more likely to be hit by lightning a few times in a lifetime before a collision occurs for a reasonably sized ZFS pool. :)
Casper.Dik at Oracle.COM
2011-Jan-07 14:21 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> On Fri, January 7, 2011 01:42, Michael DeMan wrote:
>> Then - there is the other side of things. The 'black swan' event. At
>> some point, given percentages on a scenario like the example case above,
>> one simply has to make the business justification case internally at their
>> own company about whether to go SHA-256 only or Fletcher+Verification?
>> Add Murphy's Law to the 'black swan event' and of course the only data
>> that is lost is that .01% of your data that is the most critical?
>
> The other thing to note is that by default (with de-dupe disabled), ZFS
> uses Fletcher checksums to prevent data corruption. Add also the fact that all
> other file systems don't have any checksums, and simply rely on the fact
> that disks have a bit error rate of (at best) 10^-16.
>
> Given the above: most people are content enough to trust Fletcher to not
> have data corruption, but are worried about SHA-256 giving 'data
> corruption' when it comes to de-dupe? The entire rest of the computing world
> is content to live with 10^-15 (for SAS disks), and yet one wouldn't be
> prepared to have 10^-30 (or better) for dedupe?

I would; we're not talking about flipping bits, but about the OS comparing data using just the checksums and replacing one set with another.

You might want to create a file to show how weak fletcher really is, but two such files wouldn't be properly stored on a de-dup zpool unless you verify.

Casper
Michael DeMan
2011-Jan-07 14:39 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Jan 7, 2011, at 6:13 AM, David Magda wrote:
> The other thing to note is that by default (with de-dupe disabled), ZFS
> uses Fletcher checksums to prevent data corruption. Add also the fact that all
> other file systems don't have any checksums, and simply rely on the fact
> that disks have a bit error rate of (at best) 10^-16.

Agreed - but I think it is still missing the point of what the original poster was asking about.

In all honesty I think the debate is a business decision - the highly improbable vs. certainty. Somebody somewhere must have written this stuff up, along with simple use cases? Perhaps even a new acronym? MTBC - mean time before collision?

And even with the 'certainty' factor being the choice - other things like human error come into play and are far riskier?
Nicolas Williams
2011-Jan-07 15:00 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, Jan 07, 2011 at 06:39:51AM -0800, Michael DeMan wrote:
> On Jan 7, 2011, at 6:13 AM, David Magda wrote:
> > The other thing to note is that by default (with de-dupe disabled), ZFS
> > uses Fletcher checksums to prevent data corruption. Add also the fact that all
> > other file systems don't have any checksums, and simply rely on the fact
> > that disks have a bit error rate of (at best) 10^-16.
>
> Agreed - but I think it is still missing the point of what the
> original poster was asking about.
>
> In all honesty I think the debate is a business decision - the highly
> improbable vs. certainty.

The OP seemed to be concerned that SHA-256 is particularly slow, so the business decision here would involve a performance vs. error rate trade-off.

Now, unless you have highly deduplicatious data, a workload with a high cache hit ratio in the ARC for DDT entries, and a fast ZIL device, I suspect that the I/O costs of dedup dominate the cost of the hash function, which means: the above business trade-off is not worthwhile, as one would be trading a tiny uptick in error rates for a small uptick in performance. Before you even get to where you're making such a decision you'll want to have invested in plenty of RAM, L2ARC and fast ZIL device capacity -- and for those making such an investment I suspect that the OP's trade-off won't seem worthwhile.

BTW, note that verification isn't guaranteed to have a zero error rate... Imagine a) a block being written collides with a different block already in the pool, b) bit rot on disk in that colliding block such that the on-disk block matches the new block, c) on a mirrored vdev such that you might get one or another version of the block in question, randomly. Such an error requires monumentally bad luck to happen at all.

Nico
--
Robert Milkowski
2011-Jan-07 19:33 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/ 7/11 02:13 PM, David Magda wrote:
>
> Given the above: most people are content enough to trust Fletcher to not
> have data corruption, but are worried about SHA-256 giving 'data
> corruption' when it comes to de-dupe? The entire rest of the computing world
> is content to live with 10^-15 (for SAS disks), and yet one wouldn't be
> prepared to have 10^-30 (or better) for dedupe?

I think you do not entirely understand the problem.

Let's say two different blocks A and B have the same sha256 checksum; A is already stored in a pool, B is being written. Without verify and with dedup enabled, B won't be written. Next time you ask for block B you will actually end up with the block A. Now if B is relatively common in your data set, you have a relatively big impact on many files because of one corrupted block (additionally, from a fs point of view this is a silent data corruption). Without dedup, if you get a single block corrupted silently, the impact will usually be relatively limited.

Now what if block B is a meta-data block?

The point is that the potential impact of a hash collision is much bigger than a single silent data corruption to a block, not to mention that, dedup or not, all the other possible cases of data corruption are there anyway; adding yet another one might or might not be acceptable.

--
Robert Milkowski
http://milek.blogspot.com
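[Editorial note: a toy illustration of the failure mode described above. It is a minimal sketch that assumes a dedup store keyed on a deliberately truncated 8-bit digest, so a collision is actually reachable in a demo; no SHA-256 collision is known. The class and helper names are invented for illustration, not ZFS code.]

# Toy dedup store WITHOUT verification, keyed on a deliberately weak digest.
import hashlib

def weak_digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()[:1]   # keep only 8 bits: collisions easy

class NoVerifyDedupStore:
    def __init__(self):
        self.by_digest = {}                     # digest -> stored block

    def write(self, block: bytes) -> bytes:
        d = weak_digest(block)
        # No verify: if the digest is known, we just reference the old block,
        # even if its contents are actually different.
        self.by_digest.setdefault(d, block)
        return d

    def read(self, digest: bytes) -> bytes:
        return self.by_digest[digest]

# Find two distinct blocks whose weak digests collide (small birthday search).
seen, a, b = {}, None, None
i = 0
while b is None:
    blk = b"block-%d" % i
    d = weak_digest(blk)
    if d in seen and seen[d] != blk:
        a, b = seen[d], blk
    seen.setdefault(d, blk)
    i += 1

store = NoVerifyDedupStore()
store.write(a)
handle_b = store.write(b)                       # silently "deduped" against a
print(store.read(handle_b) == b)                # False: we get a's data back
print(store.read(handle_b) == a)                # True: silent corruption of b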
David Magda
2011-Jan-07 20:37 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, January 7, 2011 14:33, Robert Milkowski wrote:
> On 01/ 7/11 02:13 PM, David Magda wrote:
>>
>> Given the above: most people are content enough to trust Fletcher to not
>> have data corruption, but are worried about SHA-256 giving 'data
>> corruption' when it comes to de-dupe? The entire rest of the computing
>> world is content to live with 10^-15 (for SAS disks), and yet one wouldn't
>> be prepared to have 10^-30 (or better) for dedupe?
>
> I think you do not entirely understand the problem.
[...]

I was aware of all of that.

> Now what if block B is a meta-data block?
>
> The point is that the potential impact of a hash collision is much bigger
> than a single silent data corruption to a block, not to mention that,
> dedup or not, all the other possible cases of data corruption are there
> anyway; adding yet another one might or might not be acceptable.

And my point is that people are walking around the file hierarchy tree completely "blind" for most other file systems and are not worried about it at all. They could be grabbing inappropriate data at a rate that is several orders of magnitude more likely than a hash collision and sleeping fine at night, and yet are worried about something that is 10^-20 as likely.

Personally the potential impact of being hit by lightning is much bigger to me than anything that happens at my job, and the odds of that are about 1 in 170,000 (2^-17 to 2^-18). If I'm not worried about that, why the hell should I be worried about something that is 2^-128?

If then-Sun/now-Oracle thought it was a problem they wouldn't have made "on" an alias to "sha256".
Pawel Jakub Dawidek
2011-Jan-07 21:02 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, Jan 07, 2011 at 07:33:53PM +0000, Robert Milkowski wrote:
> I think you do not entirely understand the problem.
> Let's say two different blocks A and B have the same sha256 checksum; A
> is already stored in a pool, B is being written. Without verify and
> with dedup enabled, B won't be written. Next time you ask for block B you will
> actually end up with the block A. Now if B is relatively common in your
> data set, you have a relatively big impact on many files because of one
> corrupted block (additionally, from a fs point of view this is a silent
> data corruption). [...]

All true; that's why verification was mandatory for fletcher, which is not a cryptographically strong hash. As long as SHA256 is not broken, spending power on verification is just a waste of resources, which isn't "green" :) Once SHA256 is broken, verification can be turned on.

> [...] Without dedup if you get a single block corrupted
> silently an impact usually will be relatively limited.

Except when corruption happens on write, not read, i.e. you write data, it is corrupted on the fly, but the corrupted version still matches the fletcher checksum (the default now). Now every read of this block will return silently corrupted data.

> Now what if block B is a meta-data block?

Metadata is not deduplicated.

> The point is that the potential impact of a hash collision is much bigger
> than a single silent data corruption to a block, not to mention that,
> dedup or not, all the other possible cases of data corruption are there
> anyway; adding yet another one might or might not be acceptable.

I'm more of the opinion that it was a mistake that the verification feature wasn't removed along with the fletcher-for-dedup removal. It is good to be able to turn verification on once/if SHA256 is broken - that's the only reason I'd leave it - but I somehow feel that there are bigger chances you can corrupt your data because of the extra code complexity coming with verification than because of a SHA256 collision.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Brandon High
2011-Jan-07 23:06 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski <milek at task.gda.pl> wrote:
> end up with the block A. Now if B is relatively common in your data set you
> have a relatively big impact on many files because of one corrupted block
> (additionally, from a fs point of view this is a silent data corruption).
> Without dedup, if you get a single block corrupted silently, the impact
> will usually be relatively limited.

A pool can be configured so that a dedup'd block will only be referenced a certain number of times. So if you write out 10,000 identical blocks, it may be written 10 times, with each duplicate referenced 1,000 times. The exact number is controlled by the dedupditto property for your pool, and you should set it as your risk tolerance allows.

-B
--
Brandon High : bhigh at freaks.com
Robert Milkowski
2011-Jan-08 12:02 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/ 7/11 09:02 PM, Pawel Jakub Dawidek wrote:
> On Fri, Jan 07, 2011 at 07:33:53PM +0000, Robert Milkowski wrote:
>
>> Now what if block B is a meta-data block?
>
> Metadata is not deduplicated.

Good point, but then it depends on the perspective. What if you are storing lots of VMDKs? One corrupted block which is shared among hundreds of VMDKs will affect all of them. And it might be a block containing meta-data information within a vmdk...

Anyway, green or not, imho if in a given environment turning verification on still delivers acceptable performance then I would basically turn it on. In other environments it is about risk assessment.

Best regards,
Robert Milkowski
Edward Ned Harvey
2011-Jan-08 17:59 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Robert Milkowski
>
> What if you are storing lots of VMDKs?
> One corrupted block which is shared among hundreds of VMDKs will affect
> all of them.
> And it might be a block containing meta-data information within a vmdk...

Although the probability of hash collision is astronomically infinitesimally small, if it were to happen, the damage could be expansive and unrecoverable. Even backups could not protect you, because the corruption would be replicated undetected into your backups too.

Just like other astronomical events (meteors, supernova, etc.) which could destroy all your data, all your backups, and kill everyone you know, if these risks are not acceptable to you, you need to take precautions against them. But you have to weigh the odds of damage versus the cost of protection. Admittedly, precautions against a nuclear strike are more costly to implement than precautions against hash collision (enabling verification is a trivial task). But that does not mean enabling verification comes without cost.

Has anybody measured the cost of enabling or disabling verification?

The cost of disabling verification is an infinitesimally small number multiplied by possibly all your data. Basically lim->0 times lim->infinity. This can only be evaluated on a case-by-case basis and there's no use in making any more generalizations in favor or against it.

The benefit of disabling verification would presumably be faster performance. Has anybody got any measurements, or even calculations or vague estimates or clueless guesses, to indicate how significant this is? How much is there to gain by disabling verification?
Bob Friesenhahn
2011-Jan-08 23:04 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Thu, 6 Jan 2011, David Magda wrote:
>
> If you're not worried about disk read errors (and/or are not experiencing
> them), then you shouldn't be worried about hash collisions.

Except for the little problem that if there is a collision then there will always be a collision for the same data and it is unavoidable. :-) Bit rot is a different sort of problem.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Pawel Jakub Dawidek
2011-Jan-09 19:16 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, Jan 07, 2011 at 03:06:26PM -0800, Brandon High wrote:
> On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski <milek at task.gda.pl> wrote:
> > end up with the block A. Now if B is relatively common in your data set you
> > have a relatively big impact on many files because of one corrupted block
> > (additionally, from a fs point of view this is a silent data corruption).
> > Without dedup, if you get a single block corrupted silently, the impact
> > will usually be relatively limited.
>
> A pool can be configured so that a dedup'd block will only be
> referenced a certain number of times. So if you write out 10,000
> identical blocks, it may be written 10 times, with each duplicate
> referenced 1,000 times. The exact number is controlled by the
> dedupditto property for your pool, and you should set it as your risk
> tolerance allows.

Dedupditto doesn't work exactly that way. You can have at most 3 copies of your block. Dedupditto's minimal value is 100. The first copy is created on first write, the second copy is created on dedupditto references and the third copy is created on 'dedupditto * dedupditto' references. So once you reach 10000 references of your block ZFS will create three physical copies, not earlier and never more than three.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
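[Editorial note: a small sketch of the rule Pawel describes, assuming exactly the behaviour stated above (copies at 1, dedupditto, and dedupditto^2 references, capped at three). The function name is invented for illustration and is not ZFS code.]

# Physical copies of a deduped block as a function of its reference count,
# per the dedupditto rule described above (illustrative helper only).
def physical_copies(refcount: int, dedupditto: int = 100) -> int:
    copies = 1                                  # first write always stores one copy
    if refcount >= dedupditto:
        copies += 1                             # second copy at 'dedupditto' refs
    if refcount >= dedupditto * dedupditto:
        copies += 1                             # third copy at 'dedupditto^2' refs
    return copies                               # never more than three

for refs in (1, 99, 100, 9999, 10000, 1000000):
    print(refs, physical_copies(refs))
# With dedupditto=100: 1 copy below 100 refs, 2 copies from 100, 3 from 10000.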
Edward Ned Harvey
2011-Jan-10 00:27 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Pawel Jakub Dawidek
>
> Dedupditto doesn't work exactly that way. You can have at most 3 copies
> of your block. Dedupditto's minimal value is 100. The first copy is
> created on first write, the second copy is created on dedupditto
> references and the third copy is created on 'dedupditto * dedupditto'
> references. So once you reach 10000 references of your block ZFS will
> create three physical copies, not earlier and never more than three.

What is the point of dedupditto? If there is a block on disk, especially on a pool with redundancy so it can safely be assumed good now and for the future... Why store the multiples? Even if it is a maximum of 3, I presently only see the sense in a maximum of 1.
Peter Taps
2011-Jan-10 06:54 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Thank you all for your help. I am the OP.

I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to just 256 bits. If the chances of collision are so low, *theoretically* it is possible to reconstruct the original block from the 256-bit signature by using a simple lookup. Essentially, we would now have the world's best compression algorithm irrespective of whether the data is text or binary. This is hard to digest.

Peter
--
This message posted from opensolaris.org
Eric D. Mudama
2011-Jan-10 07:41 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sun, Jan 9 at 22:54, Peter Taps wrote:
> Thank you all for your help. I am the OP.
>
> I haven't looked at the link that talks about the probability of
> collision. Intuitively, I still wonder how the chances of collision
> can be so low. We are reducing a 4K block to just 256 bits. If the
> chances of collision are so low, *theoretically* it is possible to
> reconstruct the original block from the 256-bit signature by using a
> simple lookup. Essentially, we would now have the world's best
> compression algorithm irrespective of whether the data is text or
> binary. This is hard to digest.
>
> Peter

A "simple" lookup isn't so simple when there are 2^256 records to search; however, fundamentally your understanding of hashes is correct.

That being said, while at some point people might identify two commonly-used blocks with the same hash (e.g. system library files or other), the odds of it happening are extremely low.

A random google-result website calculates you as needing ~45 exabytes in your pool of 4KB-chunk deduped data before you get to a ~10^-17 chance of a hash collision:

http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/145-de-dupe-hash-collisions.html

Now, obviously the above is in the context of having to restore from backup, which is rare; however, in live usage I don't think the math changes a whole lot.

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
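[Editorial note: to make the counting point concrete, the reason a 256-bit digest cannot act as a general "decompression lookup key" for 4 KB blocks is the pigeonhole principle; the sketch below is plain arithmetic, nothing ZFS-specific.]

# Counting argument: there are vastly more possible 4 KiB blocks than
# 256-bit digests, so a digest cannot uniquely identify every block.
from math import log10

block_bits  = 4096 * 8                 # bits in a 4 KiB block
digest_bits = 256                      # bits in a SHA-256 digest

# On average each digest value corresponds to 2^(32768-256) different blocks,
# so a "decompression lookup table" is impossible; dedup works only because
# the chance of two *actually stored* blocks sharing a digest is vanishingly small.
print(f"possible 4 KiB blocks : ~10^{int(log10(2) * block_bits)}")
print(f"possible digests      : ~10^{int(log10(2) * digest_bits)}")
print(f"blocks per digest     : ~10^{int(log10(2) * (block_bits - digest_bits))}")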
Pawel Jakub Dawidek
2011-Jan-10 09:23 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sun, Jan 09, 2011 at 07:27:52PM -0500, Edward Ned Harvey wrote:
> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Pawel Jakub Dawidek
> >
> > Dedupditto doesn't work exactly that way. You can have at most 3 copies
> > of your block. Dedupditto minimal value is 100. The first copy is
> > created on first write, the second copy is created on dedupditto
> > references and the third copy is created on 'dedupditto * dedupditto'
> > references. So once you reach 10000 references of your block ZFS will
> > create three physical copies, not earlier and never more than three.
>
> What is the point of dedupditto? If there is a block on disk, especially on
> a pool with redundancy so it can safely be assumed good now and for the
> future... Why store the multiples? Even if it is a maximum of 3, I
> presently only see the sense in a maximum of 1.

Well, I find it quite reasonable. If your block is referenced 100 times, it is probably quite important. There are many corruption possibilities that can destroy your data. Imagine a memory error which corrupts io_offset in a write zio structure, and the corrupted io_offset points at your deduped block referenced 100 times. It will be overwritten and redundancy won't help you. You will be able to detect the corruption on read, but it will be too late.

Note that deduped data is not alone here. Pool-wide metadata are stored 'copies+2' times (but no more than three) and dataset-wide metadata are stored 'copies+1' times (but no more than three), so by default pool metadata have three copies and dataset metadata have two copies, AFAIR. When you lose the root node of a tree, you lose all your data. Are you really, really sure only one copy is enough?

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Robert Milkowski
2011-Jan-10 10:25 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/ 8/11 05:59 PM, Edward Ned Harvey wrote:
>
> Has anybody measured the cost of enabling or disabling verification?
>
> The cost of disabling verification is an infinitesimally small number
> multiplied by possibly all your data. Basically lim->0 times lim->infinity.
> This can only be evaluated on a case-by-case basis and there's no use in
> making any more generalizations in favor or against it.
>
> The benefit of disabling verification would presumably be faster
> performance. Has anybody got any measurements, or even calculations or
> vague estimates or clueless guesses, to indicate how significant this is?
> How much is there to gain by disabling verification?

Exactly my point, and there isn't one answer which fits all environments.

In the testing I'm doing, so far enabling/disabling verification doesn't make any noticeable difference, so I'm sticking to verify. But I have enough memory and such a workload that I see little physical read activity going on.

-- 
Robert Milkowski
http://milek.blogspot.com
Pawel Jakub Dawidek
2011-Jan-10 11:43 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote:
> Has anybody measured the cost of enabling or disabling verification?

Of course there is no easy answer:) Let me explain how verification works exactly first. You try to write a block. You see that the block is already in the dedup table (it is already referenced). You read the block (maybe it is in ARC or in L2ARC). You compare the read block with what you want to write.

Based on the above:

1. If you have dedup on, but your blocks are not deduplicable at all, you will pay no price for verification, as there will be no need to compare anything.

2. If your data is highly deduplicable you will verify often. Now it depends on whether the data you need to read fits into your ARC/L2ARC or not. If it can be found in ARC, the impact will be small. If your pool is very large and you can't count on ARC help, each write will be turned into a read.

Also note an interesting property of dedup: if your data is highly deduplicable you can actually improve performance by avoiding data writes (and just increasing the reference count).

Let me show you three degenerate tests to compare options. I'm writing 64GB of zeros to a pool with dedup turned off, with dedup turned on, and with dedup+verification turned on (I use the SHA256 checksum everywhere):

# zpool create -O checksum=sha256 tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      254,11 real        0,07 user       40,80 sys

# zpool create -O checksum=sha256 -O dedup=on tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      154,60 real        0,05 user       37,10 sys

# zpool create -O checksum=sha256 -O dedup=sha256,verify tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      173,43 real        0,02 user       38,41 sys

As you can see, in the second and third tests the data is of course in ARC, so the difference here is only because of data comparison (no extra reads are needed), and verification is 12% slower. This is of course a silly test, but as you can see dedup (even with verification) is much faster than the nodedup case - though this data is highly deduplicable:)

# zpool list
NAME   SIZE  ALLOC   FREE    CAP       DEDUP  HEALTH  ALTROOT
tank   149G  8,58M   149G     0%  524288.00x  ONLINE  -

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
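The verify path described above boils down to "hash hit means read-and-compare before bumping the reference count". A simplified sketch (my own illustrative Python, not the actual ZFS write path; the in-memory dict stands in for the on-disk DDT plus cache):

    import hashlib

    ddt = {}   # checksum -> [stored_block, refcount]

    def dedup_write(block, verify=True):
        key = hashlib.sha256(block).digest()
        entry = ddt.get(key)
        if entry is None:
            ddt[key] = [block, 1]            # first copy: a real data write
            return "write"
        if verify and entry[0] != block:     # read the existing block and compare
            return "collision: store the block separately"
        entry[1] += 1                        # no data write, just another reference
        return "dedup hit"

With verify off, the comparison (and, on a real pool, the read behind it) is skipped entirely, which is where both the potential performance difference and the corruption risk come from.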
David Magda
2011-Jan-10 14:36 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Mon, January 10, 2011 02:41, Eric D. Mudama wrote:
> On Sun, Jan 9 at 22:54, Peter Taps wrote:
>> Thank you all for your help. I am the OP.
>>
>> I haven't looked at the link that talks about the probability of
>> collision. Intuitively, I still wonder how the chances of collision
>> can be so low. We are reducing a 4K block to just 256 bits. If the
>> chances of collision are so low, *theoretically* it is possible to
>> reconstruct the original block from the 256-bit signature by using a
>> simple lookup. Essentially, we would now have world's best
>> compression algorithm irrespective of whether the data is text or
>> binary. This is hard to digest.
>
> "simple" lookup isn't so simple when there are 2^256 records to
> search, however, fundamentally your understanding of hashes is
> correct.
[...]

It should also be noted that ZFS itself can "only" address 2^128 bytes (not even 4K 'records'), and supposedly to fill those 2^128 bytes it would take as much energy as it would take to boil the Earth's oceans:

http://blogs.sun.com/bonwick/entry/128_bit_storage_are_you

So recording and looking up 2^256 records would be quite an accomplishment. It's a lot of data.

If the OP wants to know why the chances are so low, he'll have to learn a bit about hash functions (which is what SHA-256 is):

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function

Knowing exactly how the math works is not necessary, but understanding the principles would be useful if one wants to have a general picture as to why SHA-256 doesn't need a verification step, and why it was chosen as one of the ZFS (dedupe) checksum options.
Edward Ned Harvey
2011-Jan-10 15:00 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Peter Taps
>
> I haven't looked at the link that talks about the probability of collision.
> Intuitively, I still wonder how the chances of collision can be so low. We are
> reducing a 4K block to just 256 bits. If the chances of collision are so low,
> *theoretically* it is possible to reconstruct the original block from the 256-bit
> signature by using a simple lookup. Essentially, we would now have world's
> best compression algorithm irrespective of whether the data is text or
> binary. This is hard to digest.

BTW, at work we do a lot of theoretical mathematics, and one day a few months ago I was given the challenge to explore the concept of using a hashing algorithm as a form of compression, exactly as you said.

The conclusion was: you can't reverse-hash in order to reconstruct unknown original data, but you can do it (theoretically) if you have enough additional information about what constitutes valid original data. If you have a huge lookup table of all the possible original data blocks, then the hash can only be used to identify 2^(N-M) of them as possible candidates, and some additional technique is necessary to figure out precisely which one of those is the original data block. (N is the length of the data block in bits, and M is the length of the hash, in bits.)

Hashing discards some of the original data. In fact, random data is generally uncompressible, so if you try to compress random data and end up with something smaller than the original, you can rest assured you're not able to reconstruct. However, if you know something about the original... For example, if you know the original is a valid text document written in English, then in all likelihood there is only one possible original block fitting that description and yielding the end hash result. Even if there is more than one original block which looks like valid English text and produces the same end hash, it is easy to choose which one is correct based on context... Since you presumably know the previous block and the subsequent block, you just choose the intermediate block which seamlessly continues to produce valid English grammar at the junctions with adjacent blocks.

This technique can be applied to most types of clearly structured original data, but it cannot be applied to unstructured or unknown original data. So at best, hashing could be a special-case form of compression. To decompress would require near-infinite compute hours or a large lookup table to scan all the possible sets of inputs to find one which produces the end hash... So besides the fact that hashing is at best a specific form of compression requiring additional auxiliary information, it's also impractical.

To get this down to something reasonable, I considered using a 48MB lookup table for a 24-bit block of data (that's 2^24 entries of 24 bits each), or a 16GB lookup table for a 32-bit block of data (2^32 entries of 32 bits each). Well, in order to get a compression ratio worth talking about, the hash size would have to be 3 bits or smaller. That's a pretty big lookup table to decompress 3 bits into 24 or 32... And let's face it ... 9:1 compression isn't stellar for a text document.

And the final nail in the coffin was: in order for this technique to be viable, as mentioned, the original data must be structured. For any set of structured original data, all the information which is necessary for the reverse-hash to identify valid data from the lookup table could have been used instead to create a specialized compression algorithm which is equal or better than the reverse-hash. So reverse-hash decompression is actually the worst-case algorithm for all the data types which it's capable of working on.

But yes, you're right, it's theoretically possible for specific cases, but not theoretically possible for the general case.
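A toy demonstration of that idea (my own example, nothing to do with ZFS): if you already know the set of "valid" inputs, even a heavily truncated hash can pick the original back out of a lookup table, and any ambiguity has to be resolved by context.

    import hashlib

    candidates = ["cat", "dog", "sun", "ice", "oak"]   # stand-in for "structured" data

    def tag(block):
        return hashlib.sha256(block.encode()).digest()[:1]   # keep only 8 bits

    stored = tag("dog")
    matches = [w for w in candidates if tag(w) == stored]
    print(matches)   # usually just ['dog']; a second match would need context to resolve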
Edward Ned Harvey
2011-Jan-10 15:17 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: Pawel Jakub Dawidek [mailto:pjd at FreeBSD.org]
>
> Well, I find it quite reasonable. If your block is referenced 100 times,
> it is probably quite important.

If your block is referenced 1 time, it is probably quite important. Hence redundancy in the pool.

> There are many corruption possibilities
> that can destroy your data. Imagine memory error, which corrupts
> io_offset in write zio structure and corrupted io_offset points at your
> deduped block referenced 100 times. It will be overwritten and
> redundancy won't help you.

All of the corruption scenarios which allow you to fail despite pool redundancy also allow you to fail despite copies+N.

> Note, that deduped data is not alone
> here. Pool-wide metadata are stored 'copies+2' times (but no more than
> three) and dataset-wide metadata are stored 'copies+1' times (but no
> more than three), so by default pool metadata have three copies and
> dataset metadata have two copies, AFAIR. When you lose root node of a
> tree, you lose all your data, are you really, really sure only one copy
> is enough?

Interesting. But no. There is not only one copy as long as you have pool redundancy.
Edward Ned Harvey
2011-Jan-10 15:57 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of David Magda
>
> Knowing exactly how the math (?) works is not necessary, but understanding

Understanding the math is not necessary, but it is pretty easy. And unfortunately it becomes kind of necessary, because even when you tell somebody the odds of a collision are a zillion times smaller than the odds of our Sun exploding and destroying Earth, they still don't believe you.

The explanation of the math, again, is described in the wikipedia article "Birthday Problem," or stated a little more simply here:

Given a finite pool of N items, pick one at random and return it to the pool. Pick another one. The odds of it being the same as the first are 1/N. Pick another one. The odds of it being the same as the first are 1/N, and the odds of it being the same as the 2nd are 1/N. So the odds of it matching any of the prior picks are 2/N. Pick another one. The odds of it being the same as any previous pick are 3/N.

If you repeatedly draw M items out of the pool (plus the first draw), returning them each time, then the odds of any draw matching any other draw are approximately:

P = 1/N + 2/N + 3/N + ... + M/N
P = ( sum(1 to M) ) / N

Note: if you google for "sum of positive integers," you'll find sum(1 to M) = M * (M+1) / 2, so

P = M * (M+1) / 2N

In the context of hash collisions in a zpool, M would be the number of data blocks in your zpool, and N would be all the possible hashes. A sha-256 hash has 256 bits, so N = 2^256. I described an excessively large worst-case zpool in my other email, which had 2^35 data blocks in it, so M = 2^35.

So the probability of any block hash colliding with any other hash in that case is

2^35 * (2^35+1) / (2*2^256) = ( 2^70 + 2^35 ) * 2^-257 = 2^-187 + 2^-222 ~= 5.1E-57

There are an estimated 8.87E49 atoms in planet Earth ( http://pages.prodigy.net/jhonig/bignum/qaearth.html ). The probability of a collision in your worst-case unrealistic dataset as described is still millions of times less likely than randomly finding a single specific atom in the whole planet Earth by pure luck.
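A toy-scale check of that approximation (my own sketch, with deliberately small numbers so a simulation is feasible; the approximation only holds while the resulting probability is small):

    import random

    N, M, trials = 1_000_000, 100, 20_000   # toy pool size and number of draws
    collided = sum(
        len(set(random.randrange(N) for _ in range(M))) < M   # any repeated draw?
        for _ in range(trials)
    )
    print("simulated    :", collided / trials)
    print("M*(M+1)/(2N) :", M * (M + 1) / (2 * N))

Both come out around 0.005 here.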
Edward Ned Harvey
2011-Jan-11 01:35 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> ~= 5.1E-57

Bah. My math is wrong. I was never very good at P&S. I'll ask someone at work tomorrow to look at it and show me the folly. Wikipedia has it right, but I can't evaluate numbers to the few-hundredth power in any calculator that I have handy.
Lassi Tuura
2011-Jan-11 12:26 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Hey there,

>> ~= 5.1E-57
>
> Bah. My math is wrong. I was never very good at P&S. I'll ask someone at
> work tomorrow to look at it and show me the folly. Wikipedia has it right,
> but I can't evaluate numbers to the few-hundredth power in any calculator
> that I have handy.

bc -l <<EOF
scale=150
define bday(n, h) { return 1 - e(-(n^2)/(2*h)); }
bday(2^35, 2^256)
bday(2^35, 2^256) * 10^57
EOF

Basically, ~5.1 * 10^-57. Seems your number was correct, although I am not sure how you arrived at it.

It appears that number is ~ consistent with the tables in the Wikipedia article on the birthday attack (http://en.wikipedia.org/wiki/Birthday_attack): p=10^-15 for a 256-bit hash collision requires 1.5*10^31 (O(2^103)) hashes, assuming the hashes are randomly distributed. That's definitely improbable.

For anecdotal evidence to the contrary, we once had apps using GUIDs. We did believe collisions would be extremely improbable. Yet GUID conflicts turned into recurring events, at worst several collisions a week. It may or may not be related - in our case it turned out the GUIDs weren't anywhere near as random as people thought they would or should be. In retrospect we clearly used a flawed GUID generator which rendered our assumptions about randomness invalid. SHA256 on (even non-random) data is different, but do be careful about what you assume.

Regards,
Lassi
Edward Ned Harvey
2011-Jan-11 13:46 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: Lassi Tuura [mailto:lat at cern.ch]
>
> bc -l <<EOF
> scale=150
> define bday(n, h) { return 1 - e(-(n^2)/(2*h)); }
> bday(2^35, 2^256)
> bday(2^35, 2^256) * 10^57
> EOF
>
> Basically, ~5.1 * 10^-57.
>
> Seems your number was correct, although I am not sure how you arrived at it.

The number was calculated correctly; unfortunately, the formula was not. :-( The correct formula is the one on wikipedia:

p(n;d) ~= 1-( ((d-1)/d)^(0.5*n*(n-1)) )

where n=2^35 and d=2^256.

Unfortunately, bc isn't good at exponentiating huge numbers. Observe:

exponent=0.5*2^35*(2^35-1)
exponent
590295810341525782528
(2^256-1)^exponent
Runtime error (func=(main), adr=35): exponent too large in raise

Even if I chop that down to something much much much smaller ... 99999 ... then bc simply hangs. It's apparently looping 99999 times, which will take forever. It's so easy to calculate when I use the wrong formula, and difficult to calculate when I use the right formula. ;-) I'm hoping somebody at work knows how to do this in matlab or octave or something... Will try later today.

> It appears that number is ~ consistent with the tables on Wikipedia article
> on birthday attack (http://en.wikipedia.org/wiki/Birthday_attack): p=10^-15

This is why, at first, I thought my formula (which is wrong) was equal to the formula on wikipedia: at least roughly speaking, they were coming up with believably similar numbers. But you can clearly see mine is wrong, for example, if you set n to something larger than the square root of d. According to my formula, if you choose 2^160 hashes (for example) out of the space of 2^256 hashes, then you're 100% certain to have a collision, which is blatantly wrong. An easier error to see is ... if you choose 4 out of a space of 10, then the probability of collision is 1.0. Which again, is clearly wrong.

> For anecdotal evidence to the contrary, we once had apps using GUIDs. We did
> believe collisions would be extremely improbable. Yet GUID conflicts turned
> into recurring events, at worst several collisions a week. It may or may not
> be related - in our case it turned out the GUIDs weren't anywhere near as
> random as people thought they would or should be. In retrospect we clearly
> used a flawed GUID generator which rendered our assumptions about randomness
> invalid. SHA256 on (even non-random) data is different, but do be careful
> about what you assume.

Whenever I hear anyone saying they have experienced collisions, just as with your GUID problem, I question what precisely was colliding, and the answer is never any cryptographic hash. It's either poor randomness, or a weak non-cryptographic hash, or something like that. Just as ... earlier in this thread I believe someone said they experienced rsync collisions... which I consider to be irrelevant to the discussion of sha256.

I certainly acknowledge that sha256 is not random, and yet it is fundamental to assume it's random, or more accurately, unpredictable and evenly distributed. But since even the most dedicated specialized mathematicians thus far haven't been able to show anything to the contrary (at least not in public), the conclusion I'm drawing is: I trust the distribution characteristics as long as I don't have malicious users or friendly users intentionally trying to generate collisions specifically for sha256. Someday sha256 might become at least partially broken, but until that time, I trust it. I'd bet anything short of my life on it preventing accidental collisions... In fact, every day I risk my life on things that are more likely to kill me, so I guess I'd even bet my life on it, if I had anything to gain. Give me $1 and I'll bet my life on sha256 not generating an accidental collision.
Edward Ned Harvey
2011-Jan-12 03:24 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
For anyone who still cares:

I'm calculating the odds of a sha256 collision in an extremely large zpool, containing 2^35 blocks of data, and no repetitions.

The formula on wikipedia for the birthday problem is:

p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) )

In this case, n=2^35 and d=2^256.

The problem is, this formula "does not compute" because n is so large. Fortunately x = e^ln(x), and so we're able to use this technique to turn the huge exponent into a huge multiplication instead. (Using the bc mathlib notation, the l(x) function is the natural log of x, and e(x) is e raised to the power of x.)

p(n;d) ~= 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) )

Using bc to calculate the answer:

bc -l
n=2^35
d=2^256
scale=1024
1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) )
.0000000000000000000000000000000000000000000000000000000050978941154

I manually truncated here (precision goes out 1024 places). This is 5.1*10^-57.

Note: I had to repeat the calculation many times, setting a larger and larger scale. The default scale of 20, and even 64 and 70 and 80, were not precise enough to produce a convergent answer around the -57th decimal place. So I just kept going larger, and in retrospect anything over 100 would have been fine. I wrote 1024 above, so who cares.

If you've been paying close attention you'll recognize this is the same answer I originally calculated, but my original equation was in fact wrong. It just so happens that my original equation neglects the probability of collisions from previous events, so it is actually accurate whenever the probability of previous events is insignificant. It is merely luck that for the data size in question, my equation produced something that looked correct. It would produce a wrong calculation of probability for a larger value of n.
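The same number can be cross-checked without bc (a sketch of my own; log1p and expm1 keep the tiny quantities from rounding away in double precision):

    import math
    from fractions import Fraction

    n, d = 2**35, 2**256
    simple = float(Fraction(n * (n - 1), 2) / d)     # n(n-1)/(2d), the small-probability form
    exact = -math.expm1(0.5 * n * (n - 1) * math.log1p(-1 / d))   # 1 - ((d-1)/d)^(n(n-1)/2)
    print(simple, exact)   # both ~5.1e-57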
Edward Ned Harvey
2011-Jan-12 04:05 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
In case you were wondering "how big is n before the probability of collision becomes remotely possible, slightly possible, or even likely?"

Given a fixed probability of collision p, the formula to calculate n is:

n = 0.5 + sqrt( 0.25 + 2*l(1-p)/l((d-1)/d) )

(That's just the same equation as before, solved for n.)

p=0.000001   n=4.8*10^35 ~= 2^118
p=0.00001    n=1.5*10^36 ~= 2^120
p=0.0001     n=4.8*10^36 ~= 2^122
p=0.001      n=1.5*10^37 ~= 2^123
p=0.01       n=4.8*10^37 ~= 2^125
p=0.1        n=1.5*10^38 ~= 2^127
p=0.5        n=4.0*10^38 ~= 2^128
p=0.9        n=7.3*10^38 ~= 2^129
p=0.99       n=1.0*10^39 ~= 2^130
p=0.999      n=1.3*10^39 ~= 2^130

Recall that 2^256 ~= 1.15*10^77. Something somewhere says the "expected" collision happens around when the exponent is halved... Half of 77 is around 38, which is supported above. Half of 256 is 128, which is supported above. So as n is reduced exponentially from the "expected" point, the probability of collision exponentially approaches 0. You'll notice that for n larger than the "expected" point, the probability even more dramatically approaches 1. I cannot get my poor little computer to compute anything higher than 5.1*10^39, no matter how many 9's I put on there. At this point it becomes exponentially near impossible to avoid a collision.
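The table above can be reproduced in a few lines (a sketch of my own; log1p avoids ln((d-1)/d) rounding to zero in double precision):

    import math

    d = 2.0 ** 256
    for p in (1e-6, 1e-4, 0.01, 0.5, 0.99, 0.999):
        n = 0.5 + math.sqrt(0.25 + 2.0 * math.log(1.0 - p) / math.log1p(-1.0 / d))
        print(f"p={p:<8g} n ~= {n:.1e} (~2^{math.log2(n):.1f})")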
Enrico Maria Crisostomo
2011-Jan-12 08:08 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Edward, this is OT but may I suggest you to use something like Wolfram Alpha to perform your calculations a bit more comfortably? -- Enrico M. Crisostomo On Jan 12, 2011, at 4:24, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:> For anyone who still cares: > > I''m calculating the odds of a sha256 collision in an extremely large zpool, > containing 2^35 blocks of data, and no repetitions. > > The formula on wikipedia for the birthday problem is: > p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) ) > > In this case, > n=2^35 > d=2^256 > > The problem is, this formula "does not compute" because n is so large. > Fortunately x = e^ln(x) and so we''re able to use this technique to make the > huge exponent instead, a huge multiplication. > > (Using the bc mathlib notation, the l(x) function is the natural log of x, > and e(x) is e raised to the power of x) > p(n;d) ~= 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) ) > > Using bc to calculate the answer: > bc -l > > n=2^35 > d=2^256 > scale=1024 > 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) ) > .0000000000000000000000000000000000000000000000000000000050978941154 > I manually truncated here (precision goes out 1024 places). This is > 5.1*10^-57 > > Note: I had to repeat the calculation many times, setting a larger and > larger scale. The default scale of 20, and even 64 and 70 and 80 were not > precise enough to produce a convergent answer around the -57th decimal > place. So I just kept going larger, and in retrospect, anything over 100 > would have been fine. I wrote 1024 above, so who cares. > > If you''ve been paying close attention you''ll recognize this is the same > answer I originally calculated, but my original equation was in fact wrong. > It just so happens that my original equation neglects the probability of > collisions from previous events, so it is actually accurate whenever the > probability of previous events is insignificant. It is merely luck that for > the data size in question, my equation produced something that looked > correct. It would produce a wrong calculation of probability for a larger > value of n. > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Edward Ned Harvey
2011-Jan-12 14:04 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> Edward, this is OT but may I suggest you to use something like Wolfram Alpha
> to perform your calculations a bit more comfortably?

Wow, that's pretty awesome. Thanks.
Peter Taps
2011-Jan-14 19:32 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Ed,

Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision?

Assuming each block is 4K in size, we probably can calculate the final data size beyond which a collision may occur. This would enable us to make the following statement:

"With Sha256, you need verification to be turned on only if you are dealing with more than xxxT of data."

Also, another related question. Why 256 bits was chosen and not 128 bits or 512 bits? I guess Sha512 may be an overkill. In your formula, how many blocks of data would be needed to have one collision using Sha128?

Appreciate your help.

Regards,
Peter
-- 
This message posted from opensolaris.org
David Magda
2011-Jan-15 00:16 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Jan 14, 2011, at 14:32, Peter Taps wrote:

> Also, another related question. Why 256 bits was chosen and not 128 bits or 512 bits? I guess Sha512 may be an overkill. In your formula, how many blocks of data would be needed to have one collision using Sha128?

There are two ways to get 128 bits: use a 128-bit function (e.g., MD5), or use a longer function and truncate its output. In the case of MD5, it has been deprecated for a while now because of collisions. [1] Similarly, 160-bit hash functions (SHA-1) are coming under collision attacks as well. [2] So the next step up is generally 256 (though there are a few 224-bit-output hashes out there).

However, if you're going to use a 256-bit hash function, why throw away half of the security if you've already done all the work to get those 256 bits? Might as well use all the bits and get the extra security. Using a 512-bit hash function was probably deemed "too expensive" for CPUs at this time.

There's also the fact that things are a bit in flux currently, as there's a competition to find the official (US) 'next generation' hash function. [3] And while its official-ness is mostly a US (military) thing, it will probably become a de facto standard in many other countries and industries.

For the proverbial "sha128", you'd reach a likely collision with roughly the square root of the number of blocks that SHA-256 needs - the exponent is halved, so around 2^64 rather than 2^128 blocks. The math is left as an exercise for the reader.

[1] http://en.wikipedia.org/wiki/MD5#Security
[2] http://en.wikipedia.org/wiki/SHA-1#SHA-1
[3] http://en.wikipedia.org/wiki/SHA-3
Pawel Jakub Dawidek
2011-Jan-15 13:56 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, Jan 14, 2011 at 11:32:58AM -0800, Peter Taps wrote:
> Ed,
>
> Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision?
>
> Assuming each block is 4K in size, we probably can calculate the final data size beyond which a collision may occur. This would enable us to make the following statement:
>
> "With Sha256, you need verification to be turned on only if you are dealing with more than xxxT of data."

Except that this is the wrong question to ask. The question you can ask is "How many blocks of data do I need so the collision probability is <X>%?"

> Also, another related question. Why 256 bits was chosen and not 128 bits or 512 bits? I guess Sha512 may be an overkill. In your formula, how many blocks of data would be needed to have one collision using Sha128?

There is no SHA128, and SHA512 has too long a hash. Currently the maximum hash ZFS can handle is 32 bytes (256 bits). Wasting another 32 bytes without improving anything in practise probably wasn't worth it.

BTW. As for SHA512 being slower, it looks like it depends on the implementation, or SHA512 is faster to compute on a 64-bit CPU. On my laptop OpenSSL computes SHA256 55% _slower_ than SHA512. If this is a general rule, maybe it will be worth considering using SHA512 truncated to 256 bits to get more speed.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
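The OpenSSL observation is easy to reproduce with any SHA-256/SHA-512 implementation (a rough micro-benchmark of my own using Python's hashlib; the absolute numbers depend entirely on the CPU and library build):

    import hashlib, time

    buf = b"\0" * (64 * 1024 * 1024)   # 64 MiB of input
    for name in ("sha256", "sha512"):
        h = hashlib.new(name)
        t0 = time.perf_counter()
        h.update(buf)
        h.digest()
        dt = time.perf_counter() - t0
        print(f"{name}: {len(buf) / dt / 1e6:.0f} MB/s")

On many 64-bit CPUs without dedicated SHA-256 instructions, sha512 does come out faster because it processes the input in 64-bit words.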
Edward Ned Harvey
2011-Jan-15 14:37 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Peter Taps
>
> Thank you for sharing the calculations. In lay terms, for Sha256, how many
> blocks of data would be needed to have one collision?

There is no point in making a generalization and a recommended "best practice," because it's all just a calculation of probabilities and everyone will draw the line differently. My personal recommendation is to enable verification at work, just so you can never be challenged or have to try convincing your boss about something that is obvious to you. The main value of verification is that you don't need to explain anything to anyone.

Two blocks would be needed to have one collision, at a probability of 2^-256, which is roughly 10^-77. This is coincidentally very near the probability of randomly selecting the same atom in the observable universe twice consecutively. The more blocks, the higher the probability, and that's all there is to it (unless someone is intentionally trying to cause collisions with data generated specifically and knowledgeably for that express purpose). When you start reaching 2^128 (which is about 10^38) blocks, it becomes likely you have a collision.

Every data pool in the world is someplace in between 2 and 2^128 blocks: smack in the middle of the region where the probability is distinctly more likely than randomly selecting the same atom of the universe twice, and less likely than an armageddon caused by an Earth asteroid collision.
Bob Friesenhahn
2011-Jan-15 16:19 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Fri, 14 Jan 2011, Peter Taps wrote:

> Thank you for sharing the calculations. In lay terms, for Sha256,
> how many blocks of data would be needed to have one collision?

Two.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Nicolas Williams
2011-Jan-17 19:41 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sat, Jan 15, 2011 at 10:19:23AM -0600, Bob Friesenhahn wrote:
> On Fri, 14 Jan 2011, Peter Taps wrote:
>
> >Thank you for sharing the calculations. In lay terms, for Sha256,
> >how many blocks of data would be needed to have one collision?
>
> Two.

Pretty funny.

In this thread some of you are treating SHA-256 as an idealized hash function. The odds of accidentally finding collisions in an idealized 256-bit hash function are minute because the distribution of hash function outputs over inputs is random (or, rather, pseudo-random). But cryptographic hash functions are generally only approximations of idealized hash functions. There's nothing to say that there aren't pathological corner cases where a given hash function produces lots of collisions that would be semantically meaningful to people -- i.e., a set of inputs over which the outputs are not randomly distributed. Now, of course, we don't know of such pathological corner cases for SHA-256, but not that long ago we didn't know of any for SHA-1 or MD5 either.

The question of whether disabling verification would improve performance is pretty simple: if you have highly deduplicatious, _synchronous_ (or nearly so, due to frequent fsync()s or NFS close operations) writes, and the "working set" does not fit in the ARC or L2ARC, then yes, disabling verification will help significantly, by removing an average of at least half a disk rotation from the write latency. The same goes if you have the same workload but with asynchronous writes that might as well be synchronous due to an undersized cache (relative to the workload). Otherwise the cost of verification should be hidden by caching.

Another way to put this would be that you should first determine that verification is actually affecting performance, and only _then_ should you consider disabling it. But if you want to have the freedom to disable verification, then you should be using SHA-256 (or switch to it when disabling verification). Safety features that cost nothing are not worth turning off, so make sure their cost is significant before even thinking of turning them off.

Similarly, the cost of SHA-256 vs. Fletcher should also be lost in the noise if the system has enough CPU, but if the choice of hash function could make the system CPU-bound instead of I/O-bound, then the choice of hash function would make an impact on performance. The choice of hash function will have a different performance impact than verification: a slower hash function will affect non-deduplicatious workloads more than highly deduplicatious workloads (since the latter will require more I/O for verification, which will overwhelm the cost of the hash function). Again, measure first.

Nico
--
Orvar Korvar
2011-Jan-18 14:58 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
"...If this is a general rule, maybe it will be worth considering using SHA512 truncated to 256 bits to get more speed..." Doesn''t it need more investigation if truncating 512bit to 256bit gives equivalent security as a plain 256bit hash? Maybe truncation will introduce some bias? -- This message posted from opensolaris.org
Orvar Korvar
2011-Jan-18 15:16 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
Totally Off Topic: Very interesting. Did you produce some papers on this? Where do you work? Seems a very fun place to work at!

BTW, I thought about this. What do you say?

Assume I want to compress data and I succeed in doing so. And then I transfer the compressed data. So all the information I transferred is the compressed data. But, then you don't count all the information: knowledge about which algorithm was used, which number system, laws of math, etc. So there is lots of other information that is implicit when you compress/decompress - not just the data.

So, if you add the data and all implicit information you get a certain bit size X. Do this again on the same set of data, with another algorithm, and you get another bit size Y.

You compress the data using lots of implicit information. If you use less implicit information (a simple algorithm relying on simple math), will X be smaller than if you use lots of implicit information (an advanced algorithm relying on a large body of advanced math)? What can you say about the numbers X and Y? Advanced math requires many math books that you need to transfer as well.
-- 
This message posted from opensolaris.org
Nicolas Williams
2011-Jan-18 17:50 UTC
[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Tue, Jan 18, 2011 at 07:16:04AM -0800, Orvar Korvar wrote:
> BTW, I thought about this. What do you say?
>
> Assume I want to compress data and I succeed in doing so. And then I
> transfer the compressed data. So all the information I transferred is
> the compressed data. But, then you don't count all the information:
> knowledge about which algorithm was used, which number system, laws of
> math, etc. So there is lots of other information that is implicit
> when you compress/decompress - not just the data.
>
> So, if you add the data and all implicit information you get a certain bit
> size X. Do this again on the same set of data, with another algorithm,
> and you get another bit size Y.
>
> You compress the data using lots of implicit information. If you use
> less implicit information (a simple algorithm relying on simple math),
> will X be smaller than if you use lots of implicit information
> (an advanced algorithm relying on a large body of advanced math)? What
> can you say about the numbers X and Y? Advanced math requires many
> math books that you need to transfer as well.

Just as the laws of thermodynamics preclude perpetual motion machines, so do they preclude infinite, loss-less data compression. Yes, thermodynamics and information theory are linked, amazingly enough.

Data compression algorithms work by identifying certain types of patterns, then replacing the input with notes such as "pattern 1 is ... and appears at offsets 12345 and 1234567" (I'm simplifying a lot). Data that has few or no observable patterns (observable by the compression algorithm in question) will not compress, but will expand if you insist on compressing -- randomly-generated data (e.g., the output of /dev/urandom) will not compress at all and will expand if you insist. Even just the one bit needed to indicate whether a file is compressed or not will mean expansion when you fail to compress and store the original instead of the "compressed" version.

Data compression reduces repetition, thus making it harder to further compress compressed data. Try it yourself. Try building a pipeline of all the compression tools you have, and see how many rounds of compression you can apply to typical data before further compression fails.

Nico
--
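That experiment takes only a few lines (a sketch of my own using zlib; any of the usual compressors shows the same pattern):

    import zlib

    data = b"the quick brown fox jumps over the lazy dog " * 1000
    for round_no in range(1, 6):
        data = zlib.compress(data, 9)
        print(f"round {round_no}: {len(data)} bytes")
    # Round 1 shrinks the input dramatically; later rounds stop helping
    # and the output slowly grows instead.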