Hello everybody,

is there a way to quickly compute some hash of a file on ZFS?

As I understand it, everything is signed in the filesystem, so I'm wondering if I can avoid reading whole files with md5sum just to get a unique hash. Seems very redundant to me :)

Bottom line is I want to do manual deduplication at the file level.

Thanks in advance,
Bertrand
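(For illustration, a minimal sketch, not from the thread, of the brute-force version of what Bertrand describes: hash every file in full with MD5 and group by digest. It reads every file end to end, which is exactly the cost he hopes to avoid; the chunk size and command-line usage are illustrative assumptions.)

#!/usr/bin/env python3
# Sketch only: brute-force file-level dedup by hashing whole files.
import hashlib
import os
import sys
from collections import defaultdict

def file_digest(path, chunk=1 << 20):
    """Hash a whole file in 1 MiB chunks."""
    h = hashlib.md5()  # md5 as in the original post; swap for sha256 if preferred
    with open(path, 'rb') as f:
        for buf in iter(lambda: f.read(chunk), b''):
            h.update(buf)
    return h.hexdigest()

def find_duplicates(root):
    """Group regular files under root by full-content digest."""
    groups = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)
    return {d: ps for d, ps in groups.items() if len(ps) > 1}

if __name__ == '__main__':
    for digest, paths in find_duplicates(sys.argv[1]).items():
        print(digest, *paths, sep='\n  ')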
On 11/05/2010 11:14, Bertrand Augereau wrote:
> Hello everybody,
>
> is there a way to quickly compute some hash of a file on ZFS?
> As I understand it, everything is signed in the filesystem, so I'm wondering if I can avoid reading whole files with md5sum just to get a unique hash. Seems very redundant to me :)

Signing implies the use of a key, which ZFS does not use for its block-based checksums.

There is no "quick" way to do this just now, because ZFS checksums are block-based, not whole-file-based.

--
Darren J Moffat
Is there an O(nb_blocks_for_the_file) solution, then?

I know O(nb_blocks_for_the_file) == O(nb_bytes_in_the_file), from Mr. Landau's POV, but I'm quite interested in a good constant factor.
On Tue, May 11, 2010 at 04:15:24AM -0700, Bertrand Augereau wrote:
> Is there an O(nb_blocks_for_the_file) solution, then?
>
> I know O(nb_blocks_for_the_file) == O(nb_bytes_in_the_file), from Mr. Landau's POV, but I'm quite interested in a good constant factor.

If you were considering the hashes of each ZFS block as a precomputed value, it might be tempting to think of fetching all of them and hashing them together. You would thereby avoid reading the file data; the file metadata containing the hashes is something you'd have needed to read anyway. This seems appealing, eliminating seeks and CPU work.

However, there are some issues that make the approach basically infeasible, and unreliable for comparing the results from two otherwise identical files.

First, you're assuming there's an easy interface to get the stored hashes of a block, which there isn't. Even if we ignore that for a moment, the hashes ZFS records depend on factors other than just the file content, including the way the file has been written over time. The blocks of the file may not be a constant size; a file that grew slowly may have different hashes from a copy of it, or from one extracted from an archive in a fast stream. Filesystem properties, including checksum (obviously), dedup (which implies checksum), compression (which changes the written data and can create holes), blocksize and maybe others, may differ between filesystems or even change over the time a file has been written, again changing the results and defeating comparisons. These things can defeat ZFS's dedup too, even though it does have access to the block-level checksums.

If you're going to do application-level dedup, you want to keep the advantage of being independent of these things - or even of the underlying filesystem entirely (e.g. dedup between two NAS shares).

Something similar would be useful, and much more readily achievable, from ZFS for such an application, and many others. Rather than a way to compare two files reliably for identity, I'd like a way to compare the identity of a single file between two points in time. If my application can tell quickly that the file content is unaltered since the last time I saw the file, I can avoid rehashing the content and use a stored value. If I can achieve this result for a whole directory tree, even better.

-- Dan.
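(A minimal userland sketch of the change-detection idea Dan describes: reuse a stored digest when the file appears unchanged. It stands in for the missing ZFS interface by keying a cache on stat() metadata; the cache file location and the choice of fields (size, mtime, ctime, inode) are assumptions, and are weaker than a filesystem-provided content checksum since timestamps can repeat or be manipulated.)

#!/usr/bin/env python3
# Sketch only: skip rehashing a file when stat() metadata says it is unchanged.
import hashlib
import json
import os
import sys

CACHE_PATH = os.path.expanduser('~/.hashcache.json')  # illustrative location

def _key(st):
    # Fields assumed sufficient to detect a change between two points in time.
    return [st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino]

def cached_sha256(path, cache):
    st = os.stat(path)
    entry = cache.get(path)
    if entry and entry['key'] == _key(st):
        return entry['sha256']  # unchanged since last run: reuse stored value
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for buf in iter(lambda: f.read(1 << 20), b''):
            h.update(buf)
    cache[path] = {'key': _key(st), 'sha256': h.hexdigest()}
    return cache[path]['sha256']

def load_cache():
    try:
        with open(CACHE_PATH) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}

def save_cache(cache):
    with open(CACHE_PATH, 'w') as f:
        json.dump(cache, f)

if __name__ == '__main__':
    cache = load_cache()
    for p in sys.argv[1:]:
        print(cached_sha256(p, cache), p)
    save_cache(cache)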
Thanks Dan, this is exactly what I had in mind (hashing the block checksums). You convinced me to do it independently of ZFS.
Daniel Carosone wrote:
> Something similar would be useful, and much more readily achievable,
> from ZFS for such an application, and many others. Rather than a way
> to compare two files reliably for identity, I'd like a way to
> compare the identity of a single file between two points in time. If my
> application can tell quickly that the file content is unaltered since
> the last time I saw the file, I can avoid rehashing the content and use a
> stored value. If I can achieve this result for a whole directory
> tree, even better.

This would be great for any kind of archiving software. Aren't ZFS checksums already in a position to solve this? If a file changes, its dnode's checksum changes, then the checksum of the directory it is in, and so forth all the way up to the uberblock.

There may be ways a checksum changes without a real change in the file's content, but the other way round should hold: if the checksum didn't change, the file didn't change.

So the only missing link is a way to determine ZFS's checksum for a file/directory/dataset. Am I missing something here? Of course atime updates should be turned off, otherwise the checksum will get changed by the archiving agent itself.
On Tue, Jul 06, 2010 at 05:29:54PM +0200, Arne Jansen wrote:
> Daniel Carosone wrote:
> > Something similar would be useful, and much more readily achievable,
> > from ZFS for such an application, and many others. Rather than a way
> > to compare two files reliably for identity, I'd like a way to
> > compare the identity of a single file between two points in time. If my
> > application can tell quickly that the file content is unaltered since
> > the last time I saw the file, I can avoid rehashing the content and use a
> > stored value. If I can achieve this result for a whole directory
> > tree, even better.
>
> This would be great for any kind of archiving software. Aren't ZFS checksums
> already in a position to solve this? If a file changes, its dnode's checksum
> changes, then the checksum of the directory it is in, and so forth all the way
> up to the uberblock.

Not quite. The merkle tree of file and metadata blocks gets to the root via a path that is largely separate from the tree of directory objects that name them. Changing the inode(-equivalent) metadata doesn't change the contents of any of the directories that have entries pointing to that file.

Put another way, we don't have named file versions - a directory entry refers to an object and all future versions of that object. Future versions will be found at different disk addresses, but are the same object id. There's no need to rewrite the directory object unless the name changes or the link is removed.

We do have named filesystem versions (snapshots), which name the entire tree of data and metadata objects, once they finally converge.

> There may be ways a checksum changes without a real change in the file's content,
> but the other way round should hold: if the checksum didn't change, the file
> didn't change.

That's true, for individual files. (Checksum changes can happen when the checksum algorithm is changed and the same data is rewritten, or if zero-filled blocks are replaced/rewritten with holes or vice versa, and in some other cases.)

> So the only missing link is a way to determine ZFS's checksum for a
> file/directory/dataset.

That is missing, yes. Perhaps in part because no one could see a valid application-level use for this implementation-specific metadata. The original purpose of my post above was to illustrate a use case for it. The throwaway line at the end, about a directory tree, was more in the vein of wishful thinking. :)

> Am I missing something here? Of course atime updates
> should be turned off, otherwise the checksum will get changed by the archiving
> agent itself.

Separate checksums exist that cover the difference between content and metadata changes, so at least in theory an interface that exposes enough detail could avoid this restriction.

-- Dan.
On Tue, Jul 6, 2010 at 10:29 AM, Arne Jansen <sensille at gmx.net> wrote:
> Daniel Carosone wrote:
>> Something similar would be useful, and much more readily achievable,
>> from ZFS for such an application, and many others. Rather than a way
>> to compare two files reliably for identity, I'd like a way to
>> compare the identity of a single file between two points in time. If my
>> application can tell quickly that the file content is unaltered since
>> the last time I saw the file, I can avoid rehashing the content and use a
>> stored value. If I can achieve this result for a whole directory
>> tree, even better.
>
> This would be great for any kind of archiving software. Aren't ZFS checksums
> already in a position to solve this? If a file changes, its dnode's checksum
> changes, then the checksum of the directory it is in, and so forth all the way
> up to the uberblock.
> There may be ways a checksum changes without a real change in the file's content,
> but the other way round should hold: if the checksum didn't change, the file
> didn't change.
> So the only missing link is a way to determine ZFS's checksum for a
> file/directory/dataset. Am I missing something here? Of course atime updates
> should be turned off, otherwise the checksum will get changed by the archiving
> agent itself.

What is the likelihood that the same data is re-written to the file? If that is unlikely, it looks as though znode_t's z_seq may be useful. While it isn't a checksum, it seems to be incremented on every file change.

--
Mike Gerdts
http://mgerdts.blogspot.com/
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Bertrand Augereau
>
> is there a way to quickly compute some hash of a file on ZFS?
> As I understand it, everything is signed in the filesystem, so I'm
> wondering if I can avoid reading whole files with md5sum just to get a
> unique hash. Seems very redundant to me :)

If I understand right:

Although ZFS is calculating hashes of blocks, they don't correlate to hashes of files, for several reasons:

Block boundaries are not well aligned with file boundaries. A single block might encapsulate several small files, or a file might start in the middle of a block, span several more, and end in the middle of another block.

Blocks also contain non-file information.

Hashing blocks is even less relevant to file hashes if you have compression enabled, because I think ZFS hashes the compressed data, not the uncompressed data.

If you want to create file hashes out of block hashes, it's even more convoluted, because you can't generally compute hash(A+B) from hash(A) and hash(B). Although perhaps you can for some algorithms.

My advice would be:

Computing hashes is not very expensive, as long as you're only computing hashes for data that you were going to handle for other reasons anyway. Specifically, I benchmarked several hash algorithms a while back, and found ... I forget which ... either adler32 or crc is almost zero-time to compute ... that is ... the CPU was very lightly utilized while hashing blocks at maximum disk speed.

The weakness of adler32 and crc is that they're not cryptographic hashes. If a malicious person wants to corrupt a data stream while preserving the hash, it's not difficult to do. adler32 and crc are good as long as you can safely assume no malice.

md5 is significantly slower (but surprisingly not much slower) and it's a cryptographic hash. Probably not necessary for your needs.

And one more thing: no matter how strong your hash is, unless your hash is just as big as your file, collisions happen. Don't assume data is the same just because the hash is the same, if you care about your data. Always byte-level verify every block or file whose hash matches some other hash.
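(For reference, a rough sketch of this kind of benchmark, assuming 1 MiB reads and a single test file given on the command line; the adapter class and the particular set of algorithms are my choices, not from the post. Each pass rereads the file, so the first algorithm pays any cold-cache cost.)

#!/usr/bin/env python3
# Sketch only: compare cheap checksums against cryptographic hashes on one file.
import hashlib
import sys
import time
import zlib

class RollingChecksum:
    """Adapter so zlib.adler32/crc32 look like hashlib objects (update method)."""
    def __init__(self, func, start):
        self.func, self.value = func, start
    def update(self, data):
        self.value = self.func(data, self.value)

def time_hash(path, update, chunk=1 << 20):
    """Time one full pass over the file, feeding each chunk to update()."""
    start = time.perf_counter()
    with open(path, 'rb') as f:
        for buf in iter(lambda: f.read(chunk), b''):
            update(buf)
    return time.perf_counter() - start

if __name__ == '__main__':
    path = sys.argv[1]
    hashers = {
        'adler32': RollingChecksum(zlib.adler32, 1),
        'crc32':   RollingChecksum(zlib.crc32, 0),
        'md5':     hashlib.md5(),
        'sha256':  hashlib.sha256(),
    }
    for name, h in hashers.items():
        print(f'{name:8s} {time_hash(path, h.update):.3f}s')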
On Thu, 2010-07-08 at 18:46 -0400, Edward Ned Harvey wrote:
> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Bertrand Augereau
> >
> > is there a way to quickly compute some hash of a file on ZFS?
> > As I understand it, everything is signed in the filesystem, so I'm
> > wondering if I can avoid reading whole files with md5sum just to get a
> > unique hash. Seems very redundant to me :)
>
> If I understand right:
>
> Although ZFS is calculating hashes of blocks, they don't correlate to hashes
> of files, for several reasons:
>
> Block boundaries are not well aligned with file boundaries. A single block
> might encapsulate several small files, or a file might start in the middle
> of a block, span several more, and end in the middle of another block.
>
> Blocks also contain non-file information.
>
> Hashing blocks is even less relevant to file hashes if you have
> compression enabled, because I think ZFS hashes the compressed data, not the
> uncompressed data.
>
> If you want to create file hashes out of block hashes, it's even more
> convoluted, because you can't generally compute hash(A+B) from hash(A)
> and hash(B). Although perhaps you can for some algorithms.
>
> My advice would be:
>
> Computing hashes is not very expensive, as long as you're only computing
> hashes for data that you were going to handle for other reasons anyway.
> Specifically, I benchmarked several hash algorithms a while back, and found
> ... I forget which ... either adler32 or crc is almost zero-time to compute
> ... that is ... the CPU was very lightly utilized while hashing blocks at
> maximum disk speed.
>
> The weakness of adler32 and crc is that they're not cryptographic hashes.
> If a malicious person wants to corrupt a data stream while preserving the
> hash, it's not difficult to do. adler32 and crc are good as long as you can
> safely assume no malice.
>
> md5 is significantly slower (but surprisingly not much slower) and it's a
> cryptographic hash. Probably not necessary for your needs.
>
> And one more thing: no matter how strong your hash is, unless your hash is
> just as big as your file, collisions happen. Don't assume data is the same
> just because the hash is the same, if you care about your data. Always
> byte-level verify every block or file whose hash matches some other hash.

MD5 hashing is not recommended for "cryptographically strong" hashing anymore. SHA-256 is the current recommendation I would make (the state of the art changes over time).

The caution about collisions happening is relevant, but with a suitably strong hash the risk is close enough to zero that normal people don't care. By that, I mean that the chance of a collision with a 256-bit hash is something like 1/2^255. You're probably more likely to spontaneously combust (by an order of magnitude) than you are to have two files that "accidentally" (or even maliciously) reduce to the same hash.

When the probability of the Sun going supernova in the next 30 seconds exceeds the probability of a cryptographic hash collision, I don't worry about the collision anymore. :-)

- Garrett
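(To put numbers on Garrett's point, a small worked example; the file counts below are arbitrary illustrations, not from the thread. The birthday bound says the probability of any accidental collision among n random 256-bit digests is roughly n^2 / 2^257.)

#!/usr/bin/env python3
# Back-of-envelope birthday bound for accidental SHA-256 collisions.
from math import log2

def collision_probability_log2(n_items, hash_bits=256):
    """Approximate log2 of P(at least one collision) ~= n^2 / 2^(bits+1)."""
    return 2 * log2(n_items) - (hash_bits + 1)

for n in (10**6, 10**9, 10**12):  # a million, a billion, a trillion files
    print(f'{n:>16,d} files: P(collision) ~ 2^{collision_probability_log2(n):.0f}')

Even at a trillion hashed files, the bound comes out around 2^-177, which is the sense in which accidental collisions are not worth worrying about.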
On 2010-Jul-09 06:46:54 +0800, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
> md5 is significantly slower (but surprisingly not much slower) and it's a
> cryptographic hash. Probably not necessary for your needs.

As someone else has pointed out, MD5 is no longer considered secure (neither is SHA-1). If you want cryptographic hashing, you should probably use SHA-256 for now and be prepared to migrate to SHA-3 once it is announced. Unfortunately, SHA-256 is significantly slower than MD5 (about 4 times on a P-4, about 3 times on a SPARC-IV), and no cryptographic hash is amenable to multi-threading. The new crypto instructions on some of Intel's recent offerings may help performance (and it's likely that they will help more with SHA-3).

> And one more thing: no matter how strong your hash is, unless your hash is
> just as big as your file, collisions happen. Don't assume data is the same
> just because the hash is the same, if you care about your data. Always
> byte-level verify every block or file whose hash matches some other hash.

In theory, collisions happen. In practice, given a cryptographic hash, if you can find two different blocks or files that produce the same output, please publicise it widely, as you have broken that hash function.

--
Peter Jeremy
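(Although a single hash computation is serial, a common workaround is a simple two-level tree hash: split the file into fixed chunks, hash the chunks in parallel, then hash the concatenated chunk digests. This is a sketch of my own, not something proposed in the thread; the 8 MiB chunk size is arbitrary, and the result is not interchangeable with a plain sha256 of the file, so both sides of any comparison must use the same scheme. Unlike ZFS block checksums, the fixed chunking here is independent of how the file was written, so two identical files always produce the same result.)

#!/usr/bin/env python3
# Sketch only: two-level tree hash with chunk digests computed in parallel.
import hashlib
import os
import sys
from concurrent.futures import ProcessPoolExecutor

CHUNK = 8 << 20  # 8 MiB chunks, an arbitrary choice

def _chunk_digest(args):
    """Hash one fixed-size chunk of the file, identified by its offset."""
    path, offset = args
    with open(path, 'rb') as f:
        f.seek(offset)
        return hashlib.sha256(f.read(CHUNK)).digest()

def tree_sha256(path):
    size = os.path.getsize(path)
    offsets = [(path, off) for off in range(0, max(size, 1), CHUNK)]
    with ProcessPoolExecutor() as pool:
        leaves = list(pool.map(_chunk_digest, offsets))
    # Top-level digest over the concatenation of all chunk digests.
    return hashlib.sha256(b''.join(leaves)).hexdigest()

if __name__ == '__main__':
    print(tree_sha256(sys.argv[1]))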
On Fri, 2010-07-09 at 10:23 +1000, Peter Jeremy wrote:
> On 2010-Jul-09 06:46:54 +0800, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
> > md5 is significantly slower (but surprisingly not much slower) and it's a
> > cryptographic hash. Probably not necessary for your needs.
>
> As someone else has pointed out, MD5 is no longer considered secure
> (neither is SHA-1). If you want cryptographic hashing, you should
> probably use SHA-256 for now and be prepared to migrate to SHA-3 once
> it is announced. Unfortunately, SHA-256 is significantly slower than
> MD5 (about 4 times on a P-4, about 3 times on a SPARC-IV), and no
> cryptographic hash is amenable to multi-threading. The new crypto
> instructions on some of Intel's recent offerings may help performance
> (and it's likely that they will help more with SHA-3).
>
> > And one more thing: no matter how strong your hash is, unless your hash is
> > just as big as your file, collisions happen. Don't assume data is the same
> > just because the hash is the same, if you care about your data. Always
> > byte-level verify every block or file whose hash matches some other hash.
>
> In theory, collisions happen. In practice, given a cryptographic hash,
> if you can find two different blocks or files that produce the same
> output, please publicise it widely, as you have broken that hash function.

Not necessarily. While you *should* publicize it widely, given all the possible text that we have, and all the other variants, it's theoretically possible to get lucky. Like winning a lottery where everyone else has a million tickets, but you only have one.

Such an occurrence -- if isolated -- would not, IMO, constitute a "breaking" of the hash function.

- Garrett
Nicolas Williams - 2010-Jul-09 16:31 UTC - [zfs-discuss] Hash functions (was Re: Hashing files rapidly on ZFS)
On Thu, Jul 08, 2010 at 08:42:33PM -0700, Garrett D'Amore wrote:
> On Fri, 2010-07-09 at 10:23 +1000, Peter Jeremy wrote:
> > In theory, collisions happen. In practice, given a cryptographic hash,
> > if you can find two different blocks or files that produce the same
> > output, please publicise it widely, as you have broken that hash function.
>
> Not necessarily. While you *should* publicize it widely, given all the
> possible text that we have, and all the other variants, it's theoretically
> possible to get lucky. Like winning a lottery where everyone else has a
> million tickets, but you only have one.
>
> Such an occurrence -- if isolated -- would not, IMO, constitute a
> "breaking" of the hash function.

A hash function is broken when we know how to create colliding inputs. A random collision does not a break make, though it might, perhaps, help figure out how to break the hash function later.

Nico