Rajkumar M
2010-Mar-05 11:43 UTC
[zfs-code] De-duplication - Similar Blocks at Different Offsets
For data de-duplication, the fixed-length block approach divides a file into fixed-size blocks and looks for duplicates among them. But similar data may be present at different offsets in two different datasets; in other words, the block boundaries of the similar data may differ. This is very common when a few bytes are inserted into a file: when the changed file is processed again for dedup, all the blocks from the insertion point onward appear to have changed. See http://www.mediafire.com/imageview.php?quickkey=qrvz1yhoima for an illustration. Does ZFS de-duplication work if the offsets of similar data blocks are different?
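The boundary-shift effect described in the question can be reproduced with a small sketch. This is only an illustration of fixed-size block hashing in general, not ZFS code; the 4096-byte block size, the use of SHA-256, and the helper names `fixed_block_hashes` and `shared_blocks` are all assumptions made for the example.

```python
import hashlib
import random


def fixed_block_hashes(data: bytes, block_size: int = 4096) -> list:
    """Split data at fixed offsets and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes, block_size: int = 4096) -> int:
    """Count distinct block digests that appear in both inputs."""
    return len(set(fixed_block_hashes(a, block_size))
               & set(fixed_block_hashes(b, block_size)))


# 16 blocks of pseudo-random (non-repeating) data, seeded for repeatability.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(16 * 4096))

# Insert a single byte near the start: every later block boundary shifts by one.
modified = original[:100] + b"X" + original[100:]

print("identical copy shares:", shared_blocks(original, original), "blocks")
print("after 1-byte insert shares:", shared_blocks(original, modified), "blocks")
```

With non-repeating data, the copy with one inserted byte should share essentially no block digests with the original, even though all but one byte of the content is the same.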
Steve Gonczi
2010-Apr-23 21:00 UTC
[zfs-code] De-duplication - Similar Blocks at Different Offsets
ZFS deduplication is strictly block-based. What you are describing would involve interpreting the data as a stream of bytes and trying to find patterns in it. I am guessing that this could not be done at anywhere near the performance that a block-level scheme such as ZFS dedup is capable of. Best wishes, Steve -- This message posted from opensolaris.org
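For readers wondering what "interpreting the data as a stream of bytes" could look like, here is a minimal sketch of content-defined chunking, where chunk boundaries are picked from the bytes themselves using a gear-style rolling hash. It is not something ZFS does, and it is not taken from this thread; the function name `cdc_chunks`, the `GEAR` table, and the size parameters are all hypothetical choices for illustration. The per-byte hashing loop also hints at the extra cost mentioned in the reply.

```python
import hashlib
import random

# Hypothetical table of 256 pseudo-random 32-bit values, one per byte value.
_gear_rng = random.Random(1)
GEAR = [_gear_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, avg_bits: int = 10,
               min_size: int = 64, max_size: int = 8192) -> list:
    """Cut data where the low avg_bits bits of a rolling hash are all zero.

    The low bits depend only on the most recent bytes, so a boundary is a
    property of the local content rather than of the absolute offset.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style update: older bytes are shifted out of the low bits.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_digests(data: bytes) -> set:
    """SHA-256 digests of the content-defined chunks of data."""
    return {hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
modified = original[:100] + b"X" + original[100:]

common = chunk_digests(original) & chunk_digests(modified)
print(len(common), "of", len(cdc_chunks(original)), "chunks still match")
```

Because boundaries follow the content, the chunking resynchronizes shortly after the inserted byte, so most chunk digests still match. The trade-off is that every byte has to pass through the rolling hash and chunks end up with variable sizes, which is a different design from hashing blocks at fixed offsets.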