thr3ads.net - zfs code - [zfs-code] De-duplication - Similar Blocks at Different Offsets [Mar 2010]

Rajkumar M

2010-Mar-05 11:43 UTC

[zfs-code] De-duplication - Similar Blocks at Different Offsets

For data de-duplication, the fixed-length block approach divides a file
into fixed-size blocks and looks for duplicates among them. But similar
data blocks may be present at different offsets in two different
datasets. In other words, the block boundaries of similar data may not
line up. This is very common when some bytes are inserted into a file:
when the changed file is processed again for dedup, every block from the
insertion point onward appears to have changed. See
http://www.mediafire.com/imageview.php?quickkey=qrvz1yhoima
for an illustration.

Does ZFS de-duplication work if the offsets of similar data blocks are
different?
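
To make the question concrete, here is a small Python sketch of the
effect described above. It is not ZFS code. The 8-byte block size and
the names fixed_block_hashes, original, shifted and appended are made up
for the illustration, and SHA-256 is used only as a convenient block
checksum.

```python
import hashlib

BLOCK_SIZE = 8  # tiny on purpose, for illustration only


def fixed_block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data, cutting at multiples of block_size."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


original = bytes(range(64))      # 8 distinct blocks of 8 bytes each
shifted = b"\xff" + original     # one byte inserted at the very front
appended = original + b"\xff"    # one byte added at the very end

orig_hashes = set(fixed_block_hashes(original))

# Inserting at the front moves every later byte to a new offset,
# so no block of the shifted copy matches a block of the original.
shared_after_insert = orig_hashes & set(fixed_block_hashes(shifted))

# Appending leaves the existing block boundaries alone,
# so all 8 original blocks are still found.
shared_after_append = orig_hashes & set(fixed_block_hashes(appended))

print(len(shared_after_insert), len(shared_after_append))
```

The same 64 bytes are present in all three buffers. Only the position of
the block boundaries relative to the data decides whether a fixed-block
scheme can see them as duplicates.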

Steve Gonczi

2010-Apr-23 21:00 UTC

[zfs-code] De-duplication - Similar Blocks at Different Offsets

ZFS deduplication is strictly block-based.

What you are describing would involve interpreting the data
as a stream of bytes and trying to find patterns in it.

I am guessing that this could not be done at anywhere near the
performance that a block-level scheme such as ZFS dedup is capable of.
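
For readers wondering what "interpreting the data as a stream of bytes"
could look like, below is a minimal Python sketch of one such approach,
usually called content-defined chunking. It is not something ZFS does,
and it is not a production algorithm: the rolling sum is a toy hash, and
WINDOW, DIVISOR and content_defined_chunks are hypothetical names and
values chosen for the example. A chunk boundary is placed wherever the
sum of the last WINDOW bytes hits a target value, so boundaries follow
the content rather than the file offset.

```python
import hashlib
import random

WINDOW = 16    # bytes of context that decide a boundary (assumed value)
DIVISOR = 64   # roughly one boundary per 64 bytes on random data (assumed)


def content_defined_chunks(data, window=WINDOW, divisor=DIVISOR):
    """Cut data wherever the rolling sum of the last `window` bytes hits a target."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward one byte
        # Only a full window may decide a boundary; the decision depends
        # on nearby content, not on the absolute offset i.
        if i >= window - 1 and rolling % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing bytes after the last cut
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"\xff" + original             # one byte inserted at the front

orig_chunks = content_defined_chunks(original)
shifted_chunks = content_defined_chunks(shifted)

# Compare chunks by checksum, as a dedup table would.
orig_ids = {hashlib.sha256(c).hexdigest() for c in orig_chunks}
shifted_ids = {hashlib.sha256(c).hexdigest() for c in shifted_chunks}
print(len(orig_ids & shifted_ids), "of", len(orig_ids), "chunks still match")
```

In this sketch only the chunk containing the inserted byte is affected;
every chunk after the first boundary of the original is cut identically
in the shifted copy. The price is a hash test at every byte position
plus variable-sized chunks, which is the kind of extra work the
performance concern above refers to.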

best wishes,

Steve
-- 
This message posted from opensolaris.org
