Rajkumar M
2010-Mar-05 11:43 UTC
[zfs-discuss] De-duplication - Similar Blocks at Different Offsets
For data de-duplication, the fixed-length block approach divides a file into fixed-size blocks to find duplicates. But similar data may be present at different offsets in two different datasets; in other words, the block boundaries of similar data may differ. This is very common when some bytes are inserted into a file: when the changed file is processed again for dedup, all the blocks from the insertion point onward appear to have changed. See http://www.mediafire.com/imageview.php?quickkey=qrvz1yhoima for an illustration.

Does ZFS de-duplication work if the offsets of similar data blocks are different?
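The boundary shift described in the question can be reproduced with a small sketch. This is not ZFS code: the 4-byte block size, the use of SHA-256, and the helper names `block_digests` and `shared_blocks` are assumptions chosen only to keep the example readable.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list:
    """Split data into fixed-size blocks and return one SHA-256 hex digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes, block_size: int) -> int:
    """Count the blocks of b whose digest also occurs among the blocks of a."""
    seen = set(block_digests(a, block_size))
    return sum(1 for digest in block_digests(b, block_size) if digest in seen)


original = b"AAAABBBBCCCCDDDD"
appended = original + b"EEEE"   # data added at the end: boundaries unchanged
shifted = b"x" + original       # one byte inserted at the front: boundaries shift

print(shared_blocks(original, appended, 4))  # 4: every original block is found again
print(shared_blocks(original, shifted, 4))   # 0: no block lines up any more
```

Appending leaves the earlier block boundaries alone, so the old blocks still match. A single inserted byte moves every later boundary, so none of the fixed-size blocks match even though almost all of the data is the same.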
Robert Milkowski
2010-Mar-05 12:04 UTC
[zfs-discuss] De-duplication - Similar Blocks at Different Offsets
On 05/03/2010 11:43, Rajkumar M wrote:
> For data de-duplication, fixed-length block approach divides a file
> into fixed size length blocks to find duplicates. But similar data
> blocks may be present at different offsets in two different datasets.
> In other words the block boundary of similar data may be different.
> This is very common when some bytes are inserted in a file, and when
> the changed file is processed again for dedup, all the blocks appear
> to have changed. See
> http://www.mediafire.com/imageview.php?quickkey=qrvz1yhoima
> for illustration.
> Does ZFS de-duplication work if the offset of similar data blocks are
> different?

ZFS uses the filesystem block's checksum for deduplication. So if the same data ends up in different objects in such a way that both filesystem blocks are identical, then yes, ZFS will de-dupe it. Otherwise it will not.

--
Robert Milkowski
http://milek.blogspot.com
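The rule in the reply (identical blocks share storage, anything else does not) can be modelled with a toy checksum-keyed block table. This is only a sketch under stated assumptions: `DedupStore` is a hypothetical class, not a ZFS interface, and the 4-byte block size and SHA-256 key are illustrative choices.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy per distinct block checksum (hypothetical)."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks = {}    # checksum -> block contents (stored once)
        self.refcount = {}  # checksum -> number of references to that block

    def write(self, data: bytes) -> list:
        """Store data block by block and return the list of block checksums."""
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).digest()
            if key not in self.blocks:
                self.blocks[key] = block  # first time this content is seen
            self.refcount[key] = self.refcount.get(key, 0) + 1
            refs.append(key)
        return refs


store = DedupStore(block_size=4)
store.write(b"AAAABBBBCCCC")
store.write(b"AAAABBBBCCCC")            # second object, identical blocks
unique_after_copy = len(store.blocks)   # 3: the copy added no new blocks

store.write(b"xAAAABBBBCCCC")           # same data shifted by one byte
unique_after_shift = len(store.blocks)  # 7: xAAA, ABBB, BCCC and C are all new

print(unique_after_copy, unique_after_shift)
```

The identical copy is stored for free because every block checksum is already in the table. The shifted copy produces four blocks with new checksums, so in this model nothing is shared.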