Folks,

Let's say I have a volume being shared over iSCSI, and dedup has been turned on.

Let's say I copy the same file twice under different names at the initiator end, and each file ends up taking 5 blocks.

For dedup to work, each block of one file must match the corresponding block of the other file. Essentially, each pair of blocks being compared must start at the same offset into the actual data. For a shared filesystem, ZFS may internally ensure that the block starts match. However, over iSCSI the initiator does not even know about ZFS's block mechanism; it is just sending raw bytes to the target. This makes me wonder whether dedup actually works over iSCSI.

Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter
On 10/22/10 15:34, Peter Taps wrote:
> For dedup to work, each block of one file must match the corresponding
> block of the other file. Essentially, each pair of blocks being compared
> must start at the same offset into the actual data.

No, ZFS doesn't care about the file offset, just that the checksum of the blocks matches.
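To make that concrete, here is a minimal Python sketch of the idea (purely illustrative, not ZFS code; BLOCK_SIZE, split_blocks, and dedup_write are made-up names): the dedup table is keyed only by each block's checksum, so which file a block belongs to, or at what offset, never enters the comparison.

    import hashlib
    import os

    BLOCK_SIZE = 128 * 1024   # hypothetical block size; real recordsize/volblocksize varies

    def split_blocks(data, block_size=BLOCK_SIZE):
        """Carve a byte string into fixed-size blocks, mimicking ZFS's block grid."""
        return [data[i:i + block_size] for i in range(0, len(data), block_size)]

    def dedup_write(data, dedup_table):
        """Write data block by block; a block whose checksum is already known is not stored again."""
        newly_written = 0
        for block in split_blocks(data):
            key = hashlib.sha256(block).digest()   # dedup table is keyed by checksum only
            if key not in dedup_table:
                dedup_table[key] = block           # new block: allocate it
                newly_written += 1
            # duplicate block: just reference the existing copy; no file or offset is involved
        return newly_written

    table = {}
    file_a = os.urandom(5 * BLOCK_SIZE)            # a 5-block file
    file_b = file_a                                # the same file copied under another name
    print(dedup_write(file_a, table))              # 5 -- all blocks are new
    print(dedup_write(file_b, table))              # 0 -- every block deduped against the first copy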
Hi Neil,

If the file offsets do not match, the chance that the checksums would match, especially with sha256, is almost zero.

Maybe I am missing something. Let's say I have a file that contains 11 letters, ABCDEFGHIJK, and the block size is 5.

For the first file, the block contents are "ABCDE", "FGHIJ", and "K".

For the second copy, let's say the data is shifted by one byte, so the blocks are " ABCD", "EFGHI", and "JK".

The chance that any checksum would match is very low. The chance that any "checksum+verify" would match is even lower.

Regards,
Peter
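Peter's example can be reproduced with a few lines of Python (again just an illustration with made-up helper names): once the second copy is shifted by one byte relative to the block grid, none of the per-block checksums line up.

    import hashlib

    def block_hashes(data, block_size=5):
        """Split data into fixed-size blocks and return a short hash of each one."""
        return [hashlib.sha256(data[i:i + block_size]).hexdigest()[:8]
                for i in range(0, len(data), block_size)]

    aligned = b"ABCDEFGHIJK"    # first copy:  ABCDE | FGHIJ | K
    shifted = b" ABCDEFGHIJK"   # second copy, one byte later:  ABCD | EFGHI | JK

    print(block_hashes(aligned))
    print(block_hashes(shifted))
    # The two lists share no hashes, so block-level dedup finds nothing to share --
    # which is exactly the failure mode Peter is describing for misaligned copies.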
On 10/22/10 17:28, Peter Taps wrote:
> If the file offsets do not match, the chance that the checksums would
> match, especially with sha256, is almost zero.

The block size and contents have to match for ZFS dedup. See
http://blogs.sun.com/bonwick/entry/zfs_dedup

Neil.
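The Bonwick post also covers the "verify" option Peter mentions: when a checksum hit is found, the candidate block is additionally compared byte for byte before it is shared. A rough sketch of that extra step, with illustrative names of my own (dedup_write_verify is not a real ZFS interface):

    import hashlib

    def dedup_write_verify(block, dedup_table):
        """Dedup one block; on a checksum hit, confirm byte-for-byte before sharing it."""
        key = hashlib.sha256(block).digest()
        existing = dedup_table.get(key)
        if existing is None:
            dedup_table[key] = block      # first block with this checksum: allocate it
            return "written"
        if existing == block:             # checksum hit confirmed by comparing the bytes
            return "deduped"
        return "written (collision)"      # same checksum, different contents: do not share

    table = {}
    print(dedup_write_verify(b"ABCDE" * 100, table))   # written
    print(dedup_write_verify(b"ABCDE" * 100, table))   # deduped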
Neil Perrin wrote:
> No, ZFS doesn't care about the file offset, just that the checksum of
> the blocks matches.

One conclusion is that one should be careful not to mess up file alignment when working with large files (as you might have in virtualization scenarios). That is, if you have a bunch of virtual machine image clones, they'll dedupe quite well initially. However, if you then make seemingly minor changes inside some of those clones (like changing their partition offsets to do 1 MB alignment), you'll lose most or all of the dedupe benefits.

General-purpose compression tends to be less susceptible to changes in data offsets, but it also has its limits based on algorithm and dictionary size. I think dedupe can be viewed as a special case of compression that happens to work quite well for certain workloads when given ample hardware resources (compared to what would be needed to run without dedupe).
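A short sketch of the alignment point (hypothetical Python over an imaginary 8 KB block grid; BLOCK and block_hash_set are just illustrative names): moving identical data by a multiple of the block size leaves the per-block checksums intact, while moving it by anything else changes every block and defeats dedup.

    import hashlib
    import os

    BLOCK = 8 * 1024   # hypothetical block size for the illustration

    def block_hash_set(data):
        """Return the set of per-block checksums for data laid out on the block grid."""
        return {hashlib.sha256(data[i:i + BLOCK]).digest()
                for i in range(0, len(data), BLOCK)}

    image = os.urandom(64 * BLOCK)               # stand-in for a VM image
    shifted_by_block = bytes(BLOCK) + image      # data moved by exactly one block
    shifted_by_512 = os.urandom(512) + image     # data moved by 512 bytes (e.g. a repartition)

    base = block_hash_set(image)
    print(len(base & block_hash_set(shifted_by_block)))   # 64 -- every image block still dedups
    print(len(base & block_hash_set(shifted_by_512)))     # 0  -- nothing lines up any more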