Steve Radich, BitShop, Inc.
2009-Dec-13 20:51 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
I enabled compression on a zfs filesystem with compression=gzip-9 - i.e. fairly slow compression - this stores backups of databases (which compress fairly well).

The next question is: is the checksum on disk based on the uncompressed data (which seems more likely to be recoverable) or on the compressed data (which seems slightly less likely to be recoverable)?

Why? Because if you can de-dup anyway, why bother to compress THEN check? This SEEMS to be the behaviour - i.e. I suspect many of the files I'm writing are dups, yet I see high CPU use even though on some of the copies I see almost no disk writes.

If the dedup check happened first AND the block is a duplicate, I should see hardly any CPU use (because the data wouldn't need to be compressed).

Steve Radich
BitShop.com
Robert Milkowski
2009-Dec-14 09:02 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
> I enabled compression on a zfs filesystem with compression=gzip-9 - i.e. fairly slow compression - this stores backups of databases (which compress fairly well).
>
> The next question is: is the checksum on disk based on the uncompressed data (which seems more likely to be recoverable) or on the compressed data (which seems slightly less likely to be recoverable)?
>
> Why? Because if you can de-dup anyway, why bother to compress THEN check? This SEEMS to be the behaviour - i.e. I suspect many of the files I'm writing are dups, yet I see high CPU use even though on some of the copies I see almost no disk writes.
>
> If the dedup check happened first AND the block is a duplicate, I should see hardly any CPU use (because the data wouldn't need to be compressed).

First, the checksum is calculated after compression happens. If both compression and dedup are enabled for a given dataset, zfs will first compress the data, calculate the checksum, and then dedup it.

That makes perfect sense: if your data is very compressible and the unique set is large enough that compression pays off, it makes sense to use both features together. If you don't want compression while using dedup, just disable it.

--
Robert Milkowski
http://milek.blogspot.com
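To make that ordering concrete, here is a minimal, hypothetical sketch of the write path in Python (not ZFS code): the record is compressed first, the checksum is taken over the compressed block, and that same checksum is then used as the dedup key. zlib stands in for lzjb/gzip, and plain dicts stand in for the dedup table and the pool.

    import hashlib
    import zlib

    # In-memory stand-ins: the dedup table maps checksum -> block address,
    # and "storage" maps address -> compressed bytes.
    dedup_table = {}
    storage = {}

    def write_block(record: bytes) -> str:
        """Sketch of the order described above: compress, checksum, then dedup."""
        compressed = zlib.compress(record)          # stand-in for lzjb/gzip
        checksum = hashlib.sha256(compressed).hexdigest()

        # The dedup lookup uses the checksum of the *compressed* block,
        # so the compression cost is paid even when the block is a duplicate.
        if checksum in dedup_table:
            return dedup_table[checksum]            # reference the existing block

        address = f"dva-{len(storage)}"             # pretend disk address
        storage[address] = compressed
        dedup_table[checksum] = address
        return address

    # Writing the same record twice compresses it twice but stores it once.
    a = write_block(b"database backup page" * 1000)
    b = write_block(b"database backup page" * 1000)
    assert a == b and len(storage) == 1

Because the dedup key is derived from the compressed block, the compression work happens even when the block turns out to be a duplicate and no new data is written - which matches the high-CPU, low-write behaviour Steve describes.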
On Sun, Dec 13, 2009 at 11:51 PM, Steve Radich, BitShop, Inc. <stever at bitshop.com> wrote:
> I enabled compression on a zfs filesystem with compression=gzip-9 - i.e. fairly slow compression - this stores backups of databases (which compress fairly well).
>
> The next question is: is the checksum on disk based on the uncompressed data (which seems more likely to be recoverable) or on the compressed data (which seems slightly less likely to be recoverable)?
>
> Why? Because if you can de-dup anyway, why bother to compress THEN check? This SEEMS to be the behaviour - i.e. I suspect many of the files I'm writing are dups, yet I see high CPU use even though on some of the copies I see almost no disk writes.
>
> If the dedup check happened first AND the block is a duplicate, I should see hardly any CPU use (because the data wouldn't need to be compressed).

ZFS deduplication is block-level, so to deduplicate you need the data broken into the blocks that will actually be written. With compression enabled, you don't have those blocks until the data has been compressed. It does look like wasted cycles, but ...

Regards,
Andrey
A Darren Dunham
2009-Dec-14 18:46 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
On Mon, Dec 14, 2009 at 09:30:29PM +0300, Andrey Kuzmin wrote:
> ZFS deduplication is block-level, so to deduplicate you need the data
> broken into the blocks that will actually be written. With compression
> enabled, you don't have those blocks until the data has been compressed.
> It does look like wasted cycles, but ...

ZFS compression is also block-level. Both are done on ZFS blocks; ZFS compression is not stream-wise.

--
Darren
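A quick illustration of that distinction (zlib is only a stand-in for lzjb/gzip): each record is compressed independently, so any single block can be decompressed - and therefore checksummed or dedup'd - on its own, whereas a stream compressor spans records.

    import zlib

    data = (b"A" * 131072, b"B" * 131072)   # two 128 KiB records

    # Block-level: each record is compressed on its own, as ZFS does.
    blocks = [zlib.compress(rec) for rec in data]
    assert all(zlib.decompress(b) == rec for b, rec in zip(blocks, data))

    # Stream-wise: one compressor spans both records, so the second record
    # cannot be recovered without decompressing everything before it.
    comp = zlib.compressobj()
    stream = comp.compress(data[0]) + comp.compress(data[1]) + comp.flush()
    assert zlib.decompress(stream) == data[0] + data[1]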
Casper.Dik at Sun.COM
2009-Dec-14 18:53 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
> On Mon, Dec 14, 2009 at 09:30:29PM +0300, Andrey Kuzmin wrote:
>> ZFS deduplication is block-level, so to deduplicate you need the data
>> broken into the blocks that will actually be written. With compression
>> enabled, you don't have those blocks until the data has been compressed.
>> It does look like wasted cycles, but ...
>
> ZFS compression is also block-level. Both are done on ZFS blocks; ZFS
> compression is not stream-wise.

And if you enable "verify" and you checksum the uncompressed data, you will need to uncompress before you can verify.

Casper
On Mon, Dec 14, 2009 at 9:53 PM, <Casper.Dik at sun.com> wrote:
>> ZFS compression is also block-level. Both are done on ZFS blocks; ZFS
>> compression is not stream-wise.
>
> And if you enable "verify" and you checksum the uncompressed data, you
> will need to uncompress before you can verify.

Right, but 'verify' seems to be an 'extreme safety' setting and thus a rather rare use case. Saving the cycles lost to compressing duplicates looks to outweigh the 'uncompress before verify' overhead, imo.

Regards,
Andrey
On Mon, Dec 14, 2009 at 9:32 PM, Andrey Kuzmin <andrey.v.kuzmin at gmail.com> wrote:
>
> Right, but 'verify' seems to be an 'extreme safety' setting and thus a
> rather rare use case.

Hmm, dunno. I wouldn't set anything but a scratch file system to dedup=on. Anything of even slight significance gets dedup=verify.

> Saving the cycles lost to compressing duplicates looks to outweigh the
> 'uncompress before verify' overhead, imo.

Dedup doesn't come for free - it imposes additional load on the CPU, just like checksumming and compression. The more fancy things we want our file system to do for us, the more CPU it'll take.

--
Regards,
Cyril
> Hmm, dunno. I wouldn't set anything but a scratch file system to
> dedup=on. Anything of even slight significance gets dedup=verify.

Why? Are you saying this because the ZFS dedup code is relatively new? Or because you think there's some other problem or disadvantage to it? We're planning on using deduplication for archiving old data, and I see good use cases for it with virtual machine data.

> Dedup doesn't come for free - it imposes additional load on the CPU, just
> like checksumming and compression. The more fancy things we want our
> file system to do for us, the more CPU it'll take.

Understood and agreed... but if you already have the extra CPU cycles, then, depending on the type of data and your deduplication ratios, it may be worth spending the extra CPU to avoid buying the disk.

-Nick
On 12/14/09, Cyril Plisko <cyril.plisko at mountall.com> wrote:
> On Mon, Dec 14, 2009 at 9:32 PM, Andrey Kuzmin
> <andrey.v.kuzmin at gmail.com> wrote:
>>
>> Right, but 'verify' seems to be an 'extreme safety' setting and thus a
>> rather rare use case.
>
> Hmm, dunno. I wouldn't set anything but a scratch file system to
> dedup=on. Anything of even slight significance gets dedup=verify.
>
>> Saving the cycles lost to compressing duplicates looks to outweigh the
>> 'uncompress before verify' overhead, imo.
>
> Dedup doesn't come for free - it imposes additional load on the CPU, just
> like checksumming and compression. The more fancy things we want our
> file system to do for us, the more CPU it'll take.

Verify mode actually looks compress/dedup order-neutral. To do the byte comparison, one can either compress the new block or decompress the old one, and the latter is usually a bit easier. Pipeline design may dictate the choice - for instance, one could compress the new block while the old one is being fetched from disk for comparison - but overall it looks pretty close. And with dedup=on, reversing the order, if feasible, saves quite a few cycles.

Regards,
Andrey
Darren J Moffat
2009-Dec-15 10:00 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Cyril Plisko wrote:
> On Mon, Dec 14, 2009 at 9:32 PM, Andrey Kuzmin
> <andrey.v.kuzmin at gmail.com> wrote:
>> Right, but 'verify' seems to be an 'extreme safety' setting and thus a
>> rather rare use case.
>
> Hmm, dunno. I wouldn't set anything but a scratch file system to
> dedup=on. Anything of even slight significance gets dedup=verify.

Why? Is it because you don't believe SHA256 (which is the default checksum used when dedup=on is specified) is strong enough?

--
Darren J Moffat
Kjetil Torgrim Homme
2009-Dec-15 12:06 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Robert Milkowski <milek at task.gda.pl> writes:
> On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
>> Because if you can de-dup anyway why bother to compress THEN check?
>> This SEEMS to be the behaviour - i.e. I would suspect many of the
>> files I'm writing are dups - however I see high cpu use even though
>> on some of the copies I see almost no disk writes.
>
> First, the checksum is calculated after compression happens.

For some reason I, like Steve, thought the checksum was calculated on the uncompressed data, but a look in the source confirms you're right, of course.

Thinking about the consequences of changing it: RAID-Z recovery would be much more CPU intensive if hashing were done on uncompressed data - every possible combination of the N-1 disks would have to be decompressed (and most combinations would fail), and *then* the remaining candidates would be hashed to see if the data is correct.

This would be done on a per-recordsize basis, not per stripe, which means reconstruction would fail if two disk blocks (512 octets) on different disks and in different stripes went bad. (Doing an exhaustive search over all possible permutations to handle that case doesn't seem realistic.)

In addition, hashing becomes slightly more expensive since more data needs to be hashed.

Overall, my guess is that this choice (made before dedup!) will give worse performance in normal situations in the future, when dedup+lzjb will be very common, in exchange for faster and more reliable resilver. In any case, there is not much to be done about it now.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
On Tue, Dec 15, 2009 at 3:06 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
> Robert Milkowski <milek at task.gda.pl> writes:
>> On 13/12/2009 20:51, Steve Radich, BitShop, Inc. wrote:
>>> Because if you can de-dup anyway why bother to compress THEN check?
>>> This SEEMS to be the behaviour - i.e. I would suspect many of the
>>> files I'm writing are dups - however I see high cpu use even though
>>> on some of the copies I see almost no disk writes.
>>
>> First, the checksum is calculated after compression happens.
>
> For some reason I, like Steve, thought the checksum was calculated on
> the uncompressed data, but a look in the source confirms you're right,
> of course.
>
> Thinking about the consequences of changing it: RAID-Z recovery would be
> much more CPU intensive if hashing were done on uncompressed data --

I don't quite see how dedup (based on sha256) and parity (based on crc32) are related.

Regards,
Andrey
Kjetil Torgrim Homme
2009-Dec-16 14:18 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
> Kjetil Torgrim Homme wrote:
>> For some reason I, like Steve, thought the checksum was calculated on
>> the uncompressed data, but a look in the source confirms you're right,
>> of course.
>>
>> Thinking about the consequences of changing it: RAID-Z recovery would be
>> much more CPU intensive if hashing were done on uncompressed data --
>
> I don't quite see how dedup (based on sha256) and parity (based on
> crc32) are related.

I tried to hint at an explanation:

>> every possible combination of the N-1 disks would have to be
>> decompressed (and most combinations would fail), and *then* the
>> remaining candidates would be hashed to see if the data is correct.

The key is that you don't know which block is corrupt. If everything is hunky-dory, the parity will match the data. Parity in RAID-Z1 is not a checksum like CRC32; it is simply XOR (as in RAID 5). Here's an example with four data disks and one parity disk:

  D1  D2  D3  D4  PP
  00  01  10  10  01

This is a single stripe with 2-bit disk blocks for simplicity. If you XOR together all the blocks, you get 00. That's the simple premise for reconstruction: D1 = XOR(D2, D3, D4, PP), D2 = XOR(D1, D3, D4, PP), and so on.

So what happens if a bit flips in D4 and it becomes 00? The total XOR isn't 00 anymore, it is 10 - something is wrong. But unless you get a hardware signal from D4, you don't know which block is corrupt. This is a major problem with RAID 5: the data is irrevocably corrupt. The parity discovers the error and can alert the user, but that's the best it can do. In RAID-Z the hash saves the day: first *assume* D1 is bad and reconstruct it from parity. If the hash for the block is then OK, D1 *was* bad. Otherwise, assume D2 is bad, and so on.

So the parity calculation indicates which stripes contain bad blocks, but the hash - the sanity check for which disk blocks are actually bad - must be calculated over all the stripes a ZFS block (record) consists of.

>> This would be done on a per-recordsize basis, not per stripe, which
>> means reconstruction would fail if two disk blocks (512 octets) on
>> different disks and in different stripes went bad. (Doing an exhaustive
>> search over all possible permutations to handle that case doesn't seem
>> realistic.)

Actually this is the same whether compression happens before or after hashing; it's just that each permutation is more expensive to check.

>> In addition, hashing becomes slightly more expensive since more data
>> needs to be hashed.
>>
>> Overall, my guess is that this choice (made before dedup!) will give
>> worse performance in normal situations in the future, when dedup+lzjb
>> will be very common, in exchange for faster and more reliable resilver.
>> In any case, there is not much to be done about it now.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
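A toy model of the reconstruction described above (illustrative only, not ZFS code): the XOR parity shows that a stripe is inconsistent, but only the record checksum, tried against each "assume column i is bad" reconstruction, identifies which device actually held the bad data.

    import hashlib

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Four data columns plus XOR parity, and a checksum over the whole record.
    data = [b"\x00", b"\x01", b"\x02", b"\x02"]
    parity = b"\x00"
    for col in data:
        parity = xor(parity, col)
    record_checksum = hashlib.sha256(b"".join(data)).digest()

    # Silently corrupt one column (no I/O error reported by the device).
    damaged = list(data)
    damaged[3] = b"\x00"

    # Try reconstructing each column in turn from the others plus parity;
    # the record checksum tells us which attempt produced the right data.
    for bad in range(len(damaged)):
        candidate = list(damaged)
        rebuilt = parity
        for i, col in enumerate(damaged):
            if i != bad:
                rebuilt = xor(rebuilt, col)
        candidate[bad] = rebuilt
        if hashlib.sha256(b"".join(candidate)).digest() == record_checksum:
            print(f"column {bad} was bad; reconstructed value {candidate[bad]!r}")
            break

If the checksum were computed over uncompressed data, every candidate in that loop would also have to be decompressed before it could be checked, which is the extra resilver cost being pointed at here.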
Yet again, I don't see how RAID-Z reconstruction is related to the subject discussed (whether raw or compressed data should be sha256'ed when both dedup and compression are enabled). sha256 has nothing to do with bad block detection (maybe it will when encryption is implemented, but for now sha256 is used only for duplicate-candidate look-up).

Regards,
Andrey

On Wed, Dec 16, 2009 at 5:18 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
> The key is that you don't know which block is corrupt. If everything is
> hunky-dory, the parity will match the data. Parity in RAID-Z1 is not a
> checksum like CRC32; it is simply XOR (as in RAID 5). [...]
Kjetil Torgrim Homme
2009-Dec-16 16:25 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
> Yet again, I don't see how RAID-Z reconstruction is related to the
> subject discussed (whether raw or compressed data should be sha256'ed
> when both dedup and compression are enabled). sha256 has nothing to do
> with bad block detection (maybe it will when encryption is implemented,
> but for now sha256 is used only for duplicate-candidate look-up).

How do you think RAID-Z resilvering works? Please correct me where I'm wrong.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
On Wed, Dec 16, 2009 at 7:25 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
> Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
>> Yet again, I don't see how RAID-Z reconstruction is related to the
>> subject discussed (whether raw or compressed data should be sha256'ed
>> when both dedup and compression are enabled). sha256 has nothing to do
>> with bad block detection (maybe it will when encryption is implemented,
>> but for now sha256 is used only for duplicate-candidate look-up).
>
> How do you think RAID-Z resilvering works? Please correct me where I'm
> wrong.

Resilvering has nothing to do with sha256: one could resilver long before dedup was introduced in zfs.

Regards,
Andrey
Darren J Moffat
2009-Dec-16 16:46 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin wrote:
> On Wed, Dec 16, 2009 at 7:25 PM, Kjetil Torgrim Homme
> <kjetilho at linpro.no> wrote:
>> Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
>>> Yet again, I don't see how RAID-Z reconstruction is related to the
>>> subject discussed (whether raw or compressed data should be sha256'ed
>>> when both dedup and compression are enabled). sha256 has nothing to do
>>> with bad block detection (maybe it will when encryption is implemented,
>>> but for now sha256 is used only for duplicate-candidate look-up).
>> How do you think RAID-Z resilvering works? Please correct me where I'm
>> wrong.
>
> Resilvering has nothing to do with sha256: one could resilver long
> before dedup was introduced in zfs.

SHA256 isn't just used for dedup; it has been available as one of the checksum algorithms right back to pool version 1, which integrated in build 27.

SHA256 is also used to checksum the pool uberblock.

This means that SHA256 is used during resilvering, and especially so if you have checksum=sha256 on your datasets.

If you still don't believe me, check the source code history:

http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio_checksum.c
http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sha256.c

Look at the date when that integrated: 31st October 2005.

In case you still doubt me, look at the fix I just integrated today:

http://mail.opensolaris.org/pipermail/onnv-notify/2009-December/011090.html

--
Darren J Moffat
On Wed, Dec 16, 2009 at 7:46 PM, Darren J Moffat <darrenm at opensolaris.org> wrote:
> Andrey Kuzmin wrote:
>> Resilvering has nothing to do with sha256: one could resilver long
>> before dedup was introduced in zfs.
>
> SHA256 isn't just used for dedup; it has been available as one of the
> checksum algorithms right back to pool version 1, which integrated in
> build 27.

'One of' is the key phrase. And thanks for the code pointers, I'll take a look.

Regards,
Andrey
Kjetil Torgrim Homme
2009-Dec-17 00:33 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
> Darren J Moffat wrote:
>> Andrey Kuzmin wrote:
>>> Resilvering has nothing to do with sha256: one could resilver long
>>> before dedup was introduced in zfs.
>>
>> SHA256 isn't just used for dedup; it has been available as one of the
>> checksum algorithms right back to pool version 1, which integrated in
>> build 27.
>
> 'One of' is the key phrase. And thanks for the code pointers, I'll take
> a look.

I didn't mention sha256 at all :-). The reasoning is the same no matter which hash algorithm you're using (fletcher2, fletcher4 or sha256). Dedup doesn't require sha256 either; you can use fletcher4.

The question was: why does data have to be compressed before it can be recognised as a duplicate? It does seem like a waste of CPU, no? I attempted to show the downsides of identifying blocks by their uncompressed hash. (BTW, it doesn't affect storage efficiency; the same duplicate blocks will be discovered either way.)

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
The downside you have described happens only when the same checksum is used for data protection and for duplicate detection. This implies sha256, BTW, since fletcher-based dedup has been dropped in recent builds.

On 12/17/09, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
> I didn't mention sha256 at all :-). The reasoning is the same no matter
> which hash algorithm you're using (fletcher2, fletcher4 or sha256). Dedup
> doesn't require sha256 either; you can use fletcher4.
>
> The question was: why does data have to be compressed before it can be
> recognised as a duplicate? It does seem like a waste of CPU, no? I
> attempted to show the downsides of identifying blocks by their
> uncompressed hash. (BTW, it doesn't affect storage efficiency; the same
> duplicate blocks will be discovered either way.)

--
Regards,
Andrey
Kjetil Torgrim Homme
2009-Dec-17 14:32 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:

> The downside you have described happens only when the same checksum is
> used for data protection and for duplicate detection. This implies
> sha256, BTW, since fletcher-based dedup has been dropped in recent
> builds.

If the hash used for dedup were completely separate from the hash used for data protection, I don't see any downsides to computing the dedup hash from uncompressed data. Why isn't it separate?

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Darren J Moffat
2009-Dec-17 14:45 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Kjetil Torgrim Homme wrote:
> Andrey Kuzmin <andrey.v.kuzmin at gmail.com> writes:
>
>> The downside you have described happens only when the same checksum is
>> used for data protection and for duplicate detection. This implies
>> sha256, BTW, since fletcher-based dedup has been dropped in recent
>> builds.
>
> If the hash used for dedup were completely separate from the hash used
> for data protection, I don't see any downsides to computing the dedup
> hash from uncompressed data. Why isn't it separate?

It isn't separate because that isn't how Jeff and Bill designed it. I think the design they have is great.

Instead of trying to pick holes in the theory, can you demonstrate a real performance problem with compression=on and dedup=on and show that it is caused by the compression step? Otherwise, if you want it changed, code it up and show how what you have done is better in all cases.

--
Darren J Moffat
Kjetil Torgrim Homme
2009-Dec-17 15:14 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Darren J Moffat <darrenm at opensolaris.org> writes:
> Kjetil Torgrim Homme wrote:
>> If the hash used for dedup were completely separate from the hash used
>> for data protection, I don't see any downsides to computing the dedup
>> hash from uncompressed data. Why isn't it separate?
>
> It isn't separate because that isn't how Jeff and Bill designed it.

Thanks for confirming that, Darren.

> I think the design they have is great.

I don't disagree.

> Instead of trying to pick holes in the theory, can you demonstrate a
> real performance problem with compression=on and dedup=on and show
> that it is caused by the compression step?

Compression requires CPU, actually quite a lot of it. Even with the lean and mean lzjb, you will get not much more than roughly 150 MB/s per core. So if you're copying a 10 GB image file, it will take a minute or two just to compress the data so that the hash can be computed and the duplicate blocks identified. If the dedup hash were based on uncompressed data, the copy would be limited by hashing efficiency (and dedup tree lookup).

I don't know how tightly interwoven the dedup hash tree and the block pointer hash tree are, or whether it is at all possible to disentangle them. Conceptually it doesn't seem impossible, but that's easy for me to say, with no knowledge of the zio pipeline...

Oh, and how does encryption play into this? Just don't? Knowing that someone else has the same block as you is leaking information, but that may be acceptable - just make different pools for people you don't trust.

> Otherwise, if you want it changed, code it up and show how what you
> have done is better in all cases.

I wish I could :-)

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
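The rough arithmetic behind the "minute or two" figure, taking the 150 MB/s per-core number above at face value (it is an estimate, not a measurement of any particular machine):

    image_size_gb = 10
    lzjb_throughput_mb_s = 150          # assumed per-core lzjb rate

    seconds = image_size_gb * 1024 / lzjb_throughput_mb_s
    print(f"~{seconds:.0f} s of one core just to compress before the dedup lookup")
    # ~68 s - work that would be skipped entirely for duplicate blocks
    # if the dedup key were computed on the uncompressed record.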
Darren J Moffat
2009-Dec-17 16:14 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Kjetil Torgrim Homme wrote:
> I don't know how tightly interwoven the dedup hash tree and the block
> pointer hash tree are, or whether it is at all possible to disentangle
> them.

At the moment I'd say very interwoven, by design.

> Conceptually it doesn't seem impossible, but that's easy for me to
> say, with no knowledge of the zio pipeline...

Correct, it isn't impossible, but there would probably need to be two checksums held: one of the untransformed data (i.e. uncompressed and unencrypted) and one of the transformed data (compressed and encrypted). That has different trade-offs, and SHA256 can be expensive too; see:

http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

Note also that the compress/encrypt/checksum and the dedup are separate pipeline stages, so while dedup is happening for block N, block N+1 can be getting transformed - this is designed to take advantage of multiple scheduling units (threads, cpus, cores etc).

> Oh, and how does encryption play into this? Just don't? Knowing that
> someone else has the same block as you is leaking information, but that
> may be acceptable - just make different pools for people you don't
> trust.

Compress, encrypt, checksum, dedup.

You are correct that it is an information leak, but only within a dataset and its clones, and only if you can observe the deduplication stats (and you need to use zdb to get enough info to see the leak - which means you have access to the raw devices); the dedupratio isn't really enough unless the pool is really idle or has only one user writing at a time.

For the encryption case, deduplication of the same plaintext block will only work within a dataset or a clone of it, because only in those cases do you have the same key (and the way I have implemented the IV generation for AES CCM/GCM mode ensures that the same plaintext will have the same IV, so the ciphertexts will match).

Also, if you place a block in an unencrypted dataset that happens to match the ciphertext in an encrypted dataset, they won't dedup either (you need to understand what I've done with the AES CCM/GCM MAC and the zio_cksum_t field in the blkptr_t, and how that is used by dedup, to see why).

If that small information leak isn't acceptable even within the dataset, then don't enable both encryption and deduplication on those datasets - and don't delegate that property to your users either. Or you can frequently rekey your per-dataset data encryption keys with 'zfs key -K', but then you might as well turn dedup off - although there are some very good use cases in multi-level security where doing dedup/encryption and rekey provides a nice effect.

--
Darren J Moffat
Bob Friesenhahn
2009-Dec-17 16:18 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
On Thu, 17 Dec 2009, Kjetil Torgrim Homme wrote:
>
> Compression requires CPU, actually quite a lot of it. Even with the
> lean and mean lzjb, you will get not much more than roughly 150 MB/s
> per core. So if you're copying a 10 GB image file, it will take a
> minute or two just to compress the data so that the hash can be
> computed and the duplicate blocks identified. If the dedup hash were
> based on uncompressed data, the copy would be limited by hashing
> efficiency (and dedup tree lookup).

It is useful to keep in mind that deduplication can save a lot of disk space, but it is usually only effective in certain circumstances, such as when replicating a collection of files. The majority of write I/O will never benefit from deduplication. Based on this, speculatively assuming that the data will not be deduplicated does not increase cost most of the time. If the data does end up being deduplicated, then that is a blessing.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Nicolas Williams
2009-Dec-17 17:13 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
On Thu, Dec 17, 2009 at 03:32:21PM +0100, Kjetil Torgrim Homme wrote:
> If the hash used for dedup were completely separate from the hash used
> for data protection, I don't see any downsides to computing the dedup
> hash from uncompressed data. Why isn't it separate?

Hash and checksum functions are slow (hash functions are slower, but either way you'll be loading large blocks of data, which sets a floor on the cost). Duplicating work is bad for performance. Using the same checksum for integrity protection and dedup is an optimization, and a very nice one at that. Having separate checksums would require making blkptr_t larger, which imposes its own costs.

There are lots of trade-offs here. Using the same checksum/hash for integrity protection and dedup is a great solution. If you use a non-cryptographic checksum algorithm, then you'll want to enable verification for dedup. That's all.

Nico
On Thu, Dec 17, 2009 at 6:14 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
> Compression requires CPU, actually quite a lot of it. Even with the
> lean and mean lzjb, you will get not much more than roughly 150 MB/s
> per core. So if you're copying a 10 GB image file, it will take a
> minute or two just to compress the data so that the hash can be
> computed and the duplicate blocks identified. If the dedup hash were
> based on uncompressed data, the copy would be limited by hashing
> efficiency (and dedup tree lookup).

This isn't exactly true. If, speculatively, one stores two hashes - one for the uncompressed data in the DDT, and another, for the compressed data, kept with the data block for data healing - one saves the compression of duplicates and pays with an extra hash computation for singletons. So the more precise question is whether the set of cases where the duplicate/singleton ratio and the compression/hashing bandwidth ratio make this a win is non-empty (or, rather, of practical importance).

Regards,
Andrey
Daniel Carosone
2009-Dec-17 23:53 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Your parenthetical comments here raise some concerns, or at least eyebrows, with me. Hopefully you can lower them again.

> compress, encrypt, checksum, dedup.

> (and you need to use zdb to get enough info to see the
> leak - and that means you have access to the raw devices)

An attacker with access to the raw devices is the primary base threat model for on-disk encryption, surely?

An attacker with access to disk traffic, via e.g. iSCSI, who can also deploy dynamic traffic analysis in addition to static content analysis, and who also has similarly greater opportunities for tampering, is another, trickier threat model.

It seems like entirely wrong thinking (even in parentheses) to dismiss an issue as irrelevant because it only applies in the primary threat model.

> (and the way I have implemented the IV
> generation for AES CCM/GCM mode ensures that the same
> plaintext will have the same IV so the ciphertexts will match).

Again, this seems like a cause for concern. Have you effectively turned these fancy and carefully designed crypto modes back into ECB, albeit at a larger block size (and only within a dataset)?

Let's consider copy-on-write semantics: with the above issue, an attacker can tell which blocks of a file have changed over time, even if unchanged blocks have been rewritten - giving even the static-image attacker some traffic analysis capability.

This would be a problem regardless of dedup, for the scenario where the attacker can see repeated ciphertext on disk (unless the dedup metadata itself is sufficiently encrypted, which I understand it is not).

> (you need to understand
> what I've done with the AES CCM/GCM MAC

I'd like to, but more to understand what (if any) protection is given against replay attacks, beyond that already provided by the Merkle hash tree.

I await ZFS crypto with even more enthusiasm than dedup; thanks for talking about the details with us.
Kjetil Torgrim Homme
2009-Dec-18 10:48 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Darren J Moffat <darrenm at opensolaris.org> writes:
> Kjetil Torgrim Homme wrote:
>
>> I don't know how tightly interwoven the dedup hash tree and the block
>> pointer hash tree are, or whether it is at all possible to disentangle
>> them.
>
> At the moment I'd say very interwoven, by design.
>
>> Conceptually it doesn't seem impossible, but that's easy for me to
>> say, with no knowledge of the zio pipeline...
>
> Correct, it isn't impossible, but there would probably need to be two
> checksums held: one of the untransformed data (ie uncompressed and
> unencrypted) and one of the transformed data (compressed and encrypted).
> That has different trade-offs, and SHA256 can be expensive too; see:
>
> http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

Great work! SHA256 is more expensive than I thought - even with misc/sha2 it takes 1 ms per 128 KiB? That's roughly the same CPU usage as lzjb! In that case, hashing the (smaller) compressed data is more efficient than doing an additional hash of the full uncompressed block.

It's interesting to note that 64 KiB looks faster (a bit hard to read the chart accurately); L1 cache size coming into play, perhaps?

> Note also that the compress/encrypt/checksum and the dedup are
> separate pipeline stages, so while dedup is happening for block N, block
> N+1 can be getting transformed - this is designed to take advantage
> of multiple scheduling units (threads, cpus, cores etc).

Nice. Are all of them separate stages, or are compress/encrypt/checksum done as one stage?

> Compress, encrypt, checksum, dedup.
>
> For the encryption case, deduplication of the same plaintext block will
> only work within a dataset or a clone of it, because only in those
> cases do you have the same key (and the way I have implemented the IV
> generation for AES CCM/GCM mode ensures that the same plaintext will
> have the same IV, so the ciphertexts will match).

Makes sense.

> Also, if you place a block in an unencrypted dataset that happens to
> match the ciphertext in an encrypted dataset, they won't dedup either
> (you need to understand what I've done with the AES CCM/GCM MAC and
> the zio_cksum_t field in the blkptr_t, and how that is used by dedup,
> to see why).

Wow, I didn't think of that problem. Did you get bitten by wrongful dedup during testing with image files? :-)

> If that small information leak isn't acceptable even within the
> dataset, then don't enable both encryption and deduplication on those
> datasets - and don't delegate that property to your users either. Or
> you can frequently rekey your per-dataset data encryption keys with
> 'zfs key -K', but then you might as well turn dedup off - although
> there are some very good use cases in multi-level security where doing
> dedup/encryption and rekey provides a nice effect.

Indeed. ZFS is extremely flexible. Thank you for your response, it was very enlightening.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
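The "1 ms per 128 KiB" figure is easy to sanity-check with a rough userland timing like the one below; the absolute number will of course differ from the in-kernel misc/sha2 measurements in Darren's post.

    import hashlib
    import os
    import time

    block = os.urandom(128 * 1024)                 # one 128 KiB record
    iterations = 1000

    start = time.perf_counter()
    for _ in range(iterations):
        hashlib.sha256(block).digest()
    elapsed = time.perf_counter() - start

    print(f"{elapsed / iterations * 1e3:.3f} ms per 128 KiB block")
    print(f"{128 * iterations / 1024 / elapsed:.0f} MiB/s")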
Darren J Moffat
2009-Dec-21 22:44 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Daniel Carosone wrote:
> Your parenthetical comments here raise some concerns, or at least
> eyebrows, with me. Hopefully you can lower them again.
>
>> compress, encrypt, checksum, dedup.
>
>> (and you need to use zdb to get enough info to see the
>> leak - and that means you have access to the raw devices)
>
> An attacker with access to the raw devices is the primary base threat
> model for on-disk encryption, surely?
>
> An attacker with access to disk traffic, via e.g. iSCSI, who can also
> deploy dynamic traffic analysis in addition to static content analysis,
> and who also has similarly greater opportunities for tampering, is
> another, trickier threat model.
>
> It seems like entirely wrong thinking (even in parentheses) to dismiss
> an issue as irrelevant because it only applies in the primary threat
> model.

I wasn't dismissing it; I was pointing out that this isn't something an unprivileged end user could easily do. If the risk is unacceptable then dedup shouldn't be enabled. For some use cases the risk is acceptable, and for those use cases we want to allow the use of dedup with encryption.

>> (and the way I have implemented the IV
>> generation for AES CCM/GCM mode ensures that the same
>> plaintext will have the same IV so the ciphertexts will match).
>
> Again, this seems like a cause for concern. Have you effectively turned
> these fancy and carefully designed crypto modes back into ECB, albeit at
> a larger block size (and only within a dataset)?

No, I don't believe I have. The IV generation when doing deduplication is done by calculating an HMAC of the plaintext using a separate per-dataset key (which is also refreshed if 'zfs key -K' is run to rekey the dataset).

> Let's consider copy-on-write semantics: with the above issue, an
> attacker can tell which blocks of a file have changed over time, even if
> unchanged blocks have been rewritten - giving even the static-image
> attacker some traffic analysis capability.

If that is part of your deployment risk model, then deduplication is not worth enabling in that case.

> This would be a problem regardless of dedup, for the scenario where the
> attacker can see repeated ciphertext on disk (unless the dedup metadata
> itself is sufficiently encrypted, which I understand it is not).

In the case where deduplication is not enabled, the IV generation uses a combination of the txg number, the object and the blockid, which complies with the recommendations for IV generation for both CCM and GCM.

>> (you need to understand
>> what I've done with the AES CCM/GCM MAC
>
> I'd like to, but more to understand what (if any) protection is given
> against replay attacks, beyond that already provided by the Merkle hash
> tree.

What do you mean by a replay attack? What is being replayed, and by whom?

--
Darren J Moffat
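A conceptual sketch of the two IV strategies described above (this is not Darren's actual implementation; IV widths and on-disk details are glossed over): a deterministic IV derived from an HMAC of the plaintext with a separate per-dataset key when dedup is enabled, versus a unique per-write IV built from (txg, object, blockid) when it is not.

    import hashlib
    import hmac

    def iv_for_dedup(plaintext: bytes, iv_key: bytes) -> bytes:
        # Same plaintext + same per-dataset IV key -> same IV -> same
        # ciphertext, which is what allows encrypted blocks to dedup at all.
        return hmac.new(iv_key, plaintext, hashlib.sha256).digest()[:12]

    def iv_for_non_dedup(txg: int, obj: int, blkid: int) -> bytes:
        # Unique per logical write, so identical plaintexts get different
        # ciphertexts and block equality does not leak onto disk.
        return (txg.to_bytes(6, "big") + obj.to_bytes(3, "big")
                + blkid.to_bytes(3, "big"))

    iv_key = b"\x01" * 32            # stands in for the per-dataset key, refreshed on rekey
    block = b"same plaintext block"
    assert iv_for_dedup(block, iv_key) == iv_for_dedup(block, iv_key)
    assert iv_for_non_dedup(100, 7, 0) != iv_for_non_dedup(101, 7, 0)

The deterministic variant is what lets identical plaintexts produce identical ciphertexts (and therefore dedup) within one dataset; the unique-IV variant reveals nothing about which blocks are equal.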
Darren J Moffat
2009-Dec-21 22:47 UTC
[zfs-discuss] DeDup and Compression - Reverse Order?
Kjetil Torgrim Homme wrote:
>> Note also that the compress/encrypt/checksum and the dedup are
>> separate pipeline stages, so while dedup is happening for block N, block
>> N+1 can be getting transformed - this is designed to take advantage
>> of multiple scheduling units (threads, cpus, cores etc).
>
> Nice. Are all of them separate stages, or are compress/encrypt/checksum
> done as one stage?

Originally compress, encrypt and checksum were all separate stages in the zio pipeline; they are now all one stage, ZIO_WRITE_BP_INIT for the write case and ZIO_READ_BP_INIT for the read case.

>> Also, if you place a block in an unencrypted dataset that happens to
>> match the ciphertext in an encrypted dataset, they won't dedup either
>> (you need to understand what I've done with the AES CCM/GCM MAC and
>> the zio_cksum_t field in the blkptr_t, and how that is used by dedup,
>> to see why).
>
> Wow, I didn't think of that problem. Did you get bitten by wrongful
> dedup during testing with image files? :-)

No, I didn't see the problem in reality; I just thought about it as a possible risk that needed to be addressed. Solving it didn't actually require me to do any additional work, because ZFS uses a separate table for each checksum algorithm anyway, and the checksum algorithm for encrypted datasets is listed as sha256+mac, not sha256. It was nice that I didn't have to write more code to solve the problem, but it may not have been that way.

--
Darren J Moffat
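A sketch of why the accidental cross-dataset match cannot happen, assuming only what is described above (the real DDT layout differs): dedup entries live in separate tables per checksum algorithm, and encrypted datasets report their algorithm as sha256+mac rather than sha256, so their entries are never even consulted by a plaintext dataset.

    from collections import defaultdict

    # One dedup table per checksum algorithm; encrypted datasets use a
    # distinct algorithm name ("sha256+mac"), so their entries can never
    # be found by a plaintext dataset looking up plain "sha256".
    ddt = defaultdict(dict)

    def dedup_lookup(algorithm: str, checksum: bytes):
        return ddt[algorithm].get(checksum)

    def dedup_insert(algorithm: str, checksum: bytes, address: str):
        ddt[algorithm][checksum] = address

    same_bits = b"\xab" * 32                  # pretend the raw 256-bit values collide
    dedup_insert("sha256+mac", same_bits, "dva-encrypted")
    assert dedup_lookup("sha256", same_bits) is None   # no false dedup across tables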
On Mon, Dec 21, 2009 at 02:44:00PM -0800, Darren J Moffat wrote:
> The IV generation when doing deduplication is done by calculating an
> HMAC of the plaintext using a separate per-dataset key (which is also
> refreshed if 'zfs key -K' is run to rekey the dataset).
> [..]
> In the case where deduplication is not enabled, the IV generation uses a
> combination of the txg number, the object and the blockid, which
> complies with the recommendations for IV generation for both CCM and
> GCM.

Aha! This was the crucial detail - that IV generation depends on the dedup setting. It makes perfect sense now, and seems like a sensible choice to enable a meaningful risk vs space trade-off, rather than having mutually defeating features.

One (obvious, and no doubt well dealt with) question: I presume the IV method is stored per block, similar to compression, so that changing the dedup setting doesn't cause decryption to use the wrong IV?

> What do you mean by a replay attack?
> What is being replayed, and by whom?

Replay of previous disk blocks, substituted for more recent contents, by an attacker who either has offline access to the disk or MITM access to the storage path. Contrived example: playing back an old disk block containing a previous, compromised password, instead of the block with the new, changed password.

Many disk encryption mechanisms don't provide integrity at all, in part because (other than some recent advanced crypto modes) it costs extra space and therefore brings many other complexities. Even those with some integrity may not defend strongly against this kind of replay.

ZFS clearly is game-changing, particularly with respect to integrity protections. With the hopefully-imminent introduction of zfs-crypto, it's worthwhile understanding where the interaction and boundary between protection-against-error and protection-against-malice falls.

Including the txg counter in the IV (for non-dedup) is clearly useful, as is CoW (which changes the block number), but how far does it go back up the tree? Coming back down the tree, when do I first encounter zfs-crypto, and what can I fiddle with on the way to bypass or defeat its ongoing protections?

Put as a threat: I have access to your data pool (image). We'll ignore boot-time integrity issues for your rpool and the zfs-crypto executable code for now. We'll also assume, for simplicity, that I'm not trying to tamper with a live running pool. Perhaps I'm a malicious SAN admin, or the SAN admin hasn't locked down his infrastructure. Perhaps you left your usb widget at home and I used my ninja skills to break in. I want to tamper with your data, and for you to accept my changes without detection.

With plain ZFS, I have a whole lot more work to do than I would with normal filesystems, updating hash chains back up to the uberblock. It's tricky, but doesn't require any secrets. It's trivial if I don't need to rewrite history and can just commit a new txg.

With ZFS-crypto, the question is how far the integrity protection is extended with the inclusion of keys. What kinds of attack can still be mounted without keys, particularly against metadata tampering? It's the difference between a hash and an HMAC/signature, and what's included in each.

One example threat scenario: what if I change the dataset properties to "crypto=off" - will that cause future writes to be in plaintext? What would help a user notice?

None of this is a criticism; zfs-crypto is never going to be a universal solution to all threats. It's about understanding its coverage and limitations, and about deciding whether and which existing (much less convenient) defenses can be dropped in favour of this new hotness, as much as it is about which of those still provide value or which threats remain unprotected because no defense is available or economical.

(Subject: changed accordingly)

--
Dan.