Darren J Moffat
2009-Oct-30 17:30 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
For the encryption functionality in the ZFS filesystem we use AES in CCM or GCM mode at the block level to provide confidentiality and authentication. There is also a SHA256 checksum per block (of the ciphertext) that forms a Merkle tree of all the blocks in the pool. Note that I have to store the full IV in the block. A block here is a ZFS block, which is any power of two from 512 bytes to 128k (the default).

The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.

Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.

Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.

Option 1
--------
IV       96 bits (the max CCM allows given the other params)
MAC      128 bits
Checksum SHA256 truncated to 128 bits

Other options are:

Option 2
--------
IV       96 bits
MAC      128 bits
Checksum SHA224 truncated to 128 bits

Basically, if I have to truncate to 128 bits, is it better to do it against SHA224 or SHA256?

Option 3
--------
IV       96 bits
MAC      128 bits
Checksum SHA224 or SHA256 truncated to 160 bits

Obviously better than 1 and 2, but how much better? The reason it isn't used just now is because it is slightly harder to lay out given other constraints on where the data lives.

Option 4
--------
IV       96 bits
MAC      32 bits
Checksum SHA256 at full 256 bits

I'm pretty sure the size of the MAC is far too small.

Option 5
--------
IV       96 bits
MAC      64 bits
Checksum SHA224 at full 224 bits

This feels like the best compromise, but is it?

Option 6
--------
IV       96 bits
MAC      96 bits
Checksum SHA224 or SHA256 truncated to 192 bits

--
Darren J Moffat
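For illustration, here is a minimal Python sketch of the per-block operation described above: AES-GCM authenticated encryption under a 96-bit IV, plus a SHA-256 checksum computed over the ciphertext and truncated to the Option 1 width. It uses the pyca/cryptography package; the key, block size and truncation here are illustrative assumptions, not ZFS code.

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)  # per-dataset data encryption key (illustrative)
    aead = AESGCM(key)

    block = os.urandom(4096)   # one ZFS block's plaintext (any power of two, 512 bytes to 128k)
    iv = os.urandom(12)        # 96-bit IV; in ZFS it is derived and stored, not random

    # AESGCM.encrypt returns the ciphertext with the 128-bit auth tag appended.
    ct_and_tag = aead.encrypt(iv, block, None)
    ciphertext, tag = ct_and_tag[:-16], ct_and_tag[-16:]

    # The Merkle-tree checksum is over the *ciphertext*; Option 1 truncates it to 128 bits.
    checksum = hashlib.sha256(ciphertext).digest()[:16]

    # Everything that must fit in the 384-bit budget of the block pointer:
    assert (len(iv) + len(tag) + len(checksum)) * 8 <= 384  # 96 + 128 + 128 = 352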
Zooko Wilcox-O''Hearn
2009-Nov-02 05:33 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Dear Darren J Moffat:

I don't understand why you need a MAC when you already have the hash of the ciphertext. Does it have something to do with the fact that the checksum is non-cryptographic by default (docs.sun.com/app/docs/doc/819-5461/ftyue?a=view), and is that still true? Your original design document [1] said you needed a way to force the checksum to be SHA-256 if encryption was turned on. But back then you were planning to support non-authenticating modes like CBC. I guess once you dropped non-authenticating modes then you could relax that requirement to force the checksum to be secure.

Too bad, though! Not only are you now tight on space in part because you have two integrity values where one ought to do, but also a secure hash of the ciphertext is actually stronger than a MAC! A secure hash of the ciphertext tells whether the ciphertext is right (assuming the hash function is secure and implemented correctly). Given that the ciphertext is right, then the plaintext is right (given that the encryption is implemented correctly and you use the right decryption key). A MAC on the plaintext tells you only that the plaintext was chosen by someone who knew the key. See what I mean? A MAC can't be used to give someone the ability to read some data while withholding from them the ability to alter that data. A secure hash can.

One of the founding ideas of the whole design of ZFS was end-to-end integrity checking. It does that successfully now, for the case of accidents, using large checksums. If the checksum is secure then it also does it for the case of malice. In contrast a MAC doesn't do "end-to-end" integrity checking. For example, if you've previously allowed someone to read a filesystem (i.e., you've given them access to the key), but you never gave them permission to write to it, but they are able to exploit the issues that you mention at the beginning of [1] such as "Untrusted path to SAN", then the MAC can't stop them from altering the file, nor can the non-secure checksum, but a secure hash can (provided that they can't overwrite all the way up the Merkle Tree of the whole pool and any copies of the Merkle Tree root hash).

Likewise, a secure hash can be relied on as a dedupe tag *even* if someone with malicious intent may have slipped data into the pool. An insecure hash or a MAC tag can't -- a malicious actor could submit data which would cause a collision in an insecure hash or a MAC tag, causing tag-based dedupe to mistakenly unify two different blocks.

So, since you're tight on space, it would be really nice if you could tell your users to use a secure hash for the checksum and then allocate more space to the secure hash value and less space to the now-unnecessary MAC tag. :-)

Anyway, if this is the checksum which is used for dedupe then remember the birthday so-called paradox -- some people may be uncomfortable with the prospect of not being able to safely dedupe their 2^64-block storage pool if the hash is only 128 bits, for example. :-) Maybe you could include the MAC tag in the dedupe comparison.

Also, the IVs for GCM don't need to be random, they need only to be unique. Can you use a block number and birth number or other such guaranteed-unique data instead of storing an IV? (Apropos recent discussion on the cryptography list [2].)

Regards,

Zooko

[1] hub.opensolaris.org/bin/download/Project+zfs-crypto/files/zfs%2Dcrypto%2Ddesign.pdf
[2] mail-archive.com/cryptography at metzdowd.com/msg11020.html

---
Your cloud storage provider does not need access to your data.
Tahoe-LAFS -- allmydata.org
Alexander Klimov
2009-Nov-02 07:45 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Fri, 30 Oct 2009, Darren J Moffat wrote:

> The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.

If you use the hash only to protect against non-malicious corruption, then why do you use SHA-2? Would not MD5 or even CRC be enough?

--
Regards,
ASK
Matt Ball
2009-Nov-02
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Hi Darren,

On Fri, Oct 30, 2009 at 11:30 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote:

> For the encryption functionality in the ZFS filesystem we use AES in CCM or GCM mode at the block level to provide confidentiality and authentication. There is also a SHA256 checksum per block (of the ciphertext) that forms a Merkle tree of all the blocks in the pool. Note that I have to store the full IV in the block. A block here is a ZFS block which is any power of two from 512 bytes to 128k (the default).
>
> The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.
>
> Option 1
> --------
> IV       96 bits (the max CCM allows given the other params)
> MAC      128 bits
> Checksum SHA256 truncated to 128 bits

I personally like the default option 1. All the others have various uglinesses.

SHA-224 has patent issues (see US patent 6829355 <v3.espacenet.com/textdoc?DB=EPODOC&IDX=US6829355>). It's really identical to SHA-256 except that it uses a different initial value and truncates to 224 bits. I would love to see SHA-224 completely disappear.

Cryptographers will all have different opinions about how big a MAC (i.e., cryptographic integrity check) should be, but my take on it is to ask how big of a CRC you would need in a non-adversarial environment to meet the undetectable error rate specified within the system, and then use that for the minimum size of the MAC. For tape drives I've worked on, this was typically somewhere around 1 undetected error in 10^27 bits. If you protect 1 data bit, then you'd roughly need a 90-bit CRC, which you could round up to 96 bits. Anything more than 96 bits in my opinion is somewhat overkill. I'd pick a CCM MAC of either 96 bits or 128.

For hashing, it's a little different since you have to worry about the birthday paradox. The size of the hash output depends on the undetectable error rate of the system, along with the maximum number of candidate plaintexts that an adversary could create in finding a hash collision. Most cryptographers (not knowing more about the system) would be conservative and say something like "Use the full 256 bits of SHA-256 to get a minimum of 128 bits of security", but realistically for this system, that would be way overkill. There's already a 128-bit CCM MAC to fall back to, so here again (given the other safety nets in the system), I think that a 128-bit truncated SHA-256 hash would be plenty of assurance for the system.

--
Thanks!

Matt Ball, Chair, IEEE P1619 Security in Storage Working Group
Staff Engineer, Sun Microsystems, Inc.
500 Eldorado Blvd, Bldg #5 BRM05-212, Broomfield, CO 80021
Work: 303-272-7580, Cell: 303-717-2717
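A quick back-of-envelope check of the sizes argued for above (plain arithmetic; the 10^-27 target is the figure quoted in the post): a random forgery slips past an ideal N-bit tag with probability 2^-N, and a truncated hash used as a dedup tag starts to see birthday collisions around 2^(H/2) blocks.

    import math

    # Tag size needed so a random forgery/miss slips through with probability ~1e-27.
    print(f"{math.log2(1e27):.1f} bits")  # ~89.7, rounded up to 96 in the post

    # Birthday bound for a truncated hash: ~50% collision chance near 2^(H/2) inputs.
    for h in (128, 160, 192, 224, 256):
        print(f"{h}-bit hash: ~50% collision around 2^{h // 2} random blocks")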
Nicolas Williams
2009-Nov-02 16:39 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Sun, Nov 01, 2009 at 10:33:34PM -0700, Zooko Wilcox-O'Hearn wrote:

> I don't understand why you need a MAC when you already have the hash of the ciphertext. Does it have something to do with the fact that the checksum is non-cryptographic by default (docs.sun.com/app/docs/doc/819-5461/ftyue?a=view), and is that still true? Your original design document [1] said you needed a way to force the checksum to be SHA-256 if encryption was turned on. But back then you were planning to support non-authenticating modes like CBC. I guess once you dropped non-authenticating modes then you could relax that requirement to force the checksum to be secure.

[Not speaking for Darren...]

No, the requirement to use a strong hash remains, but since the hash would be there primarily for protection against errors, I don't think the requirement for a strong hash is really needed.

> Too bad, though! Not only are you now tight on space in part because you have two integrity values where one ought to do, but also a secure hash of the ciphertext is actually stronger than a MAC! A secure hash of the ciphertext tells whether the ciphertext is right (assuming the hash function is secure and implemented correctly). Given that the ciphertext is right, then the plaintext is right (given that the encryption is implemented correctly and you use the right decryption key). A MAC on the plaintext tells you only that the plaintext was chosen by someone who knew the key. See what I mean? A MAC can't be used to give someone the ability to read some data while withholding from them the ability to alter that data. A secure hash can.

Users won't actually get the data keys, only the data key wrapping keys. Users who can read the disk and find the wrapped keys and know the wrapping keys can find the actual data keys, of course, but add in a host key that the user can't read and now the user cannot recover their data keys. One goal is to protect a system against its users, but another is to protect user data against malicious modification by anyone else. A MAC provides the first kind of protection if the user can't access the data keys, and a MAC provides the second kind of protection if the data keys can be kept secret.

> One of the founding ideas of the whole design of ZFS was end-to-end integrity checking. It does that successfully now, for the case of accidents, using large checksums. If the checksum is secure then it also does it for the case of malice. In contrast a MAC doesn't do "end-to-end" integrity checking. For example, if you've previously allowed someone to read a filesystem (i.e., you've given them access to the key), but you never gave them permission to write to it, but they are able to exploit the issues that you mention at the beginning of [1] such as "Untrusted path to SAN", then the MAC can't stop them from altering the file, nor can the non-secure checksum, but a secure hash can (provided that they can't overwrite all the way up the Merkle Tree of the whole pool and any copies of the Merkle Tree root hash).

I think we have to assume that an attacker can write to any part of the pool, including the Merkle tree roots. It'd be odd to assume that the attacker can write anywhere but there -- there's nothing to make it so! I.e., we have to at least authenticate the Merkle tree roots. That still means depending on collision resistance of the hash function for security. If we authenticate every block we don't have that dependence (I'll come back to this).

The interesting thing here is that we want the hash _and_ the MAC, not just the MAC. The reason is that we want block pointers (which include the {IV, MAC, hash} for the block being pointed to) to be visible to the layer below the filesystem, so that we can scrub/resilver and evacuate devices from a pool (meaning: re-write all the block pointers that point to blocks on the evacuated devices so that they point elsewhere) even without having the data keys at hand (more on this below).

We could MAC the Merkle tree roots alone, thus alleviating the space situation in the block pointer structure (and also saving precious CPU cycles). But interestingly we wouldn't alleviate it that much! We need to store a 96-bit IV, and if we don't MAC every block then we'll want the strongest hash we can use, so we'll need at least another 256 bits, for a total of 352 bits of the 384 that we have to play with. Whereas if we MAC every block we might store a 96-bit IV, a 128-bit authentication tag and a 160-bit hash, using all 384 bits.

You get more collision resistance from an N-bit MAC than from a hash of the same length. That's because in the MAC case the forger can't check the forgery without knowing the key, while in the hash case the attacker can verify that some contents collide with another's hash. In the MAC case an attacker that hasn't broken the MAC/key must wait until the system reads the modified block(s) to determine if his/her guess was correct. So a 128-bit MAC provides more protection than a 160-bit hash, and about as much as a 256-bit hash. If we remove the MAC then the hash has to grow longer to compensate, thus the space gained by not including the MAC is minimal, possibly zero.

If we MAC every block then we don't need the hash function for security purposes: its main role would still be to provide integrity protection against errors for scrubbing and resilvering when keys are unavailable. The hash would continue to provide end-to-end integrity protection against errors. The hash would add _some_ security value though: not only must an attacker seeking to modify data forge the right MAC for the new contents, they must also find a hash collision (and they must do this all the way up the Merkle tree).

> Likewise, a secure hash can be relied on as a dedupe tag *even* if someone with malicious intent may have slipped data into the pool.

For dedup you want to compare block contents on hash equality. That's what ZFS will do. That defeats your attack on dedup.

> Also, the IVs for GCM don't need to be random, they need only to be unique. Can you use a block number and birth number or other such guaranteed-unique data instead of storing an IV? (Apropos recent discussion on the cryptography list [2].)

The block address can't be used: a blkptr_t actually stores 1-3 actual block addresses, but these can change if a block is relocated.

I think the notion that all encrypted/authenticated filesystems need not be logged in in order to perform certain pool operations is both very useful and rather odd. Odd because once a filesystem is logged in, an all-powerful administrator could either learn its keys or, if the system were using a token to avoid this, the admin could abuse those keys -- the sysadmin remains so powerful that trying to protect users against the sysadmin seems like a waste of resources. But the ability to perform some pool operations without having the keys is still useful: the sysadmin is a user, after all, and might not be around. Think of a SAN operator reconfiguring pools without having to have the keys to the datasets on those pools.

Nico
--
Zooko Wilcox-O''Hearn
2009-Nov-03 16:32 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
[adding cc: zfs-crypto-discuss at opensolaris.org]

David-Sarah: Yes, a secure hash of the plaintext might give better assurance than a secure hash of the ciphertext, because the implementation of the cipher could be buggy or because the decryption key could be wrong. The latter problem could perhaps be addressed by appending the encryption key to the plaintext before encryption. But my point was about something else: that hashes are actually sometimes more robust than MACs from a security engineering standpoint even though MACs are much stronger than secure hashes from a crypto standpoint. I think your reply best summarized what I was trying to say:

On Monday, 2009-11-02, at 23:31, David-Sarah Hopwood wrote:

> Right. If hashes are used instead of MACs, then the integrity of the system does not depend on keeping secrets. It only depends on preventing the attacker from modifying the root of the Merkle tree. One consequence of this is that if there are side-channel attacks against the implementations of crypto algorithms, there is no information that they can leak to an attacker that would allow compromising integrity.

Yes, and in addition to side-channel attacks and theft of the key, there is also the simple fact that with a secure hash you can give a person or process the ability to verify the integrity of data without thereby giving them the ability to forge data. With a MAC, you can't.

The way this might be relevant to ZFS is that they have these constraints on how much space they have to store crypto material, and they have these issues about integrity and about dedupe, and they *already have* a SHA-256 hash of the ciphertext! So it would seem to me that they should leverage that powerful feature that they already have: don't allocate a lot of bits to the MAC tag, which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right. Also, of course, require that the checksum is SHA-256 and not one of the faster, insecure checksums. Also encourage users (as Jeff Bonwick has already done on his blog [1]) to set dedupe to act solely on hash tags and not do a full comparison of block data.

> (Of course, the integrity of the OS also needs to be protected. One way of doing that would be to have a TPM, or the same hardware that is used for crypto, store the root hash of the Merkle tree and also the hash of a boot loader that supports ZFS. Then the boot loader would load an OS from the ZFS filesystem, and only that OS would be permitted to update the ZFS root hash.)

Wow -- that is a good idea!

Regards,

Zooko

[1] blogs.sun.com/bonwick/en_US/entry/zfs_dedup

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Zooko Wilcox-O''Hearn
2009-Nov-03 17:12 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
following-up to my own post to clarify something important and add some further ideas

On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:

> don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.

Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2]. If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block. So maybe you could make the IV be the (64-bit) Birth Transaction ID plus a 64-bit counter which gets incremented on every write and is stored in the place where you are currently storing an IV. That counter could roll over, in the hopes that someone who steals your ciphertext and wants to learn something about your plaintext doesn't have a copy of your ciphertext from 2^64 versions ago. Of course, a larger counter would be better, if you can fit it in.

Regards,

Zooko

[1] csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
[2] csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/gcm-update.pdf
[3] opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Nicolas Williams
2009-Nov-03 18:21 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tue, Nov 03, 2009 at 10:12:06AM -0700, Zooko Wilcox-O'Hearn wrote:

> following-up to my own post to clarify something important and add some further ideas
>
> On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:
>
> > don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.
>
> Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2]. If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

Exactly. I proposed to Darren that he MAC only the Merkle tree roots, and he rejected that as too big a change at this point. That leaves him with the MAC/hash size trade-off. Therefore my recommendation then is to truncate only the hash. Yes, that means that you'll want to enable dedup block match verification.

> It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Note that blocks can be relocated when dataset keys are not available, which means the IV cannot be constructed from block addresses, for example.

> Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block. So maybe you could make the IV be the (64-bit) Birth Transaction ID plus a 64-bit counter which gets incremented on every write and is stored in the place where you are currently storing an IV. That counter could roll over, in the hopes that someone who steals your ciphertext and wants to learn something about your plaintext doesn't have a copy of your ciphertext from 2^64 versions ago. Of course, a larger counter would be better, if you can fit it in.

Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash. And if later Darren is able to switch to MACing the Merkle roots then he'd have 352 bits for a hash.

[*] Transactions happen at a fairly low rate of about one every few seconds. At that rate 2^64 transactions means over a trillion years before the txg wraps (half a trillion if the rate is 1/sec). Therefore ZFS does not need a cleaner service to re-write really old blocks. If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.

Nico
--
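A quick numeric check of the wrap-around estimates in the footnote (plain arithmetic, no ZFS assumptions):

    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    txg_space = 2**64

    years_at_1_per_sec = txg_space / SECONDS_PER_YEAR
    years_at_1_per_usec = txg_space / (SECONDS_PER_YEAR * 1_000_000)

    print(f"1 txg/sec:  {years_at_1_per_sec:.2e} years")   # ~5.8e11, about half a trillion
    print(f"1 txg/usec: {years_at_1_per_usec:,.0f} years") # ~584,542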
Darren J Moffat
2009-Nov-03 19:19 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> following-up to my own post to clarify something important and add some further ideas
>
> On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:
>
> > don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.
>
> Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2].

I never said anything about truncating the GCM MAC and I wouldn't do that. With GCM you can choose the size of the MAC in the params; I assume that is what you mean by truncating, though: choosing a short tag. The main thing I get from those two references and the GCM spec is: never go below 96 bits of MAC, but ideally use 128 bits of MAC.

So that leads me to think that for ZFS, given my space restriction, this is probably the best set of sizes for IV, MAC, and cryptographic hash of ciphertext:

96 bit IV (stored in block pointer)
96 bit MAC (stored in block pointer)
SHA256 truncated to 192 bits

> If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

The user can't choose a checksum other than SHA256 if encryption is enabled. In the future when SHA-3 is chosen we will allow that too. Right, that's the question: how big of a GCM MAC is big enough?

> It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Yes, the IV is unique for every write already. I have to store the IV because of other features coming in the future. Originally I was calculating the IV based on: object set, object, blockid and transaction group (unsigned 64-bit ints). I still do calculate the IV based on those but it needs to be stored.

> Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block.

"Version" of the block doesn't really make sense in ZFS in that way because ZFS is copy-on-write. Or maybe you can think of the birth transaction id as the version, because the other things like the object set, object, level and block id identify the logical filesystem location. The DVA (Data Virtual Address) is the 128-bit disk location, but I don't believe I can use any of that for the IV because in the future we will allow the physical location of the logical block to change (and we need that to work without the crypto keys present).

--
Darren J Moffat
Darren J Moffat
2009-Nov-03 19:28 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Nicolas Williams wrote:

> Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash.

The logical txg (post dedup integration we have physical and logical transaction ids) + a 32-bit counter is interesting. It was actually my very first design for IVs several years ago!

All this assumes that the data encryption key is staying the same - we don't have to go on that assumption with ZFS since I have the means to start using a new one for new blocks. Currently switching to a new data encryption key (distinct from changing the wrapping key the user looks after) is under the admin/users control but it could be done automagically based on time or volume of blocks written.

> If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.

I suspect that sometime in the next 584,542 years the block pointer size for ZFS will increase and I'll have more space to store a bigger MAC, hash and IV. In fact I guess that will happen even in the next 50 years.

--
Darren J Moffat
Nicolas Williams
2009-Nov-03 19:36 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tue, Nov 03, 2009 at 07:28:15PM +0000, Darren J Moffat wrote:

> Nicolas Williams wrote:
> > Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash.
>
> The logical txg (post dedup integration we have physical and logical transaction ids) + a 32-bit counter is interesting. It was actually my very first design for IVs several years ago!

Excellent.

> All this assumes that the data encryption key is staying the same - we don't have to go on that assumption with ZFS since I have the means to start using a new one for new blocks. Currently switching to a new data encryption key (distinct from changing the wrapping key the user looks after) is under the admin/users control but it could be done automagically based on time or volume of blocks written.

Not really. You can change or not change keys, and still, txg + 32-bit counter will give you enough.

> > If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.
>
> I suspect that sometime in the next 584,542 years the block pointer size for ZFS will increase and I'll have more space to store a bigger MAC, hash and IV. In fact I guess that will happen even in the next 50 years.

Heh.

txg + 32-bit counter == 96-bit IVs sounds like the way to go.
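A minimal sketch of the IV construction converged on here (an illustration under the assumptions in this thread, not ZFS code): a 96-bit IV built from the 64-bit transaction group id plus a 32-bit per-transaction block-write counter, of which only the 32-bit counter would need to be stored in the block pointer.

    import struct

    def make_iv(txg: int, write_counter: int) -> bytes:
        """96-bit IV = 64-bit transaction group id || 32-bit per-txg write counter."""
        assert 0 <= txg < 2**64 and 0 <= write_counter < 2**32
        return struct.pack(">QI", txg, write_counter)  # 12 bytes = 96 bits

    iv = make_iv(txg=123456789, write_counter=42)
    assert len(iv) == 12

    # Only the 32-bit counter would need to be stored; the txg is already
    # recorded in the block pointer, leaving 352 bits for the tag and hash.
    stored = iv[8:]
    assert len(stored) * 8 == 32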
Zooko Wilcox-O''Hearn
2009-Nov-04 05:18 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
[Folks, I don't seem to be getting messages from zfs-crypto-discuss so I am reading the web archives of zfs-crypto-discuss and replying.]

In mail.opensolaris.org/pipermail/zfs-crypto-discuss/2009-November/002951.html Darren Moffat wrote:

> SHA256 truncated to 192 bits

You know, I've thought about this sort of thing quite a lot for Tahoe-LAFS and there's a very good reason not to truncate SHA-256 at all. That reason is: now you've got to do your own cryptanalysis work. Suppose open cryptographers publish better and better attacks on SHA-256 in the future. As the attacks get better and better, we'll have to decide how urgent it is to upgrade from SHA-256 to (hopefully by that time) SHA-3. If you're using a truncation of SHA-256 then you might need to jump sooner than other people, and you won't know whether or not this is the case unless you study the attacks yourself! It is possible that a cryptographer will publish an attack on SHA-256 which is evaluated as "not a realistic threat", but which *is* a realistic threat on SHA-256-trunc-192. None of the open cryptographers will be checking or publicly mentioning whether SHA-256-trunc-192 is vulnerable because SHA-256-trunc-192 isn't on their radar.

> I have to store the IV because of other features coming in the future. Originally I was calculating the IV based on: object set, object, blockid and transaction group (unsigned 64-bit ints). I still do calculate the IV based on those but it needs to be stored.

> "Version" of the block doesn't really make sense in ZFS in that way because ZFS is copy-on-write. Or maybe you can think of the birth transaction id as the version because the other things like the object set, object, level and block id identify the logical filesystem location.

I don't understand the last sentence there. Does this mean that you'll never be asked to encrypt more than one plaintext under a different birth transaction id? If so, then perfect! -- use the birth transaction id as the IV! What other features coming in the future would need to know the IV and would not already know the birth transaction id?

Something that I still don't understand is: why do you have a MAC tag at all if you already have a SHA-256 hash of the ciphertext? David-Sarah Hopwood suggested a good reason that you might want one [1], but is that your reason? Because how big the MAC tag needs to be is probably determined by why you need it.

Regards,

Zooko Wilcox-O'Hearn

[1] mail-archive.com/cryptography at metzdowd.com/msg11034.html

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Darren J Moffat
2009-Nov-04 15:04 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> [Folks, I don't seem to be getting messages from zfs-crypto-discuss so I am reading the web archives of zfs-crypto-discuss and replying.]
>
> In mail.opensolaris.org/pipermail/zfs-crypto-discuss/2009-November/002951.html Darren Moffat wrote:
>
> > SHA256 truncated to 192 bits
>
> You know, I've thought about this sort of thing quite a lot for Tahoe-LAFS and there's a very good reason not to truncate SHA-256 at all. That reason is: now you've got to do your own cryptanalysis work. Suppose open cryptographers publish better and better attacks on SHA-256 in the future. As the attacks get better and better, we'll have to decide how urgent it is to upgrade from SHA-256 to (hopefully by that time) SHA-3. If you're using a truncation of SHA-256 then you might need to jump sooner than other people, and you won't know whether or not this is the case unless you study the attacks yourself!

That is exactly my concern and why I came here for advice. I think I'm now convinced that truncating the SHA256 hash would not be a good idea even though we do have an additional MAC.

> I don't understand the last sentence there. Does this mean that you'll never be asked to encrypt more than one plaintext under a different birth transaction id? If so, then perfect! -- use the birth transaction id as the IV! What other features coming in the future would need to know the IV and would not already know the birth transaction id?

In a given birth transaction (txg) there may be many blocks being encrypted under the same key, so the txg alone isn't enough. The combination of txg, objset and block id are unique - but that is 192 bits.

One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

> Something that I still don't understand is: why do you have a MAC tag at all if you already have a SHA-256 hash of the ciphertext? David-Sarah Hopwood suggested a good reason that you might want one [1], but is that your reason? Because how big the MAC tag needs to be is probably determined by why you need it.

The SHA-256 is unkeyed, so there would be nothing to stop an attacker that can write to the disks but doesn't know the key from modifying the on-disk ciphertext and all the SHA-256 hashes up to the top of the Merkle tree to the uberblock. That would create a valid ZFS pool but the data would have been tampered with. I don't see that as an acceptable risk. I can't make the SHA-256 keyed because there are ZFS operations that we must be able to perform when the decryption key is not available: resilvering a mirror/raidz, disk removal (raid relayout), hotspare, scrub (proactive resilver).

By using a MAC we reduce that risk because now the attacker needs to forge the MAC and modify the SHA-256 Merkle tree all the way to the uberblock as well. Depending on the type of modification the attacker may actually need to forge multiple MAC tags. A given MAC tag applies to a single ZFS block that is between 512 bytes and 128k.

So if I don't truncate the SHA-256, how big does my MAC need to be given every ZFS block has its own IV?

--
Darren J Moffat
Zooko at allmydata.com
2009-Nov-05 05:32 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:

> One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

Regards,

Zooko
Zooko Wilcox-O''Hearn
2009-Nov-08 21:02 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Wednesday, 2009-11-04, at 7:04, Darren J Moffat wrote:

> The SHA-256 is unkeyed, so there would be nothing to stop an attacker that can write to the disks but doesn't know the key from modifying the on-disk ciphertext and all the SHA-256 hashes up to the top of the Merkle tree to the uberblock. That would create a valid ZFS pool but the data would have been tampered with. I don't see that as an acceptable risk.

I see. It is interesting that you and I have different intuitions about this. My intuition is that it is easier to make sure that the Merkle Tree root hash hasn't been changed without authorization than to make sure that an unauthorized person hasn't learned a secret. Is your intuition the opposite? I suppose in different situations either one could be true.

Now I better appreciate why you want to use both a secure hash and a MAC. Now I understand the appeal of Nico Williams's proposal to MAC just the root of the tree and not every node of the tree. That would save space in all the non-root nodes but would retain the property that you have to both know the secret *and* be able to write to the root hash in order to change the filesystem.

> So if I don't truncate the SHA-256, how big does my MAC need to be given every ZFS block has its own IV?

I don't know the answer to this question. I have a hard time understanding whether the minimum safe size of the MAC is zero (i.e. you don't need it anyway) or 128 bits (i.e. you rely on the MAC and you want 128-bit crypto strength) or something in between.

Regards,

Zooko
Zooko Wilcox-O''Hearn
2009-Nov-27 18:30 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
I wrote this note on 2009-11-04 22:32:34 but I haven't seen a reply. I wanted to make sure that the ZFS crypto engineers thought about this, because if there *is* any information which fits the bill then it could help.

On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:

> One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

Regards,

Zooko
Darren J Moffat
2009-Dec-01 13:45 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> I wrote this note on 2009-11-04 22:32:34 but I haven't seen a reply. I wanted to make sure that the ZFS crypto engineers thought about this, because if there *is* any information which fits the bill then it could help.

I had seen it and thought about it. In fact I was originally using information from the zbookmark_t and the txg id.

> On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:
>
> > One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.
>
> Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

The information in the zbookmark_t + txg works for all cases except for:

  send-recv where the data is sent encrypted
  deduplication

In both of those cases the zbookmark_t and txg will be different.

I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-01 19:27 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Thanks for the reply. Too bad you can't wait for SHA-3. Now you'll have to think about whether new cryptanalytic results against SHA-256 mean that SHA-256-trunc-160 is vulnerable and if so what effect that has on the safety of your scheme. But, I don't have a better solution for you, other than Nico Williams's proposal to put a MAC on only the root, which you've already rejected as being too disruptive of a change at this point.

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Darren J Moffat
2009-Dec-01 21:26 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> Thanks for the reply. Too bad you can't wait for SHA-3.

Waiting for SHA-3 means waiting until 2012 and that is totally unrealistic.

> Now you'll have to think about whether new cryptanalytic results against SHA-256 mean that SHA-256-trunc-160 is vulnerable and if so what effect that has on the safety of your scheme.

That is a risk we have to take, but we aren't dependent on the truncated SHA-256 for the security of the ciphertext; the MAC and the truncated SHA-256 together provide that for us.

> But, I don't have a better solution for you, other than Nico Williams's proposal to put a MAC on only the root, which you've already rejected as being too disruptive of a change at this point.

It is, yes, but it can be investigated for the future - ZFS is versioned on disk and we can thus make changes like this.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-14 01:36 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:

> I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.

And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?

Have you worked out the birthday paradox consequences for a 96-bit IV? The easy one to calculate is that if you had 2^48 blocks then you'd have a 50% chance of IV collision. I don't know about you, but I would consider 2^48 blocks to be way more than you need to support for the foreseeable future. But how many blocks does it take before you suffer a 10^-5 chance of IV collision? How about a 10^-9 chance? Anyway, what is your tolerance for a chance of IV collision?

I haven't worked out the answers to these birthday paradox questions yet, but I intend to, with the help of my brother who is a statistician, and report back. Obviously only you can answer the one about what chance of IV collision you are comfortable with.

Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?

(Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
David-Sarah Hopwood
2009-Dec-14 03:20 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
>
> > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
>
> And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?
>
> Have you worked out the birthday paradox consequences for a 96-bit IV? The easy one to calculate is that if you had 2^48 blocks then you'd have a 50% chance of IV collision. I don't know about you, but I would consider 2^48 blocks to be way more than you need to support for the foreseeable future. But how many blocks does it take before you suffer a 10^-5 chance of IV collision?

sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.

(You can do it in your head: the nearest power of 2 to 100000 is 2^17, so that's roughly 2^(48 - 17/2) = 2^39.5.)

> How about a 10^-9 chance?

sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.

> Anyway, what is your tolerance for a chance of IV collision?

For me, these figures for number of blocks are not high enough, if the IV is derived randomly. If it is derived in such a way that it's guaranteed to be unique, then there is no problem. But note the comment I made about repeated IVs in CTR-based modes, in the post referenced below.

> Obviously only you can answer the one about what chance of IV collision you are comfortable with.
>
> Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?
>
> (Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)

In <article.gmane.org/gmane.comp.encryption.general/13719>, I suggested a scheme that would also address both issues (maybe it's similar to what you're thinking of). Ideally you would want a 256-bit-block cipher if you're going to use that approach for CTR mode, though.

--
David-Sarah Hopwood  davidsarah.livejournal.com
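A small numeric check of the figures above, evaluating the usual birthday approximation p =~ 1 - e^(-n(n-1)/2d) for a 96-bit IV space (a sketch; the block counts are round powers of two, not exact thresholds):

    import math

    d = 2**96  # size of the 96-bit IV space

    def collision_prob(n: int) -> float:
        """Approximate probability of at least one collision among n random IVs."""
        return 1.0 - math.exp(-n * (n - 1) / (2 * d))

    def blocks_for_prob(p: float) -> float:
        """Approximate number of random IVs that reach collision probability p."""
        return math.sqrt(2 * d * math.log(1 / (1 - p)))

    print(f"{collision_prob(2**40):.1e}")              # ~7.6e-06, i.e. around 10^-5
    print(f"{collision_prob(2**33):.1e}")              # ~4.7e-10, i.e. around 10^-9
    print(f"2^{math.log2(blocks_for_prob(0.5)):.1f}")  # ~2^48.2 blocks for a 50% chance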
Darren J Moffat
2009-Dec-14 11:53 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
>
> > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
>
> And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?

The 96 bit IV is calculated by hashing some of the fields from the zbookmark_t (object, level, block) and the txg.

> Have you worked out the birthday paradox consequences for a 96-bit IV?

GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway, and 96 bits is considered the "default" IV size.

For CCM the maximum possible IV (the nonce field in the CCM_PARAMS) is 13 bytes (104 bits), but that limits the data size to 64k, which is too low to store the current maximum size of ZFS block (128k) we could encrypt in one operation. So I have to set the CCM nonce size to 12 bytes (96 bits) so that the size of data that can be encrypted goes up to 16777215 bytes. This is the way CCM mode is defined, so again my maximum is a 96 bit IV.

> Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?

The IVs aren't randomly generated, and yes that does prevent deduplication if I use an IV derived from the zbookmark_t + txg or a random IV.

For deduplication the IV generation is done differently to ensure the ciphertext does match, but only when dedup is enabled on those ZFS datasets - this means we can dedup within a dataset and its clones (since they by default share data encryption keys) but not with any other datasets.

--
Darren J Moffat
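Two small sketches of the mechanics described above (illustrative assumptions, not the ZFS implementation): deriving a 96-bit IV by hashing zbookmark-style fields plus the txg, and the CCM relation between nonce length and maximum payload length that forces the 12-byte nonce.

    import hashlib, struct

    def derive_iv(obj: int, level: int, blkid: int, txg: int) -> bytes:
        """Illustrative 96-bit IV: SHA-256 over bookmark-like fields + txg, truncated."""
        packed = struct.pack(">QQQQ", obj, level, blkid, txg)
        return hashlib.sha256(packed).digest()[:12]  # 12 bytes = 96 bits

    assert len(derive_iv(obj=1234, level=0, blkid=42, txg=987654)) == 12

    # CCM (NIST SP 800-38C): a nonce of (15 - q) bytes leaves a q-byte length
    # field, so the maximum payload is 2^(8q) - 1 bytes.
    def ccm_max_payload(nonce_bytes: int) -> int:
        q = 15 - nonce_bytes
        return 2**(8 * q) - 1

    assert ccm_max_payload(13) == 65535      # 64k - 1: too small for a 128k ZFS block
    assert ccm_max_payload(12) == 16777215   # ~16 MB: comfortably covers 128k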
David-Sarah Hopwood
2009-Dec-14 19:34 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:

> Zooko Wilcox-O'Hearn wrote:
> > On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
> >
> > > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
> >
> > And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?
>
> The 96 bit IV is calculated by hashing some of the fields from the zbookmark_t (object, level, block) and the txg.
>
> > Have you worked out the birthday paradox consequences for a 96-bit IV?
>
> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway, and 96 bits is considered the "default" IV size.

Right, but GCM mode (like other CTR-based modes) is designed under the assumption that the IV is chosen uniquely. If it's a hash, then even when the inputs to the hash are unique, we expect collisions after sqrt(2*p) * 2^48 blocks with probability p. The consequence of a collision is to leak the exclusive-or of the plaintexts of the colliding blocks (which will often be sufficient to derive both plaintexts). This security level is significantly less than the design strength of 128-bit AES.

For most applications, this weakness probably wouldn't matter very much, but the problem is that you don't know what will be stored in the filesystem or how significant a leak would be.

> > Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?
>
> The IVs aren't randomly generated, and yes that does prevent deduplication if I use an IV derived from the zbookmark_t + txg or a random IV.
>
> For deduplication the IV generation is done differently to ensure the ciphertext does match, but only when dedup is enabled on those ZFS datasets - this means we can dedup within a dataset and its clones (since they by default share data encryption keys) but not with any other datasets.

How is the IV derived when dedup is enabled?

--
David-Sarah Hopwood  davidsarah.livejournal.com
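A short demonstration of the failure mode described above (a sketch with AES in plain CTR mode via the pyca/cryptography package; GCM encrypts with CTR underneath, so the confidentiality consequence of a repeated IV is the same): encrypting two blocks under the same key and the same counter block leaks the XOR of their plaintexts.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)
    nonce = os.urandom(16)  # full 128-bit initial counter block, (wrongly) reused below

    def ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(data) + enc.finalize()

    p1 = b"attack at dawn " * 4
    p2 = b"retreat at noon" * 4

    c1 = ctr_encrypt(key, nonce, p1)  # IV collision: same key, same counter block
    c2 = ctr_encrypt(key, nonce, p2)

    xor_ct = bytes(a ^ b for a, b in zip(c1, c2))
    xor_pt = bytes(a ^ b for a, b in zip(p1, p2))
    assert xor_ct == xor_pt  # ciphertext XOR equals plaintext XOR on IV reuse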
Zooko Wilcox-O''Hearn
2009-Dec-14 21:48 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Sunday, 2009-12-13, at 20:20, David-Sarah Hopwood wrote:

> > But how many blocks does it take before you suffer a 10^-5 chance of IV collision?
>
> sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.
>
> (You can do it in your head: the nearest power of 2 to 100000 is 2^17, so that's roughly 2^(48 - 17/2) = 2^39.5.)
>
> > How about a 10^-9 chance?
>
> sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.

Hrm, why is this the answer? Can you explain the math to me?

In any case, assuming you are right then this means that having a mere 8 billion blocks in your filesystem would incur a one-in-a-billion chance of IV collision. Hm. That's not something I would be comfortable with.

> > (Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)
>
> In <article.gmane.org/gmane.comp.encryption.general/13719>, I suggested a scheme that would also address both issues (maybe it's similar to what you're thinking of). Ideally you would want a 256-bit-block cipher if you're going to use that approach for CTR mode, though.

Ha ha! Thank you for reminding me of this. Well, I read your proposal when you posted it, but I didn't really understand it all. Then a couple of days ago I woke up in the morning with a clever idea in mind for how to improve ZFS crypto, which I alluded to, above. Re-reading your post now it appears that my clever invention is nothing but a poor variation of yours. :-)

So, just like you said, this proposal of mine -- I mean of yours -- would have several advantages: most importantly, it allows deduplication of encrypted blocks and it does so even if dedupe was not enabled when that encrypted dataset was created and filled with data. Second most important, it allows 128-bit IVs, which means that if you are unwilling to tolerate a 10^-9 chance of IV collision, you can raise your dataset size limit from 2^33 blocks (with a 96-bit IV) to 2^49 blocks (with a 128-bit IV). Thirdly, your scheme allows any block cipher mode of operation, including unauthenticated ones.

And like you said, the major drawback to this is that you have to process each block in two passes -- once to compute the MAC and then a second time to encrypt. Since the blocks are small enough to fit wholly in RAM, this may not be a significant performance problem in practice -- I'm not sure. If I were making the decision then I would measure that.

You mentioned that this would allow a parallelizable computation of the encryption, such as by using CTR mode, and that you don't know of any unpatented parallelizable MAC. However it is easy to make a parallelizable MAC: use a Merkle Tree! For example, generate a separate MAC tag over each 4 KiB of the block in a separate thread, resulting in 32 MAC tags (if the block were 128 KiB in size). Then concatenate all the MAC tags together and compute a MAC over them.

Regards,

Zooko
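A minimal sketch of the parallelizable MAC Zooko outlines (illustrative, using HMAC-SHA256; the 4 KiB chunk size and thread pool are taken from the example in the post, everything else is an assumption): MAC each chunk independently, then MAC the concatenation of the per-chunk tags.

    import hmac, hashlib
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 4096

    def chunk_tag(key: bytes, index: int, chunk: bytes) -> bytes:
        # Bind each tag to its chunk index so chunks cannot be reordered.
        return hmac.new(key, index.to_bytes(4, "big") + chunk, hashlib.sha256).digest()

    def tree_mac(key: bytes, block: bytes) -> bytes:
        chunks = [block[i:i + CHUNK] for i in range(0, len(block), CHUNK)]
        with ThreadPoolExecutor() as pool:
            tags = list(pool.map(lambda ic: chunk_tag(key, ic[0], ic[1]), enumerate(chunks)))
        # Final MAC over the block length plus the concatenated per-chunk tags.
        return hmac.new(key, len(block).to_bytes(8, "big") + b"".join(tags), hashlib.sha256).digest()

    tag = tree_mac(b"\x00" * 32, b"A" * (128 * 1024))  # one 128 KiB block -> 32 chunk tags
    assert len(tag) == 32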
David-Sarah Hopwood
2009-Dec-15 04:59 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O''Hearn wrote:
> On Sunday, 2009-12-13, at 20:20, David-Sarah Hopwood wrote:
>>> But how many blocks does it take before you suffer a 10^-5 chance of
>>> IV collision?
>>
>> sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.
>>
>> (You can do it in your head: the nearest power of 2 to 100000 is 2^17,
>> so that''s roughly 2^(48 - 17/2) = 2^39.5.)
>>
>>> How about a 10^-9 chance?
>>
>> sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.
>
> Hrm, why is this the answer? Can you explain the math to me?

Given n random values drawn from a discrete uniform distribution with
d >= n elements, let p be the probability that at least two values are
the same. Then 1 - p is the probability that all values are distinct.

When choosing the (k+1)th value, k values have already been chosen, so
the probability that this value will be distinct from those already
chosen is 1 - k/d. Therefore

  1 - p = product{k = 1..n-1}(1 - k/d)    for n <= d

Apply the approximation 1 - x =~ e^-x for each x = k/d:

  1 - p =~ product{k = 1..n-1}(e^(-k/d))
        = e^sum{k = 1..n-1}(-k/d)
        = e^-(n(n-1)/2d)

  ln(1 - p) =~ -n(n-1)/2d
  ln(1/(1 - p)) =~ n(n-1)/2d
  n^2 =~ 2d ln(1/(1 - p))    [approximating n(n-1) by n^2 for large n]
  n =~ sqrt(2d ln(1/(1 - p)))

Apply 1 - p =~ e^-p for small p, so that ln(1/(1 - p)) =~ p:

  n =~ sqrt(2dp) = sqrt(2p) sqrt(d)

This doesn''t tell us how close the approximation is. In fact it is
only about 18% out for p = 1/2, and more accurate for all smaller p.

> In any case, assuming you are right, this means that having a mere
> 8 billion blocks in your filesystem would incur a one-in-a-billion
> chance of IV collision. Hm. That''s not something I would be
> comfortable with.

Me neither -- at least not for a general-purpose system where you don''t
know the sensitivity of the data.

>> In <article.gmane.org/gmane.comp.encryption.general/13719>, I
>> suggested a scheme that would also address both issues (maybe it''s
>> similar to what you''re thinking of). Ideally you would want a
>> 256-bit-block cipher if you''re going to use that approach for CTR
>> mode, though.
>
> Ha ha! Thank you for reminding me of this. Well, I read your proposal
> when you posted it, but I didn''t really understand it all. Then a
> couple of days ago I woke up in the morning with a clever idea in mind
> for how to improve ZFS crypto, which I alluded to above. Re-reading
> your post now, it appears that my clever invention is nothing but a
> poor variation of yours. :-)

This has happened to me many times :-) (You very often need to reinvent
something in order to see that it''s possible, and to know what to look
for in previous research.)

> So, just like you said, this proposal of mine -- I mean of yours --
> would have several advantages. Most importantly, it allows
> deduplication of encrypted blocks, and it does so even if dedupe was
> not enabled when that encrypted dataset was created and filled with
> data.

Oh, I hadn''t spotted that.

> Second, it allows 128-bit IVs, which means that if you are unwilling
> to tolerate a 10^-9 chance of IV collision, you can raise your dataset
> size limit from 2^33 blocks (with a 96-bit IV) to 2^49 blocks (with a
> 128-bit IV). Third, your scheme allows any block cipher mode of
> operation, including unauthenticated ones.
>
> And like you said, the major drawback to this is that you have to
> process each block in two passes -- once to compute the MAC and then a
> second time to encrypt. Since the blocks are small enough to fit wholly
> in RAM, this may not be a significant performance problem in practice --
> I''m not sure. If I were making the decision then I would measure that.
>
> You mentioned that this would allow a parallelizable computation of the
> encryption, such as by using CTR mode, and that you don''t know of any
> unpatented parallelizable MAC. However, it is easy to make a
> parallelizable MAC: use a Merkle Tree! For example, generate a separate
> MAC tag over each 4 KiB of the block in a separate thread, resulting in
> 32 MAC tags (if the block were 128 KiB in size). Then concatenate all
> the MAC tags together and compute a MAC over them.

D''oh. Obvious when you point it out. That''s simple enough that it''s
probably worth doing it that way, even if current machines can''t take
full advantage because of threading overheads.

--
David-Sarah Hopwood ? davidsarah.livejournal.com
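That last claim about the accuracy of the approximation is easy to check
numerically. A short Python sketch comparing the exact product formula with
the n =~ sqrt(2dp) approximation at p = 1/2 (the value d = 10^6 is
arbitrary, just small enough to evaluate the product directly):

    import math

    d = 10**6
    p_target = 0.5

    # exact: 1 - p = product_{k=1..n-1} (1 - k/d), built up incrementally
    no_collision = 1.0
    n = 1
    while 1.0 - no_collision < p_target:
        no_collision *= 1.0 - n / d
        n += 1

    approx = math.sqrt(2 * d * p_target)  # n =~ sqrt(2dp)
    print(n, approx, (n - approx) / approx)
    # prints roughly: 1178 1000.0 0.178 -- i.e. about 18% out at p = 1/2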
David-Sarah Hopwood
2009-Dec-15 05:11 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:
> Zooko Wilcox-O''Hearn wrote:
>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>
> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
> and 96 bit is considered the "default" IV size.

Note that GCM mode should never be used with an IV other than 96 bits,
because of the weakness described in section 2.4 of
<csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

--
David-Sarah Hopwood ? davidsarah.livejournal.com
David-Sarah Hopwood
2009-Dec-15 05:52 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
> Darren J Moffat wrote:
>> Zooko Wilcox-O''Hearn wrote:
>>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>>
>> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
>> and 96 bit is considered the "default" IV size.
>
> Note that GCM mode should never be used with an IV other than 96 bits,
> because of the weakness described in section 2.4 of
> <csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

Also, section 3 of
<csrc.nist.gov/groups/ST/toolkit/BCM/documents/Joux_comments.pdf>
describes an attack against GCM with repeated IVs when the attacker can
obtain more than one collision (but still only a small number of
collisions).

--
David-Sarah Hopwood ? davidsarah.livejournal.com
Darren J Moffat
2009-Dec-15 10:04 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
> Darren J Moffat wrote:
>> Zooko Wilcox-O''Hearn wrote:
>>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>>
>> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
>> and 96 bit is considered the "default" IV size.
>
> Note that GCM mode should never be used with an IV other than 96 bits,
> because of the weakness described in section 2.4 of
> <csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

In ZFS crypto the IV for GCM is always 96 bits for this reason.

--
Darren J Moffat
Darren J Moffat
2009-Dec-15 10:06 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
>> For deduplication the IV generation is done differently to ensure the
>> ciphertext does match, but only when dedup is enabled on those ZFS
>> datasets - this means we can dedup within a dataset and its clones
>> (since they by default share data encryption keys) but not any other
>> datasets.
>
> How is the IV derived when dedup is enabled?

An HMAC (using a different per-filesystem key from the dataset
encryption key) of the plaintext. This allows for deduplication when
the data encryption keys match, i.e. within the same ZFS filesystem
(but not child filesystems) or a clone of it. At least until someone
runs ''zfs key -K'' on the filesystem to start using a new data
encryption key.

--
Darren J Moffat
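A minimal sketch of that derivation (the use of HMAC-SHA256 and the
truncation to the 96-bit IV size are assumptions for illustration; the
actual ZFS implementation details may differ):

    import hmac
    import hashlib

    def dedup_iv(iv_key: bytes, plaintext: bytes) -> bytes:
        # iv_key is a per-filesystem key, distinct from the data encryption key
        return hmac.new(iv_key, plaintext, hashlib.sha256).digest()[:12]  # 96 bits

    # Identical plaintexts under the same keys get identical IVs, and hence
    # identical ciphertexts, which is what makes deduplication possible.
    assert dedup_iv(b"k" * 32, b"same block") == dedup_iv(b"k" * 32, b"same block")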
Darren J Moffat
2009-Dec-15 10:10 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:
> David-Sarah Hopwood wrote:
>>> For deduplication the IV generation is done differently to ensure the
>>> ciphertext does match, but only when dedup is enabled on those ZFS
>>> datasets - this means we can dedup within a dataset and its clones
>>> (since they by default share data encryption keys) but not any other
>>> datasets.
>>
>> How is the IV derived when dedup is enabled?
>
> An HMAC (using a different per-filesystem key from the dataset
> encryption key) of the plaintext. This allows for deduplication when
> the data encryption keys match, i.e. within the same ZFS filesystem
> (but not child filesystems) or a clone of it. At least until someone
> runs ''zfs key -K'' on the filesystem to start using a new data
> encryption key.

The reason an HMAC is used rather than a plain hash is to reduce the
likelihood of an attacker precomputing IVs for known plaintexts. The
small additional overhead of an HMAC over a hash is worth it, in my
opinion.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-15 15:54 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Monday, 2009-12-14, at 4:53, Darren J Moffat wrote:

> GCM mode will GHASH any IV larger than 96 bits down to 96 bits
> anyway and 96 bit is considered the "default" IV size.

That''s interesting. It means that if you have a deterministic way to
generate unique IVs, such as transaction counters and block identifiers
and so on, and it produces guaranteed-unique 96-bit IVs that you give to
GCM, then you''re good. But if your deterministic method generates
guaranteed-unique 128-bit IVs and you give those to GCM, then you suffer
from the birthday paradox, and your security might fail at large scale.
It seems like a flaw in the GCM design that it can surprise the user
like that.

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
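For contrast, a sketch of the kind of deterministic, guaranteed-unique
96-bit IV being described here: pack a transaction number and a block
identifier directly into 12 bytes instead of hashing them (the field
choice and widths are illustrative assumptions, not the actual ZFS
layout):

    import struct

    def unique_iv(txg: int, blkid: int) -> bytes:
        # 64-bit transaction group number + 32-bit block id = 96 bits
        return struct.pack(">QL", txg, blkid)

    # As long as (txg, blkid) pairs never repeat under a given key, these IVs
    # never collide, so the birthday bound discussed above does not apply.
    assert len(unique_iv(12345, 67)) == 12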