Darren J Moffat
2009-Oct-30 17:30 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
For the encryption functionality in the ZFS filesystem we use AES in CCM or GCM mode at the block level to provide confidentiality and authentication. There is also a SHA256 checksum per block (of the ciphertext) that forms a Merkle tree of all the blocks in the pool. Note that I have to store the full IV in the block. A block here is a ZFS block, which is any power of two from 512 bytes to 128k (the default).

The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.

Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.

Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.

Option 1
--------
IV       96 bits (the max CCM allows given the other params)
MAC      128 bits
Checksum SHA256 truncated to 128 bits

Other options are:

Option 2
--------
IV       96 bits
MAC      128 bits
Checksum SHA224 truncated to 128 bits

Basically, if I have to truncate to 128 bits, is it better to do it against SHA224 or SHA256?

Option 3
--------
IV       96 bits
MAC      128 bits
Checksum SHA224 or SHA256 truncated to 160 bits

Obviously better than 1 and 2, but how much better? The reason it isn't used just now is because it is slightly harder to lay out given other constraints on where the data lives.

Option 4
--------
IV       96 bits
MAC      32 bits
Checksum SHA256 at full 256 bits

I'm pretty sure the size of the MAC is far too small.

Option 5
--------
IV       96 bits
MAC      64 bits
Checksum SHA224 at full 224 bits

This feels like the best compromise, but is it?

Option 6
--------
IV       96 bits
MAC      96 bits
Checksum SHA224 or SHA256 truncated to 192 bits

--
Darren J Moffat
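For illustration, here is a minimal Python sketch of the per-block operation described above: AES-GCM authenticated encryption under a 96-bit IV, plus a SHA-256 checksum computed over the ciphertext and truncated to the Option 1 width. It uses the pyca/cryptography package; the key, block size and truncation here are illustrative assumptions, not ZFS code.

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)  # per-dataset data encryption key (illustrative)
    aead = AESGCM(key)

    block = os.urandom(4096)   # one ZFS block's plaintext (any power of two, 512 bytes to 128k)
    iv = os.urandom(12)        # 96-bit IV; in ZFS it is derived and stored, not random

    # AESGCM.encrypt returns the ciphertext with the 128-bit auth tag appended.
    ct_and_tag = aead.encrypt(iv, block, None)
    ciphertext, tag = ct_and_tag[:-16], ct_and_tag[-16:]

    # The Merkle-tree checksum is over the *ciphertext*; Option 1 truncates it to 128 bits.
    checksum = hashlib.sha256(ciphertext).digest()[:16]

    # Everything that must fit in the 384-bit budget of the block pointer:
    assert (len(iv) + len(tag) + len(checksum)) * 8 <= 384  # 96 + 128 + 128 = 352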
Zooko Wilcox-O''Hearn
2009-Nov-02 05:33 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Dear Darren J Moffat:

I don't understand why you need a MAC when you already have the hash of the ciphertext. Does it have something to do with the fact that the checksum is non-cryptographic by default (docs.sun.com/app/docs/doc/819-5461/ftyue?a=view), and is that still true? Your original design document [1] said you needed a way to force the checksum to be SHA-256 if encryption was turned on. But back then you were planning to support non-authenticating modes like CBC. I guess once you dropped non-authenticating modes then you could relax that requirement to force the checksum to be secure.

Too bad, though! Not only are you now tight on space in part because you have two integrity values where one ought to do, but also a secure hash of the ciphertext is actually stronger than a MAC! A secure hash of the ciphertext tells whether the ciphertext is right (assuming the hash function is secure and implemented correctly). Given that the ciphertext is right, then the plaintext is right (given that the encryption is implemented correctly and you use the right decryption key). A MAC on the plaintext tells you only that the plaintext was chosen by someone who knew the key. See what I mean? A MAC can't be used to give someone the ability to read some data while withholding from them the ability to alter that data. A secure hash can.

One of the founding ideas of the whole design of ZFS was end-to-end integrity checking. It does that successfully now, for the case of accidents, using large checksums. If the checksum is secure then it also does it for the case of malice. In contrast a MAC doesn't do "end-to-end" integrity checking. For example, if you've previously allowed someone to read a filesystem (i.e., you've given them access to the key), but you never gave them permission to write to it, but they are able to exploit the issues that you mention at the beginning of [1] such as "Untrusted path to SAN", then the MAC can't stop them from altering the file, nor can the non-secure checksum, but a secure hash can (provided that they can't overwrite all the way up the Merkle Tree of the whole pool and any copies of the Merkle Tree root hash).

Likewise, a secure hash can be relied on as a dedupe tag *even* if someone with malicious intent may have slipped data into the pool. An insecure hash or a MAC tag can't -- a malicious actor could submit data which would cause a collision in an insecure hash or a MAC tag, causing tag-based dedupe to mistakenly unify two different blocks.

So, since you're tight on space, it would be really nice if you could tell your users to use a secure hash for the checksum and then allocate more space to the secure hash value and less space to the now-unnecessary MAC tag. :-)

Anyway, if this is the checksum which is used for dedupe then remember the birthday so-called paradox -- some people may be uncomfortable with the prospect of not being able to safely dedupe their 2^64-block storage pool if the hash is only 128 bits, for example. :-) Maybe you could include the MAC tag in the dedupe comparison.

Also, the IVs for GCM don't need to be random, they need only to be unique. Can you use a block number and birth number or other such guaranteed-unique data instead of storing an IV? (Apropos recent discussion on the cryptography list [2].)

Regards,

Zooko

[1] hub.opensolaris.org/bin/download/Project+zfs-crypto/files/zfs%2Dcrypto%2Ddesign.pdf
[2] mail-archive.com/cryptography at metzdowd.com/msg11020.html

---
Your cloud storage provider does not need access to your data.
Tahoe-LAFS -- allmydata.org
Alexander Klimov
2009-Nov-02 07:45 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Fri, 30 Oct 2009, Darren J Moffat wrote:

> The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.

If you use the hash only to protect against non-malicious corruption, then why do you use SHA-2? Would not MD5 or even CRC be enough?

--
Regards,
ASK
Matt Ball
2009-Nov-02
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Hi Darren,

On Fri, Oct 30, 2009 at 11:30 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote:

> For the encryption functionality in the ZFS filesystem we use AES in CCM or GCM mode at the block level to provide confidentiality and authentication. There is also a SHA256 checksum per block (of the ciphertext) that forms a Merkle tree of all the blocks in the pool. Note that I have to store the full IV in the block. A block here is a ZFS block which is any power of two from 512 bytes to 128k (the default).
>
> The SHA256 checksums are used even for blocks in the pool that aren't encrypted and are used for detecting and repairing (resilvering) block corruption. Each filesystem in the pool has its own wrapping key and data encryption keys.
>
> Due to some unchangeable constraints I have only 384 bits of space to fit in all of: IV, MAC (CCM or GCM Auth Tag), and the SHA256 checksum, which best case would need about 480 bits.
>
> Currently I have Option 1 below, but the truncation of SHA256 down to 128 bits makes me question if this is safe. Remember the SHA256 is of the ciphertext and is used for resilvering.
>
> Option 1
> --------
> IV       96 bits (the max CCM allows given the other params)
> MAC      128 bits
> Checksum SHA256 truncated to 128 bits

I personally like the default option 1. All the others have various uglinesses.

SHA-224 has patent issues (see US patent 6829355 <v3.espacenet.com/textdoc?DB=EPODOC&IDX=US6829355>). It's really identical to SHA-256 except that it uses a different initial value and truncates to 224 bits. I would love to see SHA-224 completely disappear.

Cryptographers will all have different opinions about how big a MAC (i.e., cryptographic integrity check) should be, but my take on it is to ask how big of a CRC you would need in a non-adversarial environment to meet the undetectable error rate specified within the system, and then use that for the minimum size of the MAC. For tape drives I've worked on, this was typically somewhere around 1 undetected error in 10^27 bits. If you protect 1 data bit, then you'd roughly need a 90-bit CRC, which you could round up to 96 bits. Anything more than 96 bits in my opinion is somewhat overkill. I'd pick a CCM MAC of either 96 bits or 128.

For hashing, it's a little different since you have to worry about the birthday paradox. The size of the hash output depends on the undetectable error rate of the system, along with the maximum number of candidate plaintexts that an adversary could create in finding a hash collision. Most cryptographers (not knowing more about the system) would be conservative and say something like "Use the full 256 bits of SHA-256 to get a minimum of 128 bits of security", but realistically for this system, that would be way overkill. There's already a 128-bit CCM MAC to fall back to, so here again (given the other safety nets in the system), I think that a 128-bit truncated SHA-256 hash would be plenty of assurance for the system.

--
Thanks!

Matt Ball, Chair, IEEE P1619 Security in Storage Working Group
Staff Engineer, Sun Microsystems, Inc.
500 Eldorado Blvd, Bldg #5 BRM05-212, Broomfield, CO 80021
Work: 303-272-7580, Cell: 303-717-2717
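A quick back-of-envelope check of the sizes argued for above (plain arithmetic; the 10^-27 target is the figure quoted in the post): a random forgery slips past an ideal N-bit tag with probability 2^-N, and a truncated hash used as a dedup tag starts to see birthday collisions around 2^(H/2) blocks.

    import math

    # Tag size needed so a random forgery/miss slips through with probability ~1e-27.
    print(f"{math.log2(1e27):.1f} bits")  # ~89.7, rounded up to 96 in the post

    # Birthday bound for a truncated hash: ~50% collision chance near 2^(H/2) inputs.
    for h in (128, 160, 192, 224, 256):
        print(f"{h}-bit hash: ~50% collision around 2^{h // 2} random blocks")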
Nicolas Williams
2009-Nov-02 16:39 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Sun, Nov 01, 2009 at 10:33:34PM -0700, Zooko Wilcox-O'Hearn wrote:

> I don't understand why you need a MAC when you already have the hash of the ciphertext. Does it have something to do with the fact that the checksum is non-cryptographic by default (docs.sun.com/app/docs/doc/819-5461/ftyue?a=view), and is that still true? Your original design document [1] said you needed a way to force the checksum to be SHA-256 if encryption was turned on. But back then you were planning to support non-authenticating modes like CBC. I guess once you dropped non-authenticating modes then you could relax that requirement to force the checksum to be secure.

[Not speaking for Darren...]

No, the requirement to use a strong hash remains, but since the hash would be there primarily for protection against errors, I don't think the requirement for a strong hash is really needed.

> Too bad, though! Not only are you now tight on space in part because you have two integrity values where one ought to do, but also a secure hash of the ciphertext is actually stronger than a MAC! A secure hash of the ciphertext tells whether the ciphertext is right (assuming the hash function is secure and implemented correctly). Given that the ciphertext is right, then the plaintext is right (given that the encryption is implemented correctly and you use the right decryption key). A MAC on the plaintext tells you only that the plaintext was chosen by someone who knew the key. See what I mean? A MAC can't be used to give someone the ability to read some data while withholding from them the ability to alter that data. A secure hash can.

Users won't actually get the data keys, only the data key wrapping keys. Users who can read the disk and find the wrapped keys and know the wrapping keys can find the actual data keys, of course, but add in a host key that the user can't read and now the user cannot recover their data keys. One goal is to protect a system against its users, but another is to protect user data against malicious modification by anyone else. A MAC provides the first kind of protection if the user can't access the data keys, and a MAC provides the second kind of protection if the data keys can be kept secret.

> One of the founding ideas of the whole design of ZFS was end-to-end integrity checking. It does that successfully now, for the case of accidents, using large checksums. If the checksum is secure then it also does it for the case of malice. In contrast a MAC doesn't do "end-to-end" integrity checking. For example, if you've previously allowed someone to read a filesystem (i.e., you've given them access to the key), but you never gave them permission to write to it, but they are able to exploit the issues that you mention at the beginning of [1] such as "Untrusted path to SAN", then the MAC can't stop them from altering the file, nor can the non-secure checksum, but a secure hash can (provided that they can't overwrite all the way up the Merkle Tree of the whole pool and any copies of the Merkle Tree root hash).

I think we have to assume that an attacker can write to any part of the pool, including the Merkle tree roots. It'd be odd to assume that the attacker can write anywhere but there -- there's nothing to make it so! I.e., we have to at least authenticate the Merkle tree roots. That still means depending on collision resistance of the hash function for security. If we authenticate every block we don't have that dependence (I'll come back to this).

The interesting thing here is that we want the hash _and_ the MAC, not just the MAC. The reason is that we want block pointers (which include the {IV, MAC, hash} for the block being pointed to) to be visible to the layer below the filesystem, so that we can scrub/resilver and evacuate devices from a pool (meaning: re-write all the block pointers that point to blocks on the evacuated devices so that they point elsewhere) even without having the data keys at hand (more on this below).

We could MAC the Merkle tree roots alone, thus alleviating the space situation in the block pointer structure (and also saving precious CPU cycles). But interestingly we wouldn't alleviate it that much! We need to store a 96-bit IV, and if we don't MAC every block then we'll want the strongest hash we can use, so we'll need at least another 256 bits, for a total of 352 bits of the 384 that we have to play with. Whereas if we MAC every block we might store a 96-bit IV, a 128-bit authentication tag and a 160-bit hash, using all 384 bits.

You get more collision resistance from an N-bit MAC than from a hash of the same length. That's because in the MAC case the forger can't check the forgery without knowing the key, while in the hash case the attacker can verify that some contents collide with another's hash. In the MAC case an attacker that hasn't broken the MAC/key must wait until the system reads the modified block(s) to determine if his/her guess was correct. So a 128-bit MAC provides more protection than a 160-bit hash, and about as much as a 256-bit hash. If we remove the MAC then the hash has to grow longer to compensate, thus the space gained by not including the MAC is minimal, possibly zero.

If we MAC every block then we don't need the hash function for security purposes: its main role would still be to provide integrity protection against errors for scrubbing and resilvering when keys are unavailable. The hash would continue to provide end-to-end integrity protection against errors. The hash would add _some_ security value though: not only must an attacker seeking to modify data forge the right MAC for the new contents, they must also find a hash collision (and they must do this all the way up the Merkle tree).

> Likewise, a secure hash can be relied on as a dedupe tag *even* if someone with malicious intent may have slipped data into the pool.

For dedup you want to compare block contents on hash equality. That's what ZFS will do. That defeats your attack on dedup.

> Also, the IVs for GCM don't need to be random, they need only to be unique. Can you use a block number and birth number or other such guaranteed-unique data instead of storing an IV? (Apropos recent discussion on the cryptography list [2].)

The block address can't be used: a blkptr_t actually stores 1-3 actual block addresses, but these can change if a block is relocated.

I think the notion that all encrypted/authenticated filesystems need not be logged in in order to perform certain pool operations is both very useful and rather odd. Odd because once a filesystem is logged in, an all-powerful administrator could either learn its keys or, if the system were using a token to avoid this, the admin could abuse those keys -- the sysadmin remains so powerful that trying to protect users against the sysadmin seems like a waste of resources. But the ability to perform some pool operations without having the keys is still useful: the sysadmin is a user, after all, and might not be around. Think of a SAN operator reconfiguring pools without having to have the keys to the datasets on those pools.

Nico
--
Zooko Wilcox-O''Hearn
2009-Nov-03 16:32 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
[adding cc: zfs-crypto-discuss at opensolaris.org]

David-Sarah: Yes, a secure hash of the plaintext might give better assurance than a secure hash of the ciphertext, because the implementation of the cipher could be buggy or because the decryption key could be wrong. The latter problem could perhaps be addressed by appending the encryption key to the plaintext before encryption. But my point was about something else: that hashes are actually sometimes more robust than MACs from a security engineering standpoint even though MACs are much stronger than secure hashes from a crypto standpoint. I think your reply best summarized what I was trying to say:

On Monday, 2009-11-02, at 23:31, David-Sarah Hopwood wrote:

> Right. If hashes are used instead of MACs, then the integrity of the system does not depend on keeping secrets. It only depends on preventing the attacker from modifying the root of the Merkle tree. One consequence of this is that if there are side-channel attacks against the implementations of crypto algorithms, there is no information that they can leak to an attacker that would allow compromising integrity.

Yes, and in addition to side-channel attacks and theft of the key, there is also the simple fact that with a secure hash you can give a person or process the ability to verify the integrity of data without thereby giving them the ability to forge data. With a MAC, you can't.

The way this might be relevant to ZFS is that they have these constraints on how much space they have to store crypto material, and they have these issues about integrity and about dedupe, and they *already have* a SHA-256 hash of the ciphertext! So it would seem to me that they should leverage that powerful feature that they already have: don't allocate a lot of bits to the MAC tag, which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right. Also, of course, require that the checksum is SHA-256 and not one of the faster, insecure checksums. Also encourage users (as Jeff Bonwick has already done on his blog [1]) to set dedupe to act solely on hash tags and not do a full comparison of block data.

> (Of course, the integrity of the OS also needs to be protected. One way of doing that would be to have a TPM, or the same hardware that is used for crypto, store the root hash of the Merkle tree and also the hash of a boot loader that supports ZFS. Then the boot loader would load an OS from the ZFS filesystem, and only that OS would be permitted to update the ZFS root hash.)

Wow -- that is a good idea!

Regards,

Zooko

[1] blogs.sun.com/bonwick/en_US/entry/zfs_dedup

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Zooko Wilcox-O''Hearn
2009-Nov-03 17:12 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
following-up to my own post to clarify something important and add some further ideas

On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:

> don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.

Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2]. If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block. So maybe you could make the IV be the (64-bit) Birth Transaction ID plus a 64-bit counter which gets incremented on every write and is stored in the place where you are currently storing an IV. That counter could roll over, in the hopes that someone who steals your ciphertext and wants to learn something about your plaintext doesn't have a copy of your ciphertext from 2^64 versions ago. Of course, a larger counter would be better, if you can fit it in.

Regards,

Zooko

[1] csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
[2] csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/gcm-update.pdf
[3] opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Nicolas Williams
2009-Nov-03 18:21 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tue, Nov 03, 2009 at 10:12:06AM -0700, Zooko Wilcox-O'Hearn wrote:

> following-up to my own post to clarify something important and add some further ideas
>
> On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:
>
> > don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.
>
> Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2]. If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

Exactly. I proposed to Darren that he MAC only the Merkle tree roots, and he rejected that as too big a change at this point. That leaves him with the MAC/hash size trade-off. Therefore my recommendation then is to truncate only the hash. Yes, that means that you'll want to enable dedup block match verification.

> It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Note that blocks can be relocated when dataset keys are not available, which means the IV cannot be constructed from block addresses, for example.

> Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block. So maybe you could make the IV be the (64-bit) Birth Transaction ID plus a 64-bit counter which gets incremented on every write and is stored in the place where you are currently storing an IV. That counter could roll over, in the hopes that someone who steals your ciphertext and wants to learn something about your plaintext doesn't have a copy of your ciphertext from 2^64 versions ago. Of course, a larger counter would be better, if you can fit it in.

Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash. And if later Darren is able to switch to MACing the Merkle roots then he'd have 352 bits for a hash.

[*] Transactions happen at a fairly low rate of about one every few seconds. At that rate 2^64 transactions means over a trillion years before the txg wraps (half a trillion if the rate is 1/sec). Therefore ZFS does not need a cleaner service to re-write really old blocks. If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.

Nico
--
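A quick numeric check of the wrap-around estimates in the footnote (plain arithmetic, no ZFS assumptions):

    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    txg_space = 2**64

    years_at_1_per_sec = txg_space / SECONDS_PER_YEAR
    years_at_1_per_usec = txg_space / (SECONDS_PER_YEAR * 1_000_000)

    print(f"1 txg/sec:  {years_at_1_per_sec:.2e} years")   # ~5.8e11, about half a trillion
    print(f"1 txg/usec: {years_at_1_per_usec:,.0f} years") # ~584,542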
Darren J Moffat
2009-Nov-03 19:19 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> following-up to my own post to clarify something important and add some further ideas
>
> On Tuesday, 2009-11-03, at 9:32, Zooko Wilcox-O'Hearn wrote:
>
> > don't allocate a lot of bits to the MAC tag which is mostly redundant. Maybe just allocate 32 bits to it, and think of it as a double-check that you have the right key and that your AES implementation is working right.
>
> Important note: GCM does *not* have the security properties that you expect from a truncated MAC tag: [1, 2].

I never said anything about truncating the GCM MAC and I wouldn't do that. With GCM you can choose the size of the MAC in the params; I assume that is what you mean by truncating, though: choosing a short tag. The main thing I get from those two references and the GCM spec is: never go below 96 bits of MAC, but ideally use 128 bits of MAC.

So that leads me to think that for ZFS, given my space restriction, this is probably the best set of sizes for IV, MAC, and cryptographic hash of ciphertext:

96 bit IV (stored in block pointer)
96 bit MAC (stored in block pointer)
SHA256 truncated to 192 bits

> If you're relying on the MAC tag for integrity (i.e., if the SHA256 tag is truncated to be short or if the user is allowed to run with an insecure checksum), then you must use a sufficiently large MAC tag.

The user can't choose a checksum other than SHA256 if encryption is enabled. In the future when SHA-3 is chosen we will allow that too. Right, that's the question: how big of a GCM MAC is big enough?

> It seems like the IV field could be mostly or completely optimized out by generating the IV at runtime from other data which is guaranteed to be unique for this version of this block. Note that you really should use a unique IV on *every write* of the block -- i.e. for every unique block's worth of plaintext -- and not re-use the same IV for successive contents of the same block. Do you already do that?

Yes, the IV is unique for every write already. I have to store the IV because of other features coming in the future. Originally I was calculating the IV based on: object set, object, blockid and transaction group (unsigned 64-bit ints). I still do calculate the IV based on those but it needs to be stored.

> Looking at [3] I don't see anything that obviously fits the bill. The Birth Transaction ID uniquely identifies this block as far as I understand, but nothing uniquely identifies this particular version of this block.

"Version" of the block doesn't really make sense in ZFS in that way because ZFS is copy-on-write. Or maybe you can think of the birth transaction id as the version, because the other things like the object set, object, level and block id identify the logical filesystem location. The DVA (Data Virtual Address) is the 128-bit disk location, but I don't believe I can use any of that for the IV because in the future we will allow the physical location of the logical block to change (and we need that to work without the crypto keys present).

--
Darren J Moffat
Darren J Moffat
2009-Nov-03 19:28 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Nicolas Williams wrote:

> Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash.

The logical txg (post dedup integration we have physical and logical transaction ids) + a 32-bit counter is interesting. It was actually my very first design for IVs several years ago!

All this assumes that the data encryption key is staying the same - we don't have to go on that assumption with ZFS since I have the means to start using a new one for new blocks. Currently switching to a new data encryption key (distinct from changing the wrapping key the user looks after) is under the admin/users control but it could be done automagically based on time or volume of blocks written.

> If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.

I suspect that sometime in the next 584,542 years the block pointer size for ZFS will increase and I'll have more space to store a bigger MAC, hash and IV. In fact I guess that will happen even in the next 50 years.

--
Darren J Moffat
Nicolas Williams
2009-Nov-03 19:36 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tue, Nov 03, 2009 at 07:28:15PM +0000, Darren J Moffat wrote:

> Nicolas Williams wrote:
> > Interesting. If ZFS could make sure no blocks exist in a pool from more than 2^64-1 transactions ago [*], then the txg + a 32-bit per-transaction block write counter would suffice. That way Darren would have to store just 32 bits of the IV. That way he'd have 352 bits to work with, and then it'd be possible to have a 128-bit authentication tag and a 224-bit hash.
>
> The logical txg (post dedup integration we have physical and logical transaction ids) + a 32-bit counter is interesting. It was actually my very first design for IVs several years ago!

Excellent.

> All this assumes that the data encryption key is staying the same - we don't have to go on that assumption with ZFS since I have the means to start using a new one for new blocks. Currently switching to a new data encryption key (distinct from changing the wrapping key the user looks after) is under the admin/users control but it could be done automagically based on time or volume of blocks written.

Not really. You can change or not change keys, and still, txg + 32-bit counter will give you enough.

> > If 32 bits for per-transaction block write counters is too low, then the transaction rate could increase (and arguably would have to anyways); even with the fastest flash 2^32 IOPS seems a long way away, and there should be enough CPU to jack up the transaction rate by then to compensate. Let's suppose that we end up with a txg per microsecond: then we get down to a still comfy (though starting to push it) 584,542 years before we wrap.
>
> I suspect that sometime in the next 584,542 years the block pointer size for ZFS will increase and I'll have more space to store a bigger MAC, hash and IV. In fact I guess that will happen even in the next 50 years.

Heh.

txg + 32-bit counter == 96-bit IVs sounds like the way to go.
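A minimal sketch of the IV construction converged on here (an illustration under the assumptions in this thread, not ZFS code): a 96-bit IV built from the 64-bit transaction group id plus a 32-bit per-transaction block-write counter, of which only the 32-bit counter would need to be stored in the block pointer.

    import struct

    def make_iv(txg: int, write_counter: int) -> bytes:
        """96-bit IV = 64-bit transaction group id || 32-bit per-txg write counter."""
        assert 0 <= txg < 2**64 and 0 <= write_counter < 2**32
        return struct.pack(">QI", txg, write_counter)  # 12 bytes = 96 bits

    iv = make_iv(txg=123456789, write_counter=42)
    assert len(iv) == 12

    # Only the 32-bit counter would need to be stored; the txg is already
    # recorded in the block pointer, leaving 352 bits for the tag and hash.
    stored = iv[8:]
    assert len(stored) * 8 == 32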
Zooko Wilcox-O''Hearn
2009-Nov-04 05:18 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
[Folks, I don't seem to be getting messages from zfs-crypto-discuss so I am reading the web archives of zfs-crypto-discuss and replying.]

In mail.opensolaris.org/pipermail/zfs-crypto-discuss/2009-November/002951.html Darren Moffat wrote:

> SHA256 truncated to 192 bits

You know, I've thought about this sort of thing quite a lot for Tahoe-LAFS and there's a very good reason not to truncate SHA-256 at all. That reason is: now you've got to do your own cryptanalysis work. Suppose open cryptographers publish better and better attacks on SHA-256 in the future. As the attacks get better and better, we'll have to decide how urgent it is to upgrade from SHA-256 to (hopefully by that time) SHA-3. If you're using a truncation of SHA-256 then you might need to jump sooner than other people, and you won't know whether or not this is the case unless you study the attacks yourself! It is possible that a cryptographer will publish an attack on SHA-256 which is evaluated as "not a realistic threat", but which *is* a realistic threat on SHA-256-trunc-192. None of the open cryptographers will be checking or publicly mentioning whether SHA-256-trunc-192 is vulnerable because SHA-256-trunc-192 isn't on their radar.

> I have to store the IV because of other features coming in the future. Originally I was calculating the IV based on: object set, object, blockid and transaction group (unsigned 64-bit ints). I still do calculate the IV based on those but it needs to be stored.

> "Version" of the block doesn't really make sense in ZFS in that way because ZFS is copy-on-write. Or maybe you can think of the birth transaction id as the version because the other things like the object set, object, level and block id identify the logical filesystem location.

I don't understand the last sentence there. Does this mean that you'll never be asked to encrypt more than one plaintext under a different birth transaction id? If so, then perfect! -- use the birth transaction id as the IV! What other features coming in the future would need to know the IV and would not already know the birth transaction id?

Something that I still don't understand is: why do you have a MAC tag at all if you already have a SHA-256 hash of the ciphertext? David-Sarah Hopwood suggested a good reason that you might want one [1], but is that your reason? Because how big the MAC tag needs to be is probably determined by why you need it.

Regards,

Zooko Wilcox-O'Hearn

[1] mail-archive.com/cryptography at metzdowd.com/msg11034.html

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Darren J Moffat
2009-Nov-04 15:04 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> [Folks, I don't seem to be getting messages from zfs-crypto-discuss so I am reading the web archives of zfs-crypto-discuss and replying.]
>
> In mail.opensolaris.org/pipermail/zfs-crypto-discuss/2009-November/002951.html Darren Moffat wrote:
>
> > SHA256 truncated to 192 bits
>
> You know, I've thought about this sort of thing quite a lot for Tahoe-LAFS and there's a very good reason not to truncate SHA-256 at all. That reason is: now you've got to do your own cryptanalysis work. Suppose open cryptographers publish better and better attacks on SHA-256 in the future. As the attacks get better and better, we'll have to decide how urgent it is to upgrade from SHA-256 to (hopefully by that time) SHA-3. If you're using a truncation of SHA-256 then you might need to jump sooner than other people, and you won't know whether or not this is the case unless you study the attacks yourself!

That is exactly my concern and why I came here for advice. I think I'm now convinced that truncating the SHA256 hash would not be a good idea even though we do have an additional MAC.

> I don't understand the last sentence there. Does this mean that you'll never be asked to encrypt more than one plaintext under a different birth transaction id? If so, then perfect! -- use the birth transaction id as the IV! What other features coming in the future would need to know the IV and would not already know the birth transaction id?

In a given birth transaction (txg) there may be many blocks being encrypted under the same key, so the txg alone isn't enough. The combination of txg, objset and block id are unique - but that is 192 bits.

One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

> Something that I still don't understand is: why do you have a MAC tag at all if you already have a SHA-256 hash of the ciphertext? David-Sarah Hopwood suggested a good reason that you might want one [1], but is that your reason? Because how big the MAC tag needs to be is probably determined by why you need it.

The SHA-256 is unkeyed, so there would be nothing to stop an attacker that can write to the disks but doesn't know the key from modifying the on-disk ciphertext and all the SHA-256 hashes up to the top of the Merkle tree to the uberblock. That would create a valid ZFS pool but the data would have been tampered with. I don't see that as an acceptable risk. I can't make the SHA-256 keyed because there are ZFS operations that we must be able to perform when the decryption key is not available: resilvering a mirror/raidz, disk removal (raid relayout), hotspare, scrub (proactive resilver).

By using a MAC we reduce that risk because now the attacker needs to forge the MAC and modify the SHA-256 Merkle tree all the way to the uberblock as well. Depending on the type of modification the attacker may actually need to forge multiple MAC tags. A given MAC tag applies to a single ZFS block that is between 512 bytes and 128k.

So if I don't truncate the SHA-256, how big does my MAC need to be given every ZFS block has its own IV?

--
Darren J Moffat
Zooko at allmydata.com
2009-Nov-05 05:32 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:

> One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

Regards,

Zooko
Zooko Wilcox-O''Hearn
2009-Nov-08 21:02 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Wednesday, 2009-11-04, at 7:04, Darren J Moffat wrote:

> The SHA-256 is unkeyed, so there would be nothing to stop an attacker that can write to the disks but doesn't know the key from modifying the on-disk ciphertext and all the SHA-256 hashes up to the top of the Merkle tree to the uberblock. That would create a valid ZFS pool but the data would have been tampered with. I don't see that as an acceptable risk.

I see. It is interesting that you and I have different intuitions about this. My intuition is that it is easier to make sure that the Merkle Tree root hash hasn't been changed without authorization than to make sure that an unauthorized person hasn't learned a secret. Is your intuition the opposite? I suppose in different situations either one could be true.

Now I better appreciate why you want to use both a secure hash and a MAC. Now I understand the appeal of Nico Williams's proposal to MAC just the root of the tree and not every node of the tree. That would save space in all the non-root nodes but would retain the property that you have to both know the secret *and* be able to write to the root hash in order to change the filesystem.

> So if I don't truncate the SHA-256, how big does my MAC need to be given every ZFS block has its own IV?

I don't know the answer to this question. I have a hard time understanding whether the minimum safe size of the MAC is zero (i.e. you don't need it anyway) or 128 bits (i.e. you rely on the MAC and you want 128-bit crypto strength) or something in between.

Regards,

Zooko
Zooko Wilcox-O''Hearn
2009-Nov-27 18:30 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
I wrote this note on 2009-11-04 22:32:34 but I haven't seen a reply. I wanted to make sure that the ZFS crypto engineers thought about this, because if there *is* any information which fits the bill then it could help.

On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:

> One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.

Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

Regards,

Zooko
Darren J Moffat
2009-Dec-01 13:45 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> I wrote this note on 2009-11-04 22:32:34 but I haven't seen a reply. I wanted to make sure that the ZFS crypto engineers thought about this, because if there *is* any information which fits the bill then it could help.

I had seen it and thought about it. In fact I was originally using information from the zbookmark_t and the txg id.

> On Wednesday, 2009-11-04, at 8:04, Darren J Moffat wrote:
>
> > One of the possible future features that would need access to the IV is if we do a version of 'zfs send' (which takes a ZFS filesystem and makes a stream out of it for replication purposes) that transfers the blocks as they are on disk (ie compressed and encrypted). Currently the 'zfs send' works at the DMU layer of ZFS and doesn't deal in transactions or even disk blocks - it deals in DMU objects and the send stream is all decrypted and decompressed. To be able to send ciphertext blocks we will need to send the IV to the remote side too. Which is why we need to store the IV rather than calculate it - the remote side won't be putting that ciphertext on disk in the same txg number. I don't want to do anything now that would make that difficult to do later.
>
> Is there some information that does come with and then get stored by the "zfs recv" side that you could use for (partial) IV? Maybe an objset id? Anything which is stable through zfs send-and-receive and which is unique to this block could be used for (part of) the IV and thus shorten the amount of bits you need to store for the IV.

The information in the zbookmark_t + txg works for all cases except for:

  send-recv where the data is sent encrypted
  deduplication

In both of those cases the zbookmark_t and txg will be different.

I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-01 19:27 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Thanks for the reply. Too bad you can't wait for SHA-3. Now you'll have to think about whether new cryptanalytic results against SHA-256 mean that SHA-256-trunc-160 is vulnerable and if so what effect that has on the safety of your scheme. But, I don't have a better solution for you, other than Nico Williams's proposal to put a MAC on only the root, which you've already rejected as being too disruptive of a change at this point.

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
Darren J Moffat
2009-Dec-01 21:26 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> Thanks for the reply. Too bad you can't wait for SHA-3.

Waiting for SHA-3 means waiting until 2012 and that is totally unrealistic.

> Now you'll have to think about whether new cryptanalytic results against SHA-256 mean that SHA-256-trunc-160 is vulnerable and if so what effect that has on the safety of your scheme.

That is a risk we have to take, but we aren't dependent on the truncated SHA-256 for the security of the ciphertext; the MAC and the truncated SHA-256 together provide that for us.

> But, I don't have a better solution for you, other than Nico Williams's proposal to put a MAC on only the root, which you've already rejected as being too disruptive of a change at this point.

It is, yes, but it can be investigated for the future - ZFS is versioned on disk and we can thus make changes like this.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-14 01:36 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:

> I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.

And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?

Have you worked out the birthday paradox consequences for a 96-bit IV? The easy one to calculate is that if you had 2^48 blocks then you'd have a 50% chance of IV collision. I don't know about you, but I would consider 2^48 blocks to be way more than you need to support for the foreseeable future. But how many blocks does it take before you suffer a 10^-5 chance of IV collision? How about a 10^-9 chance? Anyway, what is your tolerance for a chance of IV collision?

I haven't worked out the answers to these birthday paradox questions yet, but I intend to, with the help of my brother who is a statistician, and report back. Obviously only you can answer the one about what chance of IV collision you are comfortable with.

Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?

(Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
David-Sarah Hopwood
2009-Dec-14 03:20 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
>
> > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
>
> And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?
>
> Have you worked out the birthday paradox consequences for a 96-bit IV? The easy one to calculate is that if you had 2^48 blocks then you'd have a 50% chance of IV collision. I don't know about you, but I would consider 2^48 blocks to be way more than you need to support for the foreseeable future. But how many blocks does it take before you suffer a 10^-5 chance of IV collision?

sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.

(You can do it in your head: the nearest power of 2 to 100000 is 2^17, so that's roughly 2^(48 - 17/2) = 2^39.5.)

> How about a 10^-9 chance?

sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.

> Anyway, what is your tolerance for a chance of IV collision?

For me, these figures for number of blocks are not high enough, if the IV is derived randomly. If it is derived in such a way that it's guaranteed to be unique, then there is no problem. But note the comment I made about repeated IVs in CTR-based modes, in the post referenced below.

> Obviously only you can answer the one about what chance of IV collision you are comfortable with.
>
> Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?
>
> (Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)

In <article.gmane.org/gmane.comp.encryption.general/13719>, I suggested a scheme that would also address both issues (maybe it's similar to what you're thinking of). Ideally you would want a 256-bit-block cipher if you're going to use that approach for CTR mode, though.

--
David-Sarah Hopwood  davidsarah.livejournal.com
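A small numeric check of the figures above, evaluating the usual birthday approximation p =~ 1 - e^(-n(n-1)/2d) for a 96-bit IV space (a sketch; the block counts are round powers of two, not exact thresholds):

    import math

    d = 2**96  # size of the 96-bit IV space

    def collision_prob(n: int) -> float:
        """Approximate probability of at least one collision among n random IVs."""
        return 1.0 - math.exp(-n * (n - 1) / (2 * d))

    def blocks_for_prob(p: float) -> float:
        """Approximate number of random IVs that reach collision probability p."""
        return math.sqrt(2 * d * math.log(1 / (1 - p)))

    print(f"{collision_prob(2**40):.1e}")              # ~7.6e-06, i.e. around 10^-5
    print(f"{collision_prob(2**33):.1e}")              # ~4.7e-10, i.e. around 10^-9
    print(f"2^{math.log2(blocks_for_prob(0.5)):.1f}")  # ~2^48.2 blocks for a 50% chance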
Darren J Moffat
2009-Dec-14 11:53 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O'Hearn wrote:

> On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
>
> > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
>
> And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?

The 96 bit IV is calculated by hashing some of the fields from the zbookmark_t (object, level, block) and the txg.

> Have you worked out the birthday paradox consequences for a 96-bit IV?

GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway, and 96 bits is considered the "default" IV size.

For CCM the maximum possible IV (the nonce field in the CCM_PARAMS) is 13 bytes (104 bits), but that limits the data size to 64k, which is too low to store the current maximum size of ZFS block (128k) we could encrypt in one operation. So I have to set the CCM nonce size to 12 bytes (96 bits) so that the size of data that can be encrypted goes up to 16777215 bytes. This is the way CCM mode is defined, so again my maximum is a 96 bit IV.

> Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?

The IVs aren't randomly generated, and yes that does prevent deduplication if I use an IV derived from the zbookmark_t + txg or a random IV.

For deduplication the IV generation is done differently to ensure the ciphertext does match, but only when dedup is enabled on those ZFS datasets - this means we can dedup within a dataset and its clones (since they by default share data encryption keys) but not with any other datasets.

--
Darren J Moffat
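Two small sketches of the mechanics described above (illustrative assumptions, not the ZFS implementation): deriving a 96-bit IV by hashing zbookmark-style fields plus the txg, and the CCM relation between nonce length and maximum payload length that forces the 12-byte nonce.

    import hashlib, struct

    def derive_iv(obj: int, level: int, blkid: int, txg: int) -> bytes:
        """Illustrative 96-bit IV: SHA-256 over bookmark-like fields + txg, truncated."""
        packed = struct.pack(">QQQQ", obj, level, blkid, txg)
        return hashlib.sha256(packed).digest()[:12]  # 12 bytes = 96 bits

    assert len(derive_iv(obj=1234, level=0, blkid=42, txg=987654)) == 12

    # CCM (NIST SP 800-38C): a nonce of (15 - q) bytes leaves a q-byte length
    # field, so the maximum payload is 2^(8q) - 1 bytes.
    def ccm_max_payload(nonce_bytes: int) -> int:
        q = 15 - nonce_bytes
        return 2**(8 * q) - 1

    assert ccm_max_payload(13) == 65535      # 64k - 1: too small for a 128k ZFS block
    assert ccm_max_payload(12) == 16777215   # ~16 MB: comfortably covers 128k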
David-Sarah Hopwood
2009-Dec-14 19:34 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:

> Zooko Wilcox-O'Hearn wrote:
> > On Tuesday, 2009-12-01, at 6:45, Darren J Moffat wrote:
> >
> > > I've settled on 160 bits of SHA256, 96 bit stored IV, and 96 bit AuthTag/MAC.
> >
> > And this 96-bit stored IV is the only IV, right? Because auxiliary information such as zbookmark_t and txg can't be used for this purpose?
>
> The 96 bit IV is calculated by hashing some of the fields from the zbookmark_t (object, level, block) and the txg.
>
> > Have you worked out the birthday paradox consequences for a 96-bit IV?
>
> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway, and 96 bits is considered the "default" IV size.

Right, but GCM mode (like other CTR-based modes) is designed under the assumption that the IV is chosen uniquely. If it's a hash, then even when the inputs to the hash are unique, we expect collisions after sqrt(2*p) * 2^48 blocks with probability p. The consequence of a collision is to leak the exclusive-or of the plaintexts of the colliding blocks (which will often be sufficient to derive both plaintexts). This security level is significantly less than the design strength of 128-bit AES.

For most applications, this weakness probably wouldn't matter very much, but the problem is that you don't know what will be stored in the filesystem or how significant a leak would be.

> > Another question: does this scheme prevent deduplication? If two blocks have identical plaintext, but independent random IVs and therefore different ciphertext, then how can the deduper figure out that they could be deduped?
>
> The IVs aren't randomly generated, and yes that does prevent deduplication if I use an IV derived from the zbookmark_t + txg or a random IV.
>
> For deduplication the IV generation is done differently to ensure the ciphertext does match, but only when dedup is enabled on those ZFS datasets - this means we can dedup within a dataset and its clones (since they by default share data encryption keys) but not with any other datasets.

How is the IV derived when dedup is enabled?

--
David-Sarah Hopwood  davidsarah.livejournal.com
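A short demonstration of the failure mode described above (a sketch with AES in plain CTR mode via the pyca/cryptography package; GCM encrypts with CTR underneath, so the confidentiality consequence of a repeated IV is the same): encrypting two blocks under the same key and the same counter block leaks the XOR of their plaintexts.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)
    nonce = os.urandom(16)  # full 128-bit initial counter block, (wrongly) reused below

    def ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(data) + enc.finalize()

    p1 = b"attack at dawn " * 4
    p2 = b"retreat at noon" * 4

    c1 = ctr_encrypt(key, nonce, p1)  # IV collision: same key, same counter block
    c2 = ctr_encrypt(key, nonce, p2)

    xor_ct = bytes(a ^ b for a, b in zip(c1, c2))
    xor_pt = bytes(a ^ b for a, b in zip(p1, p2))
    assert xor_ct == xor_pt  # ciphertext XOR equals plaintext XOR on IV reuse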
Zooko Wilcox-O''Hearn
2009-Dec-14 21:48 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Sunday, 2009-12-13, at 20:20, David-Sarah Hopwood wrote:

> > But how many blocks does it take before you suffer a 10^-5 chance of IV collision?
>
> sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.
>
> (You can do it in your head: the nearest power of 2 to 100000 is 2^17, so that's roughly 2^(48 - 17/2) = 2^39.5.)
>
> > How about a 10^-9 chance?
>
> sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.

Hrm, why is this the answer? Can you explain the math to me?

In any case, assuming you are right then this means that having a mere 8 billion blocks in your filesystem would incur a one-in-a-billion chance of IV collision. Hm. That's not something I would be comfortable with.

> > (Foreshadowing: I have a crypto hack in mind that could address these two issues, if issues they be.)
>
> In <article.gmane.org/gmane.comp.encryption.general/13719>, I suggested a scheme that would also address both issues (maybe it's similar to what you're thinking of). Ideally you would want a 256-bit-block cipher if you're going to use that approach for CTR mode, though.

Ha ha! Thank you for reminding me of this. Well, I read your proposal when you posted it, but I didn't really understand it all. Then a couple of days ago I woke up in the morning with a clever idea in mind for how to improve ZFS crypto, which I alluded to, above. Re-reading your post now it appears that my clever invention is nothing but a poor variation of yours. :-)

So, just like you said, this proposal of mine -- I mean of yours -- would have several advantages: most importantly, it allows deduplication of encrypted blocks and it does so even if dedupe was not enabled when that encrypted dataset was created and filled with data. Second most important, it allows 128-bit IVs, which means that if you are unwilling to tolerate a 10^-9 chance of IV collision, you can raise your dataset size limit from 2^33 blocks (with a 96-bit IV) to 2^49 blocks (with a 128-bit IV). Thirdly, your scheme allows any block cipher mode of operation, including unauthenticated ones.

And like you said, the major drawback to this is that you have to process each block in two passes -- once to compute the MAC and then a second time to encrypt. Since the blocks are small enough to fit wholly in RAM, this may not be a significant performance problem in practice -- I'm not sure. If I were making the decision then I would measure that.

You mentioned that this would allow a parallelizable computation of the encryption, such as by using CTR mode, and that you don't know of any unpatented parallelizable MAC. However it is easy to make a parallelizable MAC: use a Merkle Tree! For example, generate a separate MAC tag over each 4 KiB of the block in a separate thread, resulting in 32 MAC tags (if the block were 128 KiB in size). Then concatenate all the MAC tags together and compute a MAC over them.

Regards,

Zooko
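A minimal sketch of the parallelizable MAC Zooko outlines (illustrative, using HMAC-SHA256; the 4 KiB chunk size and thread pool are taken from the example in the post, everything else is an assumption): MAC each chunk independently, then MAC the concatenation of the per-chunk tags.

    import hmac, hashlib
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 4096

    def chunk_tag(key: bytes, index: int, chunk: bytes) -> bytes:
        # Bind each tag to its chunk index so chunks cannot be reordered.
        return hmac.new(key, index.to_bytes(4, "big") + chunk, hashlib.sha256).digest()

    def tree_mac(key: bytes, block: bytes) -> bytes:
        chunks = [block[i:i + CHUNK] for i in range(0, len(block), CHUNK)]
        with ThreadPoolExecutor() as pool:
            tags = list(pool.map(lambda ic: chunk_tag(key, ic[0], ic[1]), enumerate(chunks)))
        # Final MAC over the block length plus the concatenated per-chunk tags.
        return hmac.new(key, len(block).to_bytes(8, "big") + b"".join(tags), hashlib.sha256).digest()

    tag = tree_mac(b"\x00" * 32, b"A" * (128 * 1024))  # one 128 KiB block -> 32 chunk tags
    assert len(tag) == 32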
David-Sarah Hopwood
2009-Dec-15 04:59 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Zooko Wilcox-O''Hearn wrote:
> On Sunday, 2009-12-13, at 20:20, David-Sarah Hopwood wrote:
>>> But how many blocks does it take before you suffer a 10^-5 chance of
>>> IV collision?
>>
>> sqrt(2 * 10^-5) * 2^48 =~ 2^39.7.
>>
>> (You can do it in your head: the nearest power of 2 to 100000 is 2^17,
>> so that''s roughly 2^(48 - 17/2) = 2^39.5.)
>>
>>> How about a 10^-9 chance?
>>
>> sqrt(2 * 10^-9) * 2^48 =~ 2^33.1.
>
> Hrm, why is this the answer? Can you explain the math to me?

Given n random values drawn from a discrete uniform distribution with
d >= n elements, let p be the probability that at least two values are
the same. Then 1 - p is the probability that all values are distinct.

When choosing the (k+1)th value, k values have already been chosen, so
the probability that this value will be distinct from those already
chosen is 1 - k/d. Therefore

  1 - p = product{k = 1..n-1}(1 - k/d)    for n <= d

Apply the approximation 1 - x =~ e^-x for each x = k/d:

  1 - p =~ product{k = 1..n-1}(e^(-k/d))
        = e^sum{k = 1..n-1}(-k/d)
        = e^-(n(n-1)/2d)

  ln(1 - p) =~ -n(n-1)/2d
  ln(1/(1 - p)) =~ n(n-1)/2d
  n^2 =~ 2d ln(1/(1 - p))    [approximating n(n-1) by n^2 for large n]
  n =~ sqrt(2d ln(1/(1 - p)))

Apply 1 - p =~ e^-p for small p, so that ln(1/(1 - p)) =~ p:

  n =~ sqrt(2dp) = sqrt(2p) sqrt(d)

This doesn''t tell us how close the approximation is. In fact it is
only about 18% out for p = 1/2, and more accurate for all smaller p.

> In any case, assuming you are right, this means that having a mere
> 8 billion blocks in your filesystem would incur a one-in-a-billion
> chance of IV collision. Hm. That''s not something I would be
> comfortable with.

Me neither -- at least not for a general-purpose system where you don''t
know the sensitivity of the data.

>> In <article.gmane.org/gmane.comp.encryption.general/13719>, I
>> suggested a scheme that would also address both issues (maybe it''s
>> similar to what you''re thinking of). Ideally you would want a
>> 256-bit-block cipher if you''re going to use that approach for CTR
>> mode, though.
>
> Ha ha! Thank you for reminding me of this. Well, I read your proposal
> when you posted it, but I didn''t really understand it all. Then a
> couple of days ago I woke up in the morning with a clever idea in mind
> for how to improve ZFS crypto, which I alluded to above. Re-reading
> your post now, it appears that my clever invention is nothing but a
> poor variation of yours. :-)

This has happened to me many times :-) (You very often need to reinvent
something in order to see that it''s possible, and to know what to look
for in previous research.)

> So, just like you said, this proposal of mine -- I mean of yours --
> would have several advantages. Most importantly, it allows
> deduplication of encrypted blocks, and it does so even if dedupe was
> not enabled when that encrypted dataset was created and filled with
> data.

Oh, I hadn''t spotted that.

> Second, it allows 128-bit IVs, which means that if you are unwilling
> to tolerate a 10^-9 chance of IV collision, you can raise your dataset
> size limit from 2^33 blocks (with a 96-bit IV) to 2^49 blocks (with a
> 128-bit IV). Third, your scheme allows any block cipher mode of
> operation, including unauthenticated ones.
>
> And like you said, the major drawback to this is that you have to
> process each block in two passes -- once to compute the MAC and then a
> second time to encrypt. Since the blocks are small enough to fit wholly
> in RAM, this may not be a significant performance problem in practice --
> I''m not sure. If I were making the decision then I would measure that.
>
> You mentioned that this would allow a parallelizable computation of the
> encryption, such as by using CTR mode, and that you don''t know of any
> unpatented parallelizable MAC. However, it is easy to make a
> parallelizable MAC: use a Merkle Tree! For example, generate a separate
> MAC tag over each 4 KiB of the block in a separate thread, resulting in
> 32 MAC tags (if the block were 128 KiB in size). Then concatenate all
> the MAC tags together and compute a MAC over them.

D''oh. Obvious when you point it out. That''s simple enough that it''s
probably worth doing it that way, even if current machines can''t take
full advantage because of threading overheads.

--
David-Sarah Hopwood ? davidsarah.livejournal.com
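That last claim about the accuracy of the approximation is easy to check
numerically. A short Python sketch comparing the exact product formula with
the n =~ sqrt(2dp) approximation at p = 1/2 (the value d = 10^6 is
arbitrary, just small enough to evaluate the product directly):

    import math

    d = 10**6
    p_target = 0.5

    # exact: 1 - p = product_{k=1..n-1} (1 - k/d), built up incrementally
    no_collision = 1.0
    n = 1
    while 1.0 - no_collision < p_target:
        no_collision *= 1.0 - n / d
        n += 1

    approx = math.sqrt(2 * d * p_target)  # n =~ sqrt(2dp)
    print(n, approx, (n - approx) / approx)
    # prints roughly: 1178 1000.0 0.178 -- i.e. about 18% out at p = 1/2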
David-Sarah Hopwood
2009-Dec-15 05:11 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:
> Zooko Wilcox-O''Hearn wrote:
>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>
> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
> and 96 bit is considered the "default" IV size.

Note that GCM mode should never be used with an IV other than 96 bits,
because of the weakness described in section 2.4 of
<csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

--
David-Sarah Hopwood ? davidsarah.livejournal.com
David-Sarah Hopwood
2009-Dec-15 05:52 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
> Darren J Moffat wrote:
>> Zooko Wilcox-O''Hearn wrote:
>>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>>
>> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
>> and 96 bit is considered the "default" IV size.
>
> Note that GCM mode should never be used with an IV other than 96 bits,
> because of the weakness described in section 2.4 of
> <csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

Also, section 3 of
<csrc.nist.gov/groups/ST/toolkit/BCM/documents/Joux_comments.pdf>
describes an attack against GCM with repeated IVs when the attacker can
obtain more than one collision (but still only a small number of
collisions).

--
David-Sarah Hopwood ? davidsarah.livejournal.com
Darren J Moffat
2009-Dec-15 10:04 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
> Darren J Moffat wrote:
>> Zooko Wilcox-O''Hearn wrote:
>>> Have you worked out the birthday paradox consequences for a 96-bit IV?
>>
>> GCM mode will GHASH any IV larger than 96 bits down to 96 bits anyway
>> and 96 bit is considered the "default" IV size.
>
> Note that GCM mode should never be used with an IV other than 96 bits,
> because of the weakness described in section 2.4 of
> <csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf>.

In ZFS crypto the IV for GCM is always 96 bits for this reason.

--
Darren J Moffat
Darren J Moffat
2009-Dec-15 10:06 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
David-Sarah Hopwood wrote:
>> For deduplication the IV generation is done differently to ensure the
>> ciphertext does match, but only when dedup is enabled on those ZFS
>> datasets - this means we can dedup within a dataset and its clones
>> (since they by default share data encryption keys) but not any other
>> datasets.
>
> How is the IV derived when dedup is enabled?

An HMAC (using a different per-filesystem key from the dataset
encryption key) of the plaintext. This allows for deduplication when
the data encryption keys match, i.e. within the same ZFS filesystem
(but not child filesystems) or a clone of it. At least until someone
runs ''zfs key -K'' on the filesystem to start using a new data
encryption key.

--
Darren J Moffat
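A minimal sketch of that derivation (the use of HMAC-SHA256 and the
truncation to the 96-bit IV size are assumptions for illustration; the
actual ZFS implementation details may differ):

    import hmac
    import hashlib

    def dedup_iv(iv_key: bytes, plaintext: bytes) -> bytes:
        # iv_key is a per-filesystem key, distinct from the data encryption key
        return hmac.new(iv_key, plaintext, hashlib.sha256).digest()[:12]  # 96 bits

    # Identical plaintexts under the same keys get identical IVs, and hence
    # identical ciphertexts, which is what makes deduplication possible.
    assert dedup_iv(b"k" * 32, b"same block") == dedup_iv(b"k" * 32, b"same block")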
Darren J Moffat
2009-Dec-15 10:10 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
Darren J Moffat wrote:
> David-Sarah Hopwood wrote:
>>> For deduplication the IV generation is done differently to ensure the
>>> ciphertext does match, but only when dedup is enabled on those ZFS
>>> datasets - this means we can dedup within a dataset and its clones
>>> (since they by default share data encryption keys) but not any other
>>> datasets.
>>
>> How is the IV derived when dedup is enabled?
>
> An HMAC (using a different per-filesystem key from the dataset
> encryption key) of the plaintext. This allows for deduplication when
> the data encryption keys match, i.e. within the same ZFS filesystem
> (but not child filesystems) or a clone of it. At least until someone
> runs ''zfs key -K'' on the filesystem to start using a new data
> encryption key.

The reason an HMAC is used rather than a plain hash is to reduce the
likelihood of an attacker precomputing IVs for known plaintexts. The
small additional overhead of an HMAC over a hash is worth it, in my
opinion.

--
Darren J Moffat
Zooko Wilcox-O''Hearn
2009-Dec-15 15:54 UTC
Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto
On Monday, 2009-12-14, at 4:53, Darren J Moffat wrote:

> GCM mode will GHASH any IV larger than 96 bits down to 96 bits
> anyway and 96 bit is considered the "default" IV size.

That''s interesting. It means that if you have a deterministic way to
generate unique IVs, such as transaction counters and block identifiers
and so on, and it produces guaranteed-unique 96-bit IVs that you give to
GCM, then you''re good. But if your deterministic method generates
guaranteed-unique 128-bit IVs and you give those to GCM, then you suffer
from the birthday paradox, and your security might fail at large scale.
It seems like a flaw in the GCM design that it can surprise the user
like that.

Regards,

Zooko

---
Your cloud storage provider does not need access to your data.

Tahoe-LAFS -- allmydata.org
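For contrast, a sketch of the kind of deterministic, guaranteed-unique
96-bit IV being described here: pack a transaction number and a block
identifier directly into 12 bytes instead of hashing them (the field
choice and widths are illustrative assumptions, not the actual ZFS
layout):

    import struct

    def unique_iv(txg: int, blkid: int) -> bytes:
        # 64-bit transaction group number + 32-bit block id = 96 bits
        return struct.pack(">QL", txg, blkid)

    # As long as (txg, blkid) pairs never repeat under a given key, these IVs
    # never collide, so the birthday bound discussed above does not apply.
    assert len(unique_iv(12345, 67)) == 12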