Let me suggest what seems to be a minor design modification with what I think would be a fairly big difference in functionality. The idea is to have a set of keys for the dataset instead of a single key (I was calling them "class keys"). So the change is that instead of storing a single dataset key wrapped with the pool key, you'd store a set of (class ID, class key) pairs, encrypted with the pool key. (For types of information that should be assuredly deletable, the class key would first be encrypted with one or more ephemerizer public keys.) Call this the "encrypted class keys" data structure. (For those who are curious what I'm talking about, see http://cs.jhu.edu/~fabian/courses/secure-delete.pdf for more information.)

Then, for instance, a file can be encrypted with its own key by putting the class ID into the dnode, along with the file key encrypted with the class key. This would take maybe 4 bytes to specify the class ID, plus the file key encrypted with the class key, which would be maybe 128 bits. Or, if space in the dnode is a problem, it would suffice to put in just the class ID and encrypt the file directly with the class key. (A minor advantage of using a per-file key instead of encrypting directly with the class key is being able to change a file's class without re-encrypting the whole file, for instance, to extend its expiration date.)

Reasons to have per-file keys, as well as multiple keys for a dataset, are:

a) to periodically change the encryption key being used for the dataset without re-encrypting everything in the dataset;

b) to have a time-based policy that says, for instance, that each file in a particular directory expires a year from its creation date. This is very different from having the entire directory expire at the same time;
c) to allow something along the lines of what Wyllys suggested: wrapping some subset of the class keys inside the encrypted class keys data structure with something a user provides, so the file system couldn't actually read the data unless the user enables it. ZFS probably wouldn't use a user-wrapped class key for metadata -- just for data.

The default policy could be a single class key for the entire dataset, and then the only cost is sticking the class ID (basically, a constant) into all the dnodes. Or, if there is no class ID specified in the dnode, the class would be inherited from the parent.

Then let's assume that at some point someone decides they want to change the key with which the dataset is encrypted. New files written after that time would be written with the new class key, and would be marked as such by putting the new class ID into the dnode. Old files can stay encrypted with the old class key. And then in the future, if UIs are implemented to specify policies such as time-based assured delete, or user-wrapped data encryption keys, the file structures will be all ready.

Radia
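To make the shape of the proposal concrete, here is a small sketch (my own illustration, not part of the proposal). The `wrap`/`unwrap` helpers are placeholders -- a toy XOR-with-hash keystream standing in for a real authenticated key wrap such as AES Key Wrap -- and the field names and key sizes are assumptions, not ZFS structures:

```python
import os
import hashlib

def wrap(kek, data):
    # Placeholder "encryption": XOR with a SHA-256-derived keystream.
    # A real implementation would use an authenticated key wrap (e.g. AES Key Wrap).
    stream = hashlib.sha256(kek).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

unwrap = wrap  # an XOR keystream is its own inverse

pool_key = os.urandom(16)

# The "encrypted class keys" structure: class ID -> class key wrapped
# with the pool key. (For assured delete, a class key would first be
# encrypted to one or more ephemerizer public keys; omitted here.)
plain_class_keys = {1: os.urandom(16), 2: os.urandom(16)}
encrypted_class_keys = {cid: wrap(pool_key, k)
                        for cid, k in plain_class_keys.items()}

# Per-file: the dnode stores the class ID (a few bytes) plus the file key
# wrapped with that class's key (~128 bits).
file_key = os.urandom(16)
dnode = {"class_id": 2,
         "wrapped_file_key": wrap(plain_class_keys[2], file_key)}

# To read the file: unwrap the class key with the pool key, then the file key.
class_key = unwrap(pool_key, encrypted_class_keys[dnode["class_id"]])
recovered = unwrap(class_key, dnode["wrapped_file_key"])
assert recovered == file_key

# Changing the file's class (e.g. to extend expiration) only re-wraps the
# 16-byte file key under another class key; the file data is untouched.
dnode["class_id"] = 1
dnode["wrapped_file_key"] = wrap(plain_class_keys[1], file_key)
```

The last two lines show the advantage Radia mentions: moving a file between classes touches only the wrapped file key in the dnode, never the file contents.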
I think we need to make sure that there will be at least one bit in the dnode reserved for this when the i-team gets around to implementing per-file keying. Additional data could always be stored in an extended attribute if need be.
Nicolas Williams wrote:
> I think we need to make sure that there will be at least one bit in the
> dnode reserved for this when the i-team gets around to implementing
> per-file keying. Additional data could always be stored in an extended
> attribute if need be.

I guess it could be done with a single bit. One value of the bit would mean "the default class key for this dataset" and the other would mean "different class key -- find the class ID in an extended attribute".

How precious is space? I doubt we'd ever really need more than, say, 2 bytes' worth of class IDs (a class for each expiration date, with granularity of a day for up to 30 years, would only be about 11,000 keys). So 2 bytes for specifying the class ID would seem to be sufficient.

And, as I said, although it's a minor advantage to choose a unique key K for the data of each file (and store K encrypted with the class key in the dnode, which would take another 128 bits or so), if space were very limited we could do with *only* storing the 2-byte class ID in the dnode and encrypt all the files in the same class with the same class key.

Radia
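As a sanity check on the back-of-the-envelope arithmetic above (the byte counts are the figures discussed in the thread, not measured ZFS numbers):

```python
# Day-granularity expiration classes over 30 years.
num_classes = 30 * 365          # 10,950 distinct class IDs
assert num_classes < 2**16      # comfortably fits in a 2-byte class ID

# Per-dnode overhead under the two variants discussed:
class_id_bytes = 2
wrapped_file_key_bytes = 128 // 8   # ~128-bit file key wrapped with the class key

with_per_file_key = class_id_bytes + wrapped_file_key_bytes   # 18 bytes per dnode
class_key_only = class_id_bytes                               # 2 bytes per dnode
```

So the cheap variant costs 2 bytes per dnode, and the per-file-key variant roughly 18, which is what makes the dnode-space question below worth asking.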
On Thu, Jul 12, 2007 at 03:16:20PM -0700, Radia Perlman wrote:
> How precious is space? I doubt if we'd ever really need more than, say,
> 2 bytes' worth of class keys (a class for each expiration date, with
> granularity of a day for up to 30 years would only be 10,000 keys).
> So 2 bytes for specifying the class ID would seem to be sufficient.

I don't know. The issue, if there is one, is that there are at least two projects that I know of that may be applying pressure on the dnode size. I don't know if the way ZFS versions the on-disk layout would allow for multiple versions of the dnode layout in the same dataset/pool -- I hope that it does, but even if it does, I imagine that keeping the number of dnode layouts to a minimum would be good. Thus my concern about how much space we need to reserve in the dnode right now. Here we really need comment from ZFS stewards.

Nico
Nicolas Williams wrote:
> I think we need to make sure that there will be at least one bit in the
> dnode reserved for this when the i-team gets around to implementing
> per-file keying. Additional data could always be stored in an extended
> attribute if need be.

There is already padding in the dnode_phys_t and blkptr_t structures for doing this stuff. There is sufficient space that I don't see it disappearing any time soon. I really, really, really don't want to change that padding and take space away from it for something until we are actually going to use it. During the development of the proof of concept / prototype I had to change the structures about 3 or 4 times to merge with other ZFS projects that used reserved space before I did. I just don't see the value in doing this type of explicit reservation for something that we might use; I'd much rather wait until we know exactly how these future features will work before we start taking space in on-disk structures.

-- 
Darren J Moffat