Edward Shishkin
2014-Aug-07 20:43 UTC
[Gluster-users] Transparent encryption and authentication in distributed systems with non-trusted servers
Hello everyone,

Here we provide some basic background in addition to the manpages at
http://www.gluster.org/community/documentation/index.php?title=Features/disk-encryption

Comments, questions, and any experience of your own are welcome.

Thanks,
Edward.


Transparent encryption and authentication in distributed systems with non-trusted servers

Distributed systems impose tighter requirements on at-rest encryption. This is because your encrypted data will be stored on servers, which are de facto untrusted. In particular, your private encrypted data can be subjected to analysis and tampering, which will eventually lead to its disclosure if it is not properly protected. Specifically, it is usually not enough to just encrypt data. In distributed systems, serious protection of your personal data is possible only in conjunction with a special process, which is called authentication. GlusterFS provides such an enhanced service: in GlusterFS, encryption is enhanced with authentication.

Currently we provide protection from "silent tampering". This is a kind of tampering which is hard to detect, because it doesn't break POSIX compliance. Specifically, we protect the encryption-specific metadata of a file, which includes the file's unique object id (GFID), cipher algorithm id, cipher block size and other attributes used by the encryption process.

Restrictions

1. We encrypt only file content. The feature of transparent encryption doesn't protect file names: they are neither encrypted nor verified. Protection of file names is not as critical as protection of the encryption-specific metadata of a file: any attack based on tampering with file names will break POSIX compliance and result in massive corruption, which is easy to detect.

2. The feature of transparent encryption is not supported in NFS mounts of GlusterFS volumes: NFS file handles introduce security issues, which are hard to resolve.

3. The feature of transparent encryption is incompatible with GlusterFS performance translators (quick-read, write-behind and open-behind).

I. Trusted and non-trusted machines

Suppose you are a user of this feature (transparent encryption and authentication in GlusterFS). You qualify every machine as either trusted or non-trusted.

Examples:

I.1 Your personal laptop, which is under your supervision, is a trusted machine.
I.2 Remote GlusterFS servers, which are not under your supervision, are non-trusted machines. They are managed by a good admin, but you don't trust him with your private data.
I.3 Clouds are an important example of a set of non-trusted machines: you don't know what is going on there at all.

II. Trusted and non-trusted objects

Every machine contains objects (in RAM, on disks, in registers, etc). All objects of every non-trusted machine are non-trusted by definition. A trusted machine contains both types of objects (trusted and non-trusted).

Sources of non-trusted objects on your trusted machine:

. non-trusted media;
. non-trusted network;
. social engineering;
. etc.

Sources of trusted objects on your trusted machine:

. trusted media;
. trusted network;
. process of verification of non-trusted objects (authentication).

Examples:

II.1 An email that you have received without any checks is a non-trusted object.
II.2 An email with a properly verified digital signature is a trusted object.
II.3 Your secret key, properly generated on your trusted machine or retrieved from trusted media, is a trusted object.
II.4 You wanted to look at user accounts on your trusted local machine. The string "/etc/passwd" that you have passed to open(2) is a trusted object.
II.5 Someone you don't know asked you to check if you have a file "/foo/bar" on your trusted local machine. The string "/foo/bar" that you have passed to readdir(2) is a non-trusted object.
II.6 Encrypted content of any regular file received from the non-trusted GlusterFS servers, before processing by the crypt translator, is a non-trusted object.
II.7 Decrypted content of your regular file, after successful processing by the crypt translator, is a trusted object.
II.8 The list of file names provided by ls(1) for your mounted encrypted GlusterFS volume is a non-trusted object (see item 1 of the Restrictions above).

The status of some objects on your trusted machine can be changed from "non-trusted" to "trusted" by a special process, which is called authentication. Authentication includes creating/checking a special MAC (Message Authentication Code) for every object that you will want to verify after it has been stored on non-trusted machines.

III. Encryption and authentication in GlusterFS

In GlusterFS a special translator (encryption/crypt) is responsible for both encryption and authentication. The crypt translator works only on trusted client machines.

Data Encryption

We encrypt file content with the AES cipher algorithm in XTS cipher mode. This mode provides "weak data authentication": tampering with ciphertext created in this mode leads to unpredictable changes in the plaintext, i.e. to data corruption, which is easy to detect. Data encryption is performed with a unique per-file cipher key generated from the master volume key and "salted" with the unique trusted object id (GFID).

Metadata Authentication

In the feature of transparent encryption the unique object id (GFID) is an important encryption-specific attribute, which needs protection, since it is stored on the non-trusted servers. We protect the GFID by creating/checking MACs. Every such MAC is "salted" with the trusted absolute file name. This "salt" is needed to prevent a special kind of tampering, which extends the scope of a per-file data cipher key (there are known attacks based on such an extension). Every hardlink adds a respective MAC to the file. When the file is renamed, the respective MAC gets updated.
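The bookkeeping of per-name MACs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the crypt translator's code: HMAC-SHA256 truncated to 8 bytes stands in for whatever MAC the translator really computes, the key and the 16-byte GFID are placeholders, and the helper names (name_mac, link, rename) are hypothetical.

```python
import hashlib
import hmac

MAC_LEN = 8  # tag length assumed for this sketch


def name_mac(key: bytes, gfid: bytes, abs_name: str) -> bytes:
    """MAC over the GFID, "salted" with the absolute file name.

    Assumption: HMAC-SHA256 is a stand-in, not the translator's real MAC.
    """
    msg = abs_name.encode("utf-8") + b"\x00" + gfid
    return hmac.new(key, msg, hashlib.sha256).digest()[:MAC_LEN]


def link(macs, key, gfid, new_name):
    """A new hard link appends one MAC for the new name."""
    return macs + [name_mac(key, gfid, new_name)]


def rename(macs, key, gfid, old_name, new_name):
    """A rename replaces the MAC of the old name with the MAC of the new one."""
    old = name_mac(key, gfid, old_name)
    new = name_mac(key, gfid, new_name)
    return [new if hmac.compare_digest(m, old) else m for m in macs]


key = b"\x01" * 32        # placeholder MAC key
gfid = bytes(range(16))   # placeholder object id

macs = link([], key, gfid, "/file")                       # touch file
macs = link(macs, key, gfid, "/file-link")                # ln file file-link
macs = rename(macs, key, gfid, "/file", "/file-renamed")  # mv file file-renamed
```

Because the file name is part of the MAC input, a GFID copied by the server onto a file with a different name would fail the check for that name.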
Whenever the file is opened, we check all the per-name MACs up to the first match. No match means failed verification: GlusterFS will refuse to open such a file. Otherwise, the status of the file's GFID is changed to "trusted".

Encryption-specific file attributes, including the array of per-name MACs, are stored on the untrusted server as an xattr of the file with the special key "trusted.glusterfs.crypt.att.cfmt".

Let's create a file in our encrypted volume mounted at /mnt/glusterfs:

# pwd
/mnt/glusterfs
# touch file
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67faae50d243d84d641

The first byte in the string contains the version (format) of the string (0 in this example). In this format a per-name MAC is 8 bytes long, and the array of all MACs is located at the end of the string. In this example the file has only one name, and the string accordingly contains only one MAC (aae50d243d84d641).

Let's now create a hardlink:

# ln file file-link
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67faae50d243d84d641c66d01a59a6a2e8c

The file has acquired a second name ("file-link"), and the string has accordingly been supplemented with a second MAC (c66d01a59a6a2e8c).

# mv file file-renamed
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file-renamed
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67fb46f237c94cb87dec66d01a59a6a2e8c

We changed the file name, and the respective MAC was updated (the new value is b46f237c94cb87de).

# rm -f file-renamed
# getfattr -n trusted.glusterfs.crypt.att.cfmt -d -e hex file-link
trusted.glusterfs.crypt.att.cfmt=0x00fe824979347d90af1c8d87d67fc66d01a59a6a2e8c

We removed the name "file-renamed", and the respective MAC (b46f237c94cb87de) was removed.

The 13 bytes right after the format id (fe824979347d90af1c8d87d67f in our example) contain the following attributes, which are encrypted in a special AEAD mode:

. data cipher algorithm id;
. data cipher mode id;
. encoded block size;
. encryption translator id;
. encoded size of the data cipher key.

The GFID is verified only for file operations which invoke crypto transforms (cipher, authentication, etc). In particular, we need to verify the GFID during the ->open(), ->read(), ->write(), ->truncate(), ->link(), etc. file operations. An example of a file operation which doesn't require GFID verification is ->readdir() (see item 1 of the Restrictions above).

Currently the GFID verification procedure is encapsulated in FOP->open() of the crypt translator. So the crypt translator always calls FOP->open() whenever a trusted GFID is required and is not in the cache (e.g. during FOP->truncate()).

IV. Why we don't support the feature of transparent encryption in NFS mounts of GlusterFS volumes

In NFS mounts of GlusterFS volumes, file operations usually don't have file names. They operate on file handles instead (which are actually GFIDs). Accordingly, we have to be sure that every file handle in the cache of the client machine is trusted. This is not simple to implement with a guarantee that future changes in GlusterFS code won't add a security hole that lets non-verified file handles appear in the cache of the client machine.
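As a small aid for reading the getfattr output shown in section III, the value of trusted.glusterfs.crypt.att.cfmt can be split mechanically. The following is a minimal Python sketch that assumes only what the examples above show for format 0: one format byte, 13 bytes of encrypted attributes, then an array of 8-byte per-name MACs. The function name parse_cfmt is hypothetical, and the sketch does not decrypt or verify anything.

```python
FORMAT_LEN = 1   # leading format (version) byte
ATTRS_LEN = 13   # encrypted attributes, as in the examples above
MAC_LEN = 8      # per-name MAC length in format 0


def parse_cfmt(hex_value: str):
    """Split a hex-encoded cfmt xattr value into (version, attrs, macs)."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)

    header_len = FORMAT_LEN + ATTRS_LEN
    if len(raw) < header_len:
        raise ValueError("value is shorter than the fixed-size header")

    version = raw[0]
    if version != 0:
        raise ValueError("only format 0 is handled in this sketch")

    attrs = raw[FORMAT_LEN:header_len]
    tail = raw[header_len:]
    if len(tail) % MAC_LEN != 0:
        raise ValueError("MAC array length is not a multiple of 8 bytes")

    macs = [tail[i:i + MAC_LEN] for i in range(0, len(tail), MAC_LEN)]
    return version, attrs, macs


# Value taken from the hardlink example (two names, so two MACs).
version, attrs, macs = parse_cfmt(
    "0x00fe824979347d90af1c8d87d67faae50d243d84d641c66d01a59a6a2e8c"
)
print(version, attrs.hex(), [m.hex() for m in macs])
# prints: 0 fe824979347d90af1c8d87d67f ['aae50d243d84d641', 'c66d01a59a6a2e8c']
```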