Edward Ned Harvey
2010-Apr-30 02:24 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
I finally got it, I think. Somebody (with deep and intimate knowledge of ZFS development) please tell me if I've been hitting the crack pipe too hard. But...

Part 1 of this email: Netapp snapshot security flaw. Inherent in their implementation of .snapshot directories.
Part 2 of this email: How ZFS could do this, much better.

(#1) Netapp snapshot security flaw. Inherent in their implementation of .snapshot directories.

(as root)
# mkdir -p a/b/c
# echo "secret info" > a/b/c/info.txt
# chmod 777 a
# chmod 700 a/b
# chmod 777 a/b/c
# chmod 666 a/b/c/info.txt
# rsh netappfiler snap create vol0 test
creating snapshot...
# echo "public info" > a/b/c/info.txt
# mv a/b/c a/c

(as a normal user)
$ cat a/c/info.txt
public info
$ cat a/c/.snapshot/test/info.txt
secret info

D'oh!!! By changing permissions in the present filesystem (here, by moving c out from under the restricted directory b), the normal user has been granted access to restricted information in the past.

(#2) How ZFS could do this, much better.

First let it be said, ZFS doesn't have this security flaw. (Kudos.) But also let it be said, the user experience of having the .snapshot always conveniently locally available is a very positive thing. Even if you rename and move some directory all over the place like crazy, with zillions of snapshots being taken in all those locations, when you look in that directory's .snapshot, you still have access to *all* the previous snapshots of that directory, regardless of what that directory was formerly named, or where in the directory tree it was linked.

In short, the user experience of .snapshot is more user friendly. But the .zfs style snapshot requires less development complexity and is therefore immune to this sort of flaw.

So here's the idea, in which ZFS could provide the best of both worlds:

Each inode contains a link count. In most cases, each inode has a link count of 1, but of course that can't be assumed.
It seems trivially simple to me that, along with the link count in each inode, the filesystem could also store a list of which inodes link to it. If the link count is 2, then there's a list of 2 inodes, which are the "parents" of this inode. In which case, it would be trivially easy to walk back up the whole tree, almost instantly identifying every combination of paths that could possibly lead to this inode, while simultaneously correctly handling security concerns about bypassing security of parent directories and everything. Once the absolute path is generated, if the user doesn't have access to that path, then the user simply doesn't get that particular result returned to them.

It seems too perfect and too simple. Instead of a one-directional directed graph, simply make it bidirectional. There's no significant additional overhead as far as I can tell. It seems like it would even be easy.

By doing this, it will be very easy for zhist (or anything else) to instantly produce all the names of all the snapshot versions of any file or directory, even if that filename has been changing over time... even if that file is hardlinked in more than one directory path...

Then ZFS has a technique, different from ".snapshot" directories, which performs more simply, more reliably, and more securely than the netapp implementation. This technique works equally well for files or directories (unlike the netapp method). And there is no danger of legal infringement upon any netapp "invention."
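To make the "bidirectional graph" idea concrete, here is a minimal in-memory sketch in Python. It is not ZFS code: the Inode class, the single readable_by_user flag, and the helper names are all hypothetical, and real permission checks are far richer than one boolean per directory. It only illustrates walking parent lists upward and withholding any path that passes through a directory the user could not traverse.

```python
# Toy in-memory model (hypothetical; not ZFS code) of inodes that remember
# their parent directories, so every path to an inode can be rebuilt by
# walking upward instead of scanning the whole tree.


class Inode:
    def __init__(self, number, readable_by_user=True):
        self.number = number
        # Whether the (single, hypothetical) unprivileged user may traverse
        # this directory. A stand-in for real permission checks.
        self.readable_by_user = readable_by_user
        # Back-pointers: list of (parent_inode, name_in_parent) pairs.
        self.parents = []


def link(parent, name, child):
    """Record that `parent` contains an entry `name` pointing at `child`."""
    child.parents.append((parent, name))


def all_paths(inode):
    """Yield (path, visible) for every path from the root to `inode`.

    `visible` is False when any directory on the way is not traversable,
    in which case the path must not be shown to the user.
    """
    if not inode.parents:          # the root has no parents
        yield "", True
        return
    for parent, name in inode.parents:
        for prefix, visible in all_paths(parent):
            yield prefix + "/" + name, visible and parent.readable_by_user


def visible_paths(inode):
    """Return only the paths the unprivileged user could have walked."""
    return sorted(path for path, visible in all_paths(inode) if visible)


root = Inode(1)
a = Inode(2)
b = Inode(3, readable_by_user=False)   # like "chmod 700 a/b"
pub = Inode(4)
info = Inode(5)

link(root, "a", a)
link(a, "b", b)
link(a, "pub", pub)
link(b, "info.txt", info)      # reachable only through the restricted b
link(pub, "info.txt", info)    # hard link in a public directory

print(visible_paths(info))
```

With this layout the file has two paths, /a/b/info.txt and /a/pub/info.txt, and only the second is returned to the user, because the first passes through the restricted directory b.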
Edward Ned Harvey
2010-Apr-30 03:30 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> Each inode contain a link count. ... It seems trivially
> simple to me, that along with the link count in each inode, the
> filesystem could also store a list of which inodes link to it.

Others may have better ideas for implementation. But at least for a starting point, here's how I imagine this:

The goal is to be always able to instantly locate all the previous snapshot versions of any file or directory, regardless of whether or not that filename, directory name, or any path leading up to that file or directory may have ever changed. An additional goal is to obey security. Don't give the user any information that they couldn't have found by other (slower) means. In the scenario described here, both goals are achieved.

Currently, there's a .zfs directory, which is not a "real" directory. By default, it's hidden until you explicitly try to access it by name. Inside the .zfs directory, there's presently a "snapshot" directory, and nothing else. Let's suppose my system has several snapshots: "snap1", "snap2", "snap3", ... Then these appear as

/tank/.zfs/snapshot/{snap1,snap2,snap3,...}

and inside there are all the subdirectories which lead to all the files.

Let there also be an "inodes" directory next to the snapshot directory:

/tank/.zfs/snapshot
/tank/.zfs/inodes

Whenever a snap is created, let it be listed under both "snapshot" and "inodes":

/tank/.zfs/snapshot/{snap1,snap2,snap3,...}
/tank/.zfs/inodes/{snap1,snap2,snap3,...}

If you simply "ls /tank/.zfs/inodes/snap1" then you see nothing. The system will not generate a list of every single inode in the whole filesystem; that would be crazy. But, just as the ".zfs" directory is hidden and appears upon attempted access, let there be text files, whose names are inode numbers, and these text files only appear upon attempted access.
ls /tank/.zfs/inodes/snap1
(no result)
cat /tank/.zfs/inodes/snap1/12345
(gives the following results)
/tank/.zfs/snapshot/snap1/foo/bar/baz
(which is the abs path to the file having inode 12345)

And so, a mechanism has been created, so a user can do this:

ls -i /tank/exports/home/jbond/somefile.txt
12345
cat /tank/.zfs/inodes/snap1/12345
(result is: exports/home/jbond/Some-File.TXT)

Thus, we have identified the former name of somefile.txt and ...

cat /tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

Note: the above "ls -i ; cat" process is slightly tedious. I don't expect many users to do this directly. But I would happily automate and simplify this process by coding zhist to utilize this technique automatically. A user could run:

zhist ls somefile.txt

The result would be:

/tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

And of course, once the command-line version of zhist can do that, there's no obstacle preventing the GUI frontend.

One important note: Since you're doing a reverse mapping, from inode number to path name, it's important to obey filesystem security. Fortunately, the process of generating absolute path names from an inode number is handled by the kernel, and only after the complete absolute pathname has been generated is anything returned to the user. Which means the kernel has the opportunity to test whether or not the user would have access to "ls" the specified inode by pathname, before returning that pathname to the user. In other words, if the user couldn't get that pathname via "find /tank/.zfs/snapshot/snap1 -inum 12345" then the user could not get that pathname via .zfs/inodes either. The only difference is that the "find" command could run for a very long time, yet the ".zfs/inodes" directory returns that same result nearly instantly.
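For comparison, the slow path that the proposed ".zfs/inodes" directory is meant to replace can be sketched in a few lines of Python. This is only a user-space stand-in for "find ROOT -inum N"; the function name find_by_inode and the throwaway demo directory are made up for illustration, and nothing here touches a real .zfs directory.

```python
import os
import tempfile


def find_by_inode(root, inode_number):
    """Return every path under `root` whose inode number matches.

    User-space equivalent of `find ROOT -inum N`: it visits every entry.
    os.walk skips directories the caller cannot list, and entries that
    cannot be examined are ignored, so the result never contains a path
    the caller could not have found by hand.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: examine the entry itself, do not follow symlinks
                if os.lstat(path).st_ino == inode_number:
                    matches.append(path)
            except OSError:
                continue
    return sorted(matches)


# Small demonstration in a throwaway directory (stands in for a snapshot).
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "exports", "home", "jbond"))
target = os.path.join(demo, "exports", "home", "jbond", "Some-File.TXT")
with open(target, "w") as handle:
    handle.write("old contents\n")

wanted = os.stat(target).st_ino
print(find_by_inode(demo, wanted))
```

The cost is proportional to the number of entries under the root, which is exactly why an indexed lookup by inode number would be attractive on a large snapshot.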
Peter Jeremy
2010-Apr-30 05:06 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
On 2010-Apr-30 10:24:14 +0800, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>Each inode contain a link count. In most cases, each inode has a
>link count of 1, but of course that can't be assumed. It seems
>trivially simple to me, that along with the link count in each inode,
>the filesystem could also store a list of which inodes link to it.
>If link count is 2, then there's a list of 2 inodes, which are the
>"parents" of this inode.

I'm not sure exactly what you are trying to say here but I don't think it will work.

In a Unix FS (UFS or ZFS), a directory entry contains a filename and a pointer to an inode. The inode itself contains a count of the number of directory entries that point to it and pointers to the actual data. There is currently no provision for a reverse link back to the directory.

I gather you are suggesting that the inode be extended to contain a list of the inode numbers of all directories that contain a filename referring to that inode. Whilst I agree that this would simplify inode to filename mapping and provide an alternate mechanism for checking file permissions, I think you are glossing over the issue of how/where to store these links.

Whilst files can have a link count of 1 (I'm not sure if this is true in "most" cases), they can have up to 32767 links. Where is this list of (up to) 32767 "parent" inodes going to be stored?

>In which case, it would be trivially easy to walk back up the whole
>tree, almost instantly identifying every combination of paths that
>could possibly lead to this inode, while simultaneously correctly
>handling security concerns about bypassing security of parent
>directories and everything.

Whilst it's trivially easy to get from the file to the list of directories containing that file, actually getting from one directory to its parent is less so: A directory containing N sub-directories has N+2 links.
Whilst the '.' link is easy to identify (it points to its own inode), distinguishing between the name of this directory in its parent and the '..' entries in its subdirectories is rather messy (requiring directory scans) unless you mandate that the reference to the parent directory is in a fixed location (i.e. 1st or 2nd entry in the parent inode list).

>It seems too perfect and too simple. Instead of a one-directional
>directed graph, simply make a bidirectional. There's no significant
>additional overhead as far as I can tell. It seems like it would
>even be easy.

Well, you need to find somewhere to store up to 32K inode numbers, whilst having minimal space overhead for small numbers of links. Then you will need to patch the vnode operations underlying creat(), link(), unlink(), rename(), mkdir() and rmdir() to manage the backlinks (taking into account transactional consistency).

-- 
Peter Jeremy
Edward Ned Harvey
2010-Apr-30 13:56 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
> From: Peter Jeremy [mailto:peter.jeremy at alcatel-lucent.com]
>
> I gather you are suggesting that the inode be extended to contain a
> list of the inode numbers of all directories that contain a filename
> referring to that inode.

Correct.

> [inodes] can have up to 32767 links [to them]. Where is
> this list of (up to) 32767 "parent" inodes going to be stored?

Naively, I suggest storing the list of "parents" in the inode itself. Let's see if that's unreasonable.

How many bytes long is an inode number? I couldn't find that easily by googling, so for the moment, I'll guess it's a fixed size, and I'll guess 64 bits (8 bytes). Which means all inodes of Link Count 1 would be extended by 8 bytes, and an inode could require at most 32767*8 bytes, roughly 256 kbytes, to store all the parent inode backpointers.

> Well, you need to find somewhere to store up to 32K inode numbers,
> whilst having minimal space overhead for small numbers of links.

I think you're saying: The number of bytes in an inode is fixed. Not variable. How many bytes is that? Would it be exceptionally difficult to extend and/or make variable?

Perhaps all inodes (including files) could have a property similar to directories, where they reference a variable number of bytes written somewhere on disk (kind of like how directories reference variable sized files) and that allows the list of parent inodes to be stored in a block separate from the usual inode information. One important consideration in that hypothetical scenario would be fragmentation. If every inode were fragmented in two, that would be a real drag for performance.
Perhaps every inode could be extended by (for example) 32 bytes to accommodate a list of up to 4 parent inodes, but whenever the number of parents exceeds 4, the inode itself gets fragmented to store a variable list of parents.

> >In which case, it would be trivially easy to walk back up the whole
> >tree, almost instantly identifying every combination of paths that
> >could possibly lead to this inode, while simultaneously correctly
> >handling security concerns about bypassing security of parent
> >directories and everything.
>
> Whilst it's trivially easy to get from the file to the list of
> directories containing that file, actually getting from one directory
> to its parent is less so: A directory containing N sub-directories has
> N+2 links. Whilst the '.' link is easy to identify (it points to its
> own inode), distinguishing between the name of this directory in its
> parent and the '..' entries in its subdirectories is rather messy
> (requiring directory scans) unless you mandate that the reference to
> the parent directory is in a fixed location (ie 1st or 2nd entry in
> the parent inode list).

Interesting. In other words, because of the ".." entry in every subdirectory, every parent directory is linked to, not just by its parents, but also by its children. If extending inodes to include the list of "inodes that link to this inode" as I suggested, there would need to be a simple way of distinguishing which inodes in the "inodes that link to this inode" list are actually parents, and which ones are backpointers of children.

I would suggest something simple, like this: The only reason to create a list of "parent inodes" is for the sake of quickly identifying the absolute path of any arbitrary inode number, so you can quickly locate all the past snaps of any arbitrary file or directory, even if that file or directory has been renamed, moved, or relocated in the directory tree.
Instead of creating a list of all "inodes that link to this inode", just make it a "parent inodes" list. That is: when you create a subdirectory, even though the subdir does link back to its parent, the inode of the subdir is not stored in the parent's "parent inodes" list. Thus, the Link Count of a directory is allowed to differ from the number of inodes listed in the "parent inodes" field. All inodes listed in the "parent inodes" field would, I think, then be links to a more shallow location in the tree hierarchy.
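The storage numbers and the "4 inline parents, then spill over" idea from this message can be worked through in a short Python sketch. The 8-byte inode number is the guess made above, 32767 is the link limit quoted earlier in the thread, and the ParentList class is purely hypothetical: it models the proposal, not any existing ZFS structure.

```python
INODE_NUMBER_BYTES = 8      # guess from the thread: 64-bit inode numbers
MAX_LINKS = 32767           # link limit quoted earlier in the thread
INLINE_SLOTS = 4            # proposed number of parents kept in the inode

worst_case_bytes = MAX_LINKS * INODE_NUMBER_BYTES
inline_bytes = INLINE_SLOTS * INODE_NUMBER_BYTES
print(worst_case_bytes, worst_case_bytes / 1024, inline_bytes)


class ParentList:
    """Hypothetical parent list: a few inline slots plus an overflow area."""

    def __init__(self):
        self.inline = []      # would live inside the fixed-size inode
        self.overflow = []    # stands in for a separately allocated block

    def add(self, parent_inode_number):
        if len(self.inline) < INLINE_SLOTS:
            self.inline.append(parent_inode_number)
        else:
            self.overflow.append(parent_inode_number)

    def remove(self, parent_inode_number):
        if parent_inode_number in self.inline:
            self.inline.remove(parent_inode_number)
            # Refill the freed inline slot so small lists stay in the inode.
            if self.overflow:
                self.inline.append(self.overflow.pop(0))
        else:
            self.overflow.remove(parent_inode_number)

    def needs_extra_read(self):
        """True when listing all parents must touch the overflow block."""
        return bool(self.overflow)
```

The worst case is 32767 * 8 = 262136 bytes, just under 256 KiB, while the inline area costs 32 bytes per inode. Under this scheme only files with more than 4 hard links would ever pay for a second block.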
Edward Ned Harvey
2010-Apr-30 15:08 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
> From: Peter Jeremy [mailto:peter.jeremy at alcatel-lucent.com]
>
> Whilst it's trivially easy to get from the file to the list of
> directories containing that file, actually getting from one directory
> to its parent is less so: A directory containing N sub-directories has
> N+2 links. Whilst the '.' link is easy to identify (it points to its
> own inode), distinguishing between the name of this directory in its
> parent and the '..' entries in its subdirectories is rather messy

Oh. Duh. This should have been obvious from the moment you said '..'

Given: There is exactly one absolute path to every directory. You cannot hardlink subdirectories into multiple parent locations. You can only hardlink files. Every directory has exactly one parent, and the parent inode number is already stored in every directory inode.

Given: There is already the '..' entry in every directory. Which means it is already trivially easy to identify the absolute path of any directory, given that you know its inode number, and you have some method to open an arbitrary inode by number. Which implies it can only be implemented in the kernel, or perhaps by root. (A regular user cannot open an inode by number for security reasons: the parent directories may block permission for a regular user to open that inode.)

But the fact remains: No change to filesystem or inode structure is necessary in order to quickly identify the absolute path of an arbitrary directory, when your initial knowledge is only the inode number of the directory.

Therefore, it should be very easy to implement a proof of concept, by writing a setuid root C program, similar to "sudo", which could then become root, identify the absolute path of a directory by its inode number, and then print that absolute path, only if the real UID has permission to "ls" that path.

Fundamentally, the only difficulty is to extend inodes of files, to include a list of parent inode directories.
And, how to make all this information available over NFS and CIFS.

While not trivial, it's certainly possible to extend inodes of files, to include parent pointers. Also not trivial, it's certainly possible to make all this information available under proposed directories, ".zfs/inodes" or something similar.

(Again, considering that the ".zfs/inodes" directory would be sufficient for NFS, but some more information would be necessary to support CIFS, because CIFS, as far as I know, has no knowledge of inode numbers, and therefore cannot even begin to look for an inode under the .zfs/inodes directory.)
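The '..' walk itself can be sketched in user space with Python, with one important caveat: the sketch starts from a directory that can already be named and opened, not from a bare inode number, so it does not solve the "open an inode by number" problem discussed above. The function names and the temporary directory layout are made up for illustration.

```python
import os
import tempfile


def identity(path):
    """Return the (device, inode) pair that identifies a directory."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def name_in_parent(parent, child_id):
    """Scan `parent` for the subdirectory whose identity is `child_id`."""
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                info = entry.stat(follow_symlinks=False)
                if (info.st_dev, info.st_ino) == child_id:
                    return entry.name
    return None


def relative_path_via_dotdot(start, top):
    """Return the path of directory `start` relative to its ancestor `top`.

    Climbs '..' one level at a time; at each level the parent is scanned
    for the entry that matches the directory we just came from.
    """
    top_id = identity(top)
    parts = []
    current, here = start, identity(start)
    while here != top_id:
        parent = os.path.join(current, "..")
        up = identity(parent)
        if up == here:
            raise ValueError("reached the filesystem root without meeting top")
        name = name_in_parent(parent, here)
        if name is None:
            raise FileNotFoundError("directory is not listed in its parent")
        parts.append(name)
        current, here = parent, up
    return "/".join(reversed(parts))


demo = tempfile.mkdtemp()
deep = os.path.join(demo, "foo", "bar", "baz")
os.makedirs(deep)
print(relative_path_via_dotdot(deep, demo))
```

Each step needs a scan of one parent directory, so the cost grows with depth and directory size rather than with the size of the whole filesystem. The same walk is not available for plain files, which have no '..' entry to follow.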
Peter Jeremy
2010-May-04 00:25 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
On 2010-Apr-30 21:56:46 +0800, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>How many bytes long is an inode number? I couldn't find that easily by
>googling, so for the moment, I'll guess it's a fixed size, and I'll guess
>64bits (8 bytes).

Based on a rummage in some header files, it looks like it's 8 bytes.

>How many bytes is that? Would it be exceptionally difficult to extend
>and/or make variable?

Extending inodes increases the amount of metadata associated with a file, which increases overheads for small files. It looks like a ZFS inode is currently 264 bytes, but is always stored with a dnode and currently has some free space. ZFS code assumes that the physical dnode (dnode+znode+some free space) is a fixed size and making it variable is likely to be quite difficult.

>One important consideration in that hypothetical scenario would be
>fragmentation. If every inode were fragmented in two, that would be a real
>drag for performance. Perhaps every inode could be extended (for example)
>32 bytes to accommodate a list of up to 4 parent inodes, but whenever the
>number of parents exceeds 4, the inode itself gets fragmented to store a
>variable list of parents.

ACLs already do something like this. And having parent information stored away from the rest of the inode would not impact the normal inode access time since the parent information is not normally needed.

On 2010-Apr-30 23:08:58 +0800, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>Therefore, it should be very easy to implement proof of concept, by writing
>a setuid root C program, similar to "sudo" which could then become root,
>identify the absolute path of a directory by its inode number, and then
>print that absolute path, only if the real UID has permission to "ls" that
>path.

It doesn't need to be setuid.
Check out
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/s2/pwd.c
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/pwd.c
(The latter is somewhat more readable)

>While not trivial, it's certainly possible to extend inodes of files, to
>include parent pointers.

This is a far more significant change and the utility is not clear.

>Also not trivial, it's certainly possible to make all this information
>available under proposed directories, ".zfs/inodes" or something similar.

HP Tru64 already does something like this.

-- 
Peter Jeremy
Jason King
2010-May-04 02:03 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
If you're just wanting to do something like the netapp .snapshot (where it's in every directory), I'd be curious if the CIFS shadow copy support might already have done a lot of the heavy lifting for this. That might be a good place to look.
Edward Ned Harvey
2010-May-04 03:13 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
> From: Peter Jeremy [mailto:peter.jeremy at alcatel-lucent.com]
>
> >Therefore, it should be very easy to implement proof of concept, by writing
> >a setuid root C program, similar to "sudo" which could then become root,
> >identify the absolute path of a directory by its inode number, and then
> >print that absolute path, only if the real UID has permission to "ls" that
> >path.
>
> It doesn't need to be setuid. Check out
> http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/s2/pwd.c
> http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/pwd.c
> (The latter is somewhat more readable)

The difference here is ... In pwd, you've already got a present working directory, '.', and therefore you never need to find any directory based on inode number.

Suppose your pwd is /tank/foo/bar/baz. Suppose you want to locate all the snapshot versions of this directory. If you can safely assume /tank/.zfs/snapshot/*/foo/bar/baz then great, no problem. But if "foo" was formerly called "doo" or if baz was formerly a child of some other directory ... then '.' isn't going to help you find the former snapshot version of that directory.

> >While not trivial, it's certainly possible to extend inodes of files, to
> >include parent pointers.
>
> This is a far more significant change and the utility is not clear.

As you said, something like this is already done for ACLs. I won't say it's trivial, because I simply don't know how difficult it would be, but it's certainly possible.

The utility is clear: At present, every directory has a reference to its parent. At present, files do not have any reference to their parent(s). Therefore, even if you know the inode number of some file, there's no clear or reliable way to find its parent(s) quickly. But if you are the kernel, and you want to find the path to some inode number of a directory on some device, all you need to do is follow the '..' entries to discover the path of that directory.
The utility of parent reference(s) inside file inodes is the ability to quickly identify the path(s) of any inode (file or directory) based on inode number. Without this parent reference in file inodes, you can only perform this reverse lookup on directory inode numbers.
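What keeping such parent references up to date would involve can be sketched with a toy model in Python. Everything here is hypothetical (the ToyFS class, its method names, the inode numbering); it ignores error handling, overwriting renames, and transactional consistency, and only shows that creat, link, unlink and rename each have to touch the back-pointer list so that a reverse lookup always reports the current name.

```python
class ToyFS:
    """Hypothetical in-memory model; directories map names to inode numbers."""

    def __init__(self):
        self.dirs = {1: {}}        # directory inode -> {name: child inode}
        self.backlinks = {}        # file inode -> list of (dir inode, name)
        self.next_inode = 2

    def _allocate(self):
        inode = self.next_inode
        self.next_inode += 1
        return inode

    def mkdir(self, parent, name):
        inode = self._allocate()
        self.dirs[parent][name] = inode
        self.dirs[inode] = {}      # directories rely on '..', no backlinks
        return inode

    def creat(self, directory, name):
        inode = self._allocate()
        self.dirs[directory][name] = inode
        self.backlinks[inode] = [(directory, name)]
        return inode

    def link(self, inode, directory, name):
        self.dirs[directory][name] = inode
        self.backlinks[inode].append((directory, name))

    def unlink(self, directory, name):
        inode = self.dirs[directory].pop(name)
        self.backlinks[inode].remove((directory, name))
        if not self.backlinks[inode]:
            del self.backlinks[inode]     # last link gone

    def rename(self, old_dir, old_name, new_dir, new_name):
        # Assumes a file is being renamed and the new name is unused.
        inode = self.dirs[old_dir].pop(old_name)
        self.dirs[new_dir][new_name] = inode
        links = self.backlinks[inode]
        links[links.index((old_dir, old_name))] = (new_dir, new_name)

    def check(self):
        """Rebuild the back-pointers by a full scan and compare."""
        expected = {}
        for directory, entries in self.dirs.items():
            for name, inode in entries.items():
                if inode not in self.dirs:            # files only
                    expected.setdefault(inode, []).append((directory, name))
        normalise = lambda table: {k: sorted(v) for k, v in table.items()}
        return normalise(expected) == normalise(self.backlinks)


fs = ToyFS()
home = fs.mkdir(1, "home")
f = fs.creat(home, "somefile.txt")
fs.link(f, 1, "alias")
fs.rename(home, "somefile.txt", home, "Some-File.TXT")
fs.unlink(1, "alias")
print(fs.backlinks[f], fs.check())
```

The check method is the expensive full scan that the back-pointers are meant to avoid; it is there only to confirm that the incremental updates and the scan agree.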
Edward Ned Harvey
2010-May-04 03:16 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
> From: jason.brian.king at gmail.com [mailto:jason.brian.king at gmail.com] On
> Behalf Of Jason King
>
> If you're just wanting to do something like the netapp .snapshot
> (where it's in every directory), I'd be curious if the CIFS shadow
> copy support might already have done a lot of the heavy lifting for
> this. That might be a good place to look

This is a wonderful suggestion. Although I'm not happy with the GUI implementation of CIFS shadow copy, it certainly does seem that they would have to tackle a lot of the same issues.

Heheheh. Not that I have any clue how to start answering that question. ;-)
Jason King
2010-May-04 03:24 UTC
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
Well, I think the GUI is just Windows; it's all just APIs that are presented to Windows.