Hi all,

after the Chris (Ball) email, I thought about a possible btrfs file-system layout which would make it possible to snapshot the root and, if required, mount an old snapshot of the root.

A btrfs file-system can be partitioned into subvolumes. Every subvolume has a name. The root of the btrfs file-system is itself a subvolume, called "." (dot). At mount time, the option subvol= selects a specific subvolume to mount. If nothing is specified, the subvolume "." (dot) is mounted.

My idea is that the root of the file-system should be a subvolume of a btrfs file-system. The subvolume "." is then mounted under a subdirectory and is used only for managing the snapshots of the root file-system.

To be clearer, I will use the following nomenclature:
- root of the file-system (or fs root): the root of the system. This directory contains /bin, /sbin, /usr...
- root of a btrfs file-system (or btrfs root): the root of a btrfs file-system, which contains the subvolumes. This is itself a subvolume, called "." (dot).

Under the btrfs root a subvolume named "rootfs" is created; it will contain the fs root. The fs root snapshots are also created under the btrfs root. In order to access the btrfs root and handle the fs root subvolume and its snapshots, the btrfs root is mounted under /var/run/btrfs.

To mount the fs root, the option "subvol=rootfs" has to be used (for example in the initramfs):

# mount -t btrfs -o subvol=rootfs /dev/sdxx /

To mount the btrfs root, the option "subvol=." has to be used:

# mount -t btrfs -o subvol=. /dev/sdxx /var/run/btrfs

Note 1) the "rootfs" volume is a portion of the real btrfs file-system.
Note 2) if only the "rootfs" volume is mounted, it is impossible to access all the data contained in the btrfs file-system.
In order to access the btrfs root, it has to be mounted under a sub-directory of the "rootfs" volume.
Note 3) the files contained in the "rootfs" volume appear twice:
- under / (where the "rootfs" volume is mounted)
- under /var/run/btrfs/rootfs (if the "." volume is mounted under /var/run/btrfs)
Note 4) snapshotting a volume doesn't affect the other volumes, even if those volumes are mounted or located under the snapshotted volume.

Time for some ASCII art:

*Real* btrfs filesystem layout (or the volume called "."):

/ \                        <--- "." volume
   rootfs \                <--- "rootfs" volume
      !- bin
      !- sbin
      !- etc
      !- usr
      !- [...]
   snap1 \                 <--- 1st snapshot of the rootfs volume
      !- bin
      !- sbin
      !- etc
      !- usr
      !- [...]
   snap2 \                 <--- 2nd snapshot of the rootfs volume
      !- bin
      !- sbin
      !- etc
      !- usr
      !- [...]

*Effective* file-system layout:

/ \                              <--- "rootfs" volume
   !- bin
   !- sbin
   !- etc
   !- usr
   [...]
   !- var \
      !- run \
         !- btrfs \              <--- "." volume
            !- rootfs \          <--- "rootfs" volume (2nd time)
               !- bin
               !- sbin
               [...]
            !- snap1 \           <--- 1st snapshot
               [...]
            !- snap2 \           <--- 2nd snapshot
               [...]

Below I will show some use cases:
1) system install
2) fs root snapshotting
3) exchange the root with an old snapshot (a reboot is required)

** 1) System install

root@host:/> mkfs.btrfs /dev/sdxx
root@host:/> mount -t btrfs /dev/sdxx /media/btrfs-test
root@host:/> cd /media/btrfs-test
root@host:/media/btrfs-test> btrfsctl -S rootfs .
root@host:/media/btrfs-test> ls -l
drwx------ 1 root root 160 2009-11-20 17:22 rootfs
root@host:/media/btrfs-test> cd rootfs
root@host:/media/btrfs-test/rootfs> debootstrap sid .   # install packages
                                                         # under the rootfs volume
[... config the system ...]
root@host:/media/btrfs-test/rootfs> ls
bin   dev  home  lib64       media  opt   root  selinux  sys  usr
boot  etc  lib   lost+found  mnt    proc  sbin  srv      tmp  var
root@host:/media/btrfs-test/rootfs> mkdir var/run/btrfs
root@host:/media/btrfs-test/rootfs> cd /
root@host:/> umount /media/btrfs-test
root@host:/> mount -t btrfs -o subvol=rootfs /dev/sdxx /media/btrfs-test
root@host:/> chroot /media/btrfs-test
root@guest:/> ls /
bin   dev  home  lib64       media  opt   root  selinux  sys  usr
boot  etc  lib   lost+found  mnt    proc  sbin  srv      tmp  var
root@guest:/> mount -t btrfs -o subvol=. /dev/sdxx /var/run/btrfs
root@guest:/> ls /var/run/btrfs
rootfs
root@guest:/> # to mount automatically the btrfs root under /var/run/btrfs
root@guest:/> echo "/dev/sdxx /var/run/btrfs btrfs subvol=. 0 0" >>/etc/fstab

** 2) System snapshot

root@guest:/> cd /var/run/btrfs
root@guest:/var/run/btrfs> ls
rootfs
root@guest:/var/run/btrfs> btrfsctl -s snap-of-root /.
root@guest:/var/run/btrfs> ls
rootfs  snap-of-root
root@guest:/var/run/btrfs> ls snap-of-root
bin   dev  home  lib64       media  opt   root  selinux  sys  usr
boot  etc  lib   lost+found  mnt    proc  sbin  srv      tmp  var
root@guest:/var/run/btrfs> touch /root/old-root-witness
root@guest:/var/run/btrfs> ls rootfs/root
old-root-witness
root@guest:/var/run/btrfs> ls snap-of-root/root
root@guest:/var/run/btrfs>

** 3) Exchange the fs root with one of its (older) snapshots

root@guest:/var/run/btrfs> ls /root
old-root-witness
root@guest:/var/run/btrfs> ls
rootfs  snap-of-root
root@guest:/var/run/btrfs> mv rootfs rootfs-old
root@guest:/var/run/btrfs> mv snap-of-root rootfs
root@guest:/var/run/btrfs> ls
rootfs  rootfs-old
root@guest:/var/run/btrfs> reboot
[...]
root@guest:/> ls root
root@guest:/> ls /var/run/btrfs
rootfs  rootfs-old

Any comments ?
BR
G.Baroncelli

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
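Use case 3 from the mail above (exchanging the fs root with a snapshot) can be wrapped in a small helper. This is only a sketch: swap_root and its arguments are hypothetical names that do not appear in the mail, and it assumes, as the session in the mail shows, that entries under the btrfs root can be renamed with plain mv.

```shell
#!/bin/sh
# swap_root BTRFS_ROOT SNAPSHOT [BACKUP_NAME]
# Hypothetical helper: promote SNAPSHOT to "rootfs" and keep the current
# fs root under BACKUP_NAME (default: rootfs-old). BTRFS_ROOT is assumed
# to be the mount point of the "." volume, e.g. /var/run/btrfs.
swap_root() {
    btrfs_root=$1
    snapshot=$2
    backup=${3:-rootfs-old}

    # Refuse to run if the layout is not the one described in the mail.
    [ -d "$btrfs_root/rootfs" ] || { echo "swap_root: no rootfs under $btrfs_root" >&2; return 1; }
    [ -d "$btrfs_root/$snapshot" ] || { echo "swap_root: no snapshot named $snapshot" >&2; return 1; }
    [ ! -e "$btrfs_root/$backup" ] || { echo "swap_root: $backup already exists" >&2; return 1; }

    # Same two renames as in the mail; the change takes effect at the next
    # mount with subvol=rootfs, i.e. after a reboot.
    mv "$btrfs_root/rootfs" "$btrfs_root/$backup" &&
    mv "$btrfs_root/$snapshot" "$btrfs_root/rootfs"
}
```

Calling swap_root /var/run/btrfs snap-of-root and then rebooting would reproduce the steps of use case 3. The two renames are not atomic as a pair, so a failure between them leaves no "rootfs" entry until it is repaired by hand.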
On Fri, Nov 20, 2009 at 12:50 PM, Goffredo Baroncelli <kreijack@gmail.com> wrote:
> Any comments ?
> BR
> G.Baroncelli

Since COW semantics require touching directory entries all the way up to the root of the subvolume, for transaction-intensive applications it would make sense to provide something that works like "mkdir" but creates a subvolume that will be mounted at that point rather than a simple directory entry, to avoid contention from serializing updates to the root directory. Is something like that already in place?
On Fri, Nov 20, 2009 at 01:24:42PM -0600, David Nicol wrote:
> On Fri, Nov 20, 2009 at 12:50 PM, Goffredo Baroncelli
> <kreijack@gmail.com> wrote:
>
> > Any comments ?
> > BR
> > G.Baroncelli
>
> since COW semantics require touching directory entries all the way up
> to the root of the subvolume, for transaction-intensive applications
> it would make sense to provide something that works like "mkdir" but
> creates a subvolume that will be mounted at a point rather than a
> simple directory entry. To avoid contention serializing updates to the
> root directory. Is something like that already in place?

COW semantics require touching btree nodes all the way up to the root of the btree, but this is different from the directory. Directories are stored in the btree, but you won't have to touch more than 8 or so btree levels regardless of how deep your directory tree is.

-chris
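The point about levels can be illustrated with a little arithmetic: the height of a balanced tree grows with the logarithm of the number of items it holds, not with the depth of any path name. The sketch below is illustrative only; levels_needed is a hypothetical helper, and the fan-out of 100 pointers per node used in the example is an assumed round figure, not a btrfs constant.

```shell
#!/bin/sh
# levels_needed ITEMS FANOUT
# Print the smallest number of tree levels h such that FANOUT^h >= ITEMS.
# FANOUT is an assumption supplied by the caller, not a btrfs value.
levels_needed() {
    items=$1
    fanout=$2
    levels=1
    capacity=$fanout              # items reachable with one level
    # Each extra level multiplies the reachable items by the fan-out.
    while [ $((capacity < items)) -eq 1 ]; do
        capacity=$((capacity * fanout))
        levels=$((levels + 1))
    done
    echo "$levels"
}
```

For example, levels_needed 1000000 100 prints 3, because 100^3 = 1000000. With the same assumed fan-out, 8 levels would cover 100^8 = 10^16 items, however deeply the directories are nested.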
On Fri, Nov 20, 2009 at 07:50:06PM +0100, Goffredo Baroncelli wrote:
> Hi all,
>
> after the Chris (Ball) email, I thought about a possible btrfs file-system
> layout, which may permit to snapshot the root and mount (if required) an old
> snapshot of the root.

[ very clear description of a filesystem tree layout ]

>
> Any comments ?

This is definitely possible, but not strictly required. We'll be able to create an ioctl (or mount option) that replaces the default subvolume ('.' in your examples) pointer with a pointer to another subvolume, and an ioctl to delete the old root.

Basically it will end up hiding the extra layer of indirection your proposal adds. This doesn't mean your ideas were bad, my plan all along has been to leave this up to the distros to work out with the users, and give them enough tools that they have the flexibility to do what they need.

-chris
On Fri, Nov 20, 2009 at 1:50 PM, Chris Mason <chris.mason@oracle.com> wrote:
> COW semantics require touching btree nodes all the way up to the root of
> the btree, but this is different from the directory. Directories are
> stored in the btree, but you won't have to touch more than 8 or so btree
> levels regardless of how deep your directory tree is.
>
> -chris

Thanks for straightening me out on that point.

Still, 8 might be a lot.

Regardless of the decoupling of btrees and directories, am I right in thinking that mounted subvolumes instead of directories would (1) reduce contention (2) reduce the number of levels touched since number of levels is a function of the number of fs entities in the volume, therefore (3) defining a file system entity that transparently becomes a mounted subvolume (by transparently I mean without an additional mount command) and (4) crafting a utility to streamline creation-and-implied-mounting of the entity type from #3 would make sense?
Goffredo Baroncelli
2009-Nov-20 23:31 UTC
Re: [RFC] proposal for a btrfs filesystem layout
Hi Chris,

On Friday 20 November 2009, Chris Mason wrote:
> On Fri, Nov 20, 2009 at 07:50:06PM +0100, Goffredo Baroncelli wrote:
> > Hi all,
> >
> > after the Chris (Ball) email, I thought about a possible btrfs file-system
> > layout, which may permit to snapshot the root and mount (if required) an old
> > snapshot of the root.
>
> [ very clear description of a filesystem tree layout ]
>
> >
> > Any comments ?
>
> This is definitely possible, but not strictly required. We'll be able
> to create an ioctl (or mount option) that replaces the default subvolume
> ('.' in your examples) pointer with a pointer to another subvolume, and
> an ioctl to delete the old root.

That was my first thought... but then the risk is implementing, via ioctl(s), commands (rename, delete, list) that
a) already exist in the VFS abstraction;
b) refer to objects that are like "directories" (in fact the differences between a volume and a directory are very small).

> Basically it will end up hiding the extra layer of indirection your
> proposal adds.

Yes, my idea introduces an extra layer that
a) is different from all other file-systems
b) is not useful if you don't use snapshots at all
And neither a) nor b) is a good point :((

> This doesn't mean your ideas were bad, my plan all along
> has been to leave this up to the distros to work out with the users, and
> give them enough tools that they have the flexibility to do what they
> need.

My concern is about the btrfs user interface. The biggest difficulty I had in learning the btrfs capabilities was its "user-interface". I have to admit I am not the smartest person, but I spent a lot of time trying to understand the difference between a btrfs subvolume creation and a "mkfs + mount". Finally I concluded that there is no difference (except the COW behaviour and other implementation details). My impression was that in some areas the VFS and btrfs too often do the same things. [*]

The point is that if btrfs does the same things as the VFS, this may be called "flexibility". But history has shown that, in the long term, it is the orthogonality of the subsystems that leads to the flexibility of the system.

I definitely need to sleep. It is now deep night in Italy and it is too late to email about "orthogonality of the subsystems".

In any case, thank you Chris for your work on btrfs. And please read my comments only as suggestions to improve btrfs.

> -chris

Goffredo

[*] Maybe my confusion is due to the fact that when btrfs creates a subvolume, two different actions are performed:
- a new subvolume is created (and this is definitely a btrfs competence)
- the new subvolume is mounted in a new directory (and I think that this is a VFS competence).
That raises a question: what about separating subvolume creation from subvolume mounting? That would definitely put the two kinds of objects (directories and subvolumes) in two different name-spaces: btrfs would be responsible for creating subvolumes (and handling the COW/snapshot semantics), and the VFS would be responsible for mounting them.
On Fri, Nov 20, 2009 at 02:05:11PM -0600, David Nicol wrote:
> On Fri, Nov 20, 2009 at 1:50 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > COW semantics require touching btree nodes all the way up to the root of
> > the btree, but this is different from the directory. Directories are
> > stored in the btree, but you won't have to touch more than 8 or so btree
> > levels regardless of how deep your directory tree is.
> >
> > -chris
>
> Thanks for straightening me out on that point.
>
> Still, 8 might be a lot.
>
> Regardless of the decoupling of btrees and directories, am I right in
> thinking that mounted subvolumes instead of directories would (1)
> reduce contention

It might, it depends on the workload. But yes, one point of big contention is the root node of the btree and each subvolume has its own root.

> (2) reduce the number of levels touched since number
> of levels is a function of the number of fs entities in the volume,
> therefore

It depends on the overall btree size. Probably.

> (3) defining a file system entity that transparently becomes
> a mounted subvolume (by transparently I mean without an additional
> mount command) and (4) crafting a utility to streamline
> creation-and-implied-mounting of the entity type from #3 would make
> sense?

Sure. It definitely makes sense to explore the subvolume and snapshotting user interfaces.

-chris
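A "mkdir"-like utility of the kind discussed in points (3) and (4) could be as thin as a wrapper around the subvolume-creation command. The sketch below is only one possible shape: mksubvol and the DRY_RUN variable are hypothetical, and the btrfsctl -S NAME DIR form is taken from the install session in the first mail of this thread, on the assumption that the tool behaves there as shown.

```shell
#!/bin/sh
# mksubvol PATH
# Hypothetical helper: like "mkdir PATH", but PATH becomes a btrfs
# subvolume, and so gets its own btree root. Set DRY_RUN=1 to print the
# command instead of running it.
mksubvol() {
    target=$1
    [ -n "$target" ] || { echo "usage: mksubvol PATH" >&2; return 2; }
    [ ! -e "$target" ] || { echo "mksubvol: $target already exists" >&2; return 1; }

    name=$(basename "$target")     # subvolume name
    parent=$(dirname "$target")    # directory that will contain it

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "btrfsctl -S $name $parent"
    else
        btrfsctl -S "$name" "$parent"
    fi
}
```

With DRY_RUN=1, mksubvol /srv/db/txlog prints the command it would run (btrfsctl -S txlog /srv/db), provided that path does not already exist. No separate mount step appears in the sketch because, in the sessions shown in this thread, a new subvolume is visible as a directory as soon as it is created.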
On Sat, Nov 21, 2009 at 12:31:06AM +0100, Goffredo Baroncelli wrote:
> Hi Chris,
>
> On Friday 20 November 2009, Chris Mason wrote:
> > On Fri, Nov 20, 2009 at 07:50:06PM +0100, Goffredo Baroncelli wrote:
> > > Hi all,
> > >
> > > after the Chris (Ball) email, I thought about a possible btrfs file-system
> > > layout, which may permit to snapshot the root and mount (if required) an old
> > > snapshot of the root.
> >
> > [ very clear description of a filesystem tree layout ]
> >
> > >
> > > Any comments ?
> >
> > This is definitely possible, but not strictly required. We'll be able
> > to create an ioctl (or mount option) that replaces the default subvolume
> > ('.' in your examples) pointer with a pointer to another subvolume, and
> > an ioctl to delete the old root.
>
> That was my first thought... but so the risk is to implement with ioctl(s)
> commands (rename, delete, list ) that
> a) already exist in the VFS abstraction.
> b) refer to objects that are like "directories" (in fact the differences
> between a volume and a directory are very small)

The 'default' subvolume actually does live in a directory, just one you can't see without the ioctl ;)

>
> > Basically it will end up hiding the extra layer of indirection your
> > proposal adds.
>
> Yes, my idea introduces an extra layer, that
> a) is different from all other file-systems
> b) is not useful if you don't use the snapshot at all
> And both a) and b) are not good point :((
>
> > This doesn't mean your ideas were bad, my plan all along
> > has been to leave this up to the distros to work out with the users, and
> > give them enough tools that they have the flexibility to do what they
> > need.
>
> My concern is about the btrfs user interface.
> The biggest difficult that I had to learn the btrfs capabilities is its
> "user-interface". I have to admit to be not the smartest person, but I spent a lot
> of time in order to understand which was the difference between a btrfs
> subvolume creation and a "mkfs + mount".
> Finally I concluded that there no is difference (except the COW behaviour and
> other implementation detail). My impression was that in some area too often
> the VFS and btrfs do the same things. [*]
> The point is that if btrfs do the same things of VFS, this may be called as
> "flexibility".
> But the history has highlight that from a long term point of view is the
> orthogonality of the subsystems that leads to the flexibility of the system..

Well, btrfs is using the VFS to expose the subvolumes. Basically a subvolume is a special directory.

-chris
Goffredo Baroncelli
2009-Nov-24 18:27 UTC
Re: [RFC] proposal for a btrfs filesystem layout
Chris Mason wrote:
> On Sat, Nov 21, 2009 at 12:31:06AM +0100, Goffredo Baroncelli wrote:
[...]
> > My concern is about the btrfs user interface.
> > The biggest difficult that I had to learn the btrfs capabilities is its
> > "user-interface". I have to admit to be not the smartest person, but I spent a lot
> > of time in order to understand which was the difference between a btrfs
> > subvolume creation and a "mkfs + mount".
> > Finally I concluded that there no is difference (except the COW behaviour and
> > other implementation detail). My impression was that in some area too often
> > the VFS and btrfs do the same things. [*]
> > The point is that if btrfs do the same things of VFS, this may be called as
> > "flexibility".
> > But the history has highlight that from a long term point of view is the
> > orthogonality of the subsystems that leads to the flexibility of the system..
>
> Well, btrfs is using the VFS to expose the subvolumes. Basically a
> subvolume is a special directory.

Let me explain better: what I meant was to expose the sub-volume content *only* with a command like "mount -o subvol=<name>".

Today, when I create a snapshot, it is automatically placed in the btrfs filesystem under the snapshot name. IIRC, when I move (rename) the directory I also change the subvolume/snapshot name. Yes, I can remount the sub-volume with the mount command anywhere and with an arbitrary name. In fact, what seems strange to me is that when I create a snapshot, it is immediately mounted: it is not a real problem, only a strange behaviour. The "standard" behaviour is to create the file-system and then mount it: two separate actions.

> -chris

Goffredo
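The explicit second step described in the mail above, exposing a subvolume only through mount, could look like the following sketch. The helper name mount_subvol and the DRY_RUN variable are hypothetical; the mount syntax is the one used in the first mail of this thread.

```shell
#!/bin/sh
# mount_subvol DEVICE SUBVOL MOUNTPOINT
# Hypothetical helper for the explicit "mount" step: expose subvolume
# SUBVOL of the btrfs file-system on DEVICE at MOUNTPOINT.
# Set DRY_RUN=1 to print the command instead of running it.
mount_subvol() {
    [ $# -eq 3 ] || { echo "usage: mount_subvol DEVICE SUBVOL MOUNTPOINT" >&2; return 2; }
    device=$1
    subvol=$2
    mountpoint=$3

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mount -t btrfs -o subvol=$subvol $device $mountpoint"
    else
        mount -t btrfs -o "subvol=$subvol" "$device" "$mountpoint"
    fi
}
```

Creating the snapshot and calling mount_subvol afterwards keeps creation and mounting as two separate actions, and the mount point name does not have to match the subvolume name.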