Hi,

I'm a contributor to the Arch Linux package mkinitcpio-btrfs [1]. The goal of this hook is to provide Btrfs rollback support for root filesystems directly from the initrd.

Technically, we use a subvolume to store the root filesystem. The user can snapshot it entirely and boot from this snapshot. In case of a rollback, our hook snapshots the snapshot again, to keep the original unchanged. The boot subvolume is then set with 'btrfs subvolume set-default' and mounted without the subvol/subvolid option by Arch's default mount handler. That way, we ensure the best compatibility and lowest maintenance, as we don't overwrite default init functions.

Assume we have the following setup:

    # btrfs su li -p /
    ID 256 gen 86 parent 5 top level 5 path root
    ID 259 gen 86 parent 256 top level 256 path var
    ID 260 gen 86 parent 256 top level 256 path usr

The use case for that is to set quotas for the child subvolumes.

Now, if we snapshot the root subvolume, the child subvolumes are not snapshotted with it. There is no back reference that would allow Btrfs to auto-mount the original child subvolumes when we mount the snapshot as the new root filesystem.

Of course, we could snapshot the children separately into their desired directories. But this would not help, because our hook snapshots the snapshot again, to keep its original untouched while rolling back. And we don't have fstab to find out the correct mount points at this early boot stage.

At the moment, all scenarios result in "/usr/bin/init not found".

So here comes my question:
Wouldn't it be helpful to add a --recursive option to 'btrfs subvolume snapshot' to snapshot child subvolumes together with their parent?
Or maybe it is possible to add some functionality to reference the child subvolumes in the snapshot's fs-tree to allow auto-mounting?

I appreciate other ideas or opinions too.

Thanks,
Michael

[1] https://aur.archlinux.org/packages/mkinitcpio-btrfs/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
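
For illustration only, here is a rough shell sketch of what a recursive snapshot would have to do by hand, given that no --recursive option exists. It only prints the commands instead of running them. Several things are assumptions and not part of the mail above: the function names children_of and plan_recursive_snapshot, the mount points /mnt/root and /mnt/rollback, that each child's "path" is relative to its parent (as in the listing above), that paths contain no spaces, that nesting is only one level deep, and that the snapshot holds an empty placeholder directory where each nested subvolume was.

```shell
#!/bin/sh
# children_of PARENT_ID
# Reads `btrfs subvolume list -p` output on stdin and prints the path of
# every subvolume whose "parent" field equals PARENT_ID.
children_of() {
    awk -v want="$1" '
        {
            parent = ""; path = ""
            for (i = 1; i < NF; i++) {
                if ($i == "parent") parent = $(i + 1)
                if ($i == "path")   path   = $(i + 1)
            }
            if (parent == want && path != "") print path
        }'
}

# plan_recursive_snapshot SRC DST PARENT_ID
# Dry run: prints the commands for snapshotting SRC to DST and then each
# direct child subvolume into the matching place inside DST.
# SRC and DST are hypothetical mount-point paths.
plan_recursive_snapshot() {
    src=$1 dst=$2 id=$3
    listing=$(cat)
    printf 'btrfs subvolume snapshot %s %s\n' "$src" "$dst"
    printf '%s\n' "$listing" | children_of "$id" | while read -r child; do
        # Assumption: the placeholder directory must go before the child
        # snapshot can be created at the same path.
        printf 'rmdir %s/%s\n' "$dst" "$child"
        printf 'btrfs subvolume snapshot %s/%s %s/%s\n' \
            "$src" "$child" "$dst" "$child"
    done
}

# Demo with the listing from the mail above.
plan_recursive_snapshot /mnt/root /mnt/rollback 256 <<'EOF'
ID 256 gen 86 parent 5 top level 5 path root
ID 259 gen 86 parent 256 top level 256 path var
ID 260 gen 86 parent 256 top level 256 path usr
EOF
```

Because the hook snapshots the snapshot again on rollback, the same plan would have to be repeated for that second snapshot, which is the part a built-in recursive option would make simpler.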
On Nov 7, 2013, at 4:45 PM, Michael Göhler <visit@myjm.de> wrote:

> The boot subvolume is then set with 'btrfs subvolume set-default' and mounted without the subvol/subvolid option by Arch's default mount handler.

I'm unconvinced it's a good idea for it to be used behind the scenes for the described purpose. Consider the following:

1. btrfs subvolume set-default uses a user space tool, and in my view it's primarily a user domain behavior modifier for the mount command.

2. It obscures which subvolume is actually mounted: cat /proc/self/mountinfo doesn't show what subvolume is mounted unless subvol= is explicitly used in /etc/fstab.

3. Grub2 (and I'm pretty sure grubby) do not use the set-default. The GRUB intent is for the prefix to be an absolute path, not one relative to the default subvolume. There's a bug that should very recently (within the last week) have been fixed, where GRUB fails to find the prefix if set-default is changed. This maybe isn't affecting the particular layout you describe, where only rootfs is on btrfs, rather than /boot being on its own subvolume.

> That way, we ensure the best compatibility and lowest maintenance, as we don't overwrite default init functions.

I'm sympathetic to the alternative problem, which is that you need to alter grub.cfg to use the proper rootflags=subvol= to explicitly use the proper snapshot, and it would also mean altering the /etc/fstab within that snapshot.

> Now, if we snapshot the root subvolume, the child subvolumes are not snapshotted with it. There is no back reference that would allow Btrfs to auto-mount the original child subvolumes when we mount the snapshot as the new root filesystem. Of course, we could snapshot the children separately into their desired directories. But this would not help, because our hook snapshots the snapshot again, to keep its original untouched while rolling back. And we don't have fstab to find out the correct mount points at this early boot stage.

The fact of the matter is, we don't have the necessary metadata support in btrfs to understand the relationships between snapshots/subvolumes. There is a need for this, not least of which is the use case you're describing. This has come up with Fedora also, for the offline updates rollback they want to eventually do. And it's also an issue with distribution installers, which see these snapshots as wholly unique instances of existing installations, rather than as related snapshots.

> At the moment, all scenarios result in "/usr/bin/init not found".
>
> So here comes my question:
> Wouldn't it be helpful to add a --recursive option to 'btrfs subvolume snapshot' to snapshot child subvolumes together with their parent?
> Or maybe it is possible to add some functionality to reference the child subvolumes in the snapshot's fs-tree to allow auto-mounting?

Well, and it raises the problem of nested subvolumes making the parent subvolume undeletable. So I'd question the significant benefit of making nested subvolumes, in particular /var. It's complicating how the OS is to be put back together again.

Chris Murphy
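
To make the rootflags=subvol= alternative concrete, here is a minimal shell sketch of how an initrd hook might read the subvolume name from the kernel command line instead of relying on the default subvolume. The function name subvol_from_cmdline and the sample command line are hypothetical; this is not how mkinitcpio-btrfs or any distribution actually does it.

```shell
#!/bin/sh
# subvol_from_cmdline
# Reads a kernel command line on stdin and prints the value of subvol=
# found inside rootflags=, or nothing if it is not present.
subvol_from_cmdline() {
    tr ' ' '\n' |                  # one parameter per line
        sed -n 's/^rootflags=//p' |  # keep only the rootflags value
        tr ',' '\n' |              # one mount option per line
        sed -n 's/^subvol=//p'     # keep only the subvol value
}

# Demo with a hypothetical command line.
echo 'root=/dev/sda2 rw rootflags=compress=lzo,subvol=snap/root-1 quiet' |
    subvol_from_cmdline
```

On a running system the input would come from /proc/cmdline, and the result would be passed on as mount -o subvol=NAME. With this approach the chosen snapshot is recorded in grub.cfg per boot entry, which is the visibility argument made in point 2 above.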
Michael Göhler posted on Fri, 08 Nov 2013 00:45:32 +0100 as excerpted:

> The use case for that is to set quotas for the child subvolumes.

Quite apart from the main thread subject, you're aware that there are major bugs with btrfs quotas/qgroups ATM, right? I'd certainly be wary of depending on them for anything at the distribution level you're discussing.

The known quota bugs appear to be in two areas and may or may not be related.

First, people have been complaining about huge and unaccounted memory usage, on the order of multiple gigabytes, while using quotas. I believe one related memory leak has recently been fixed (though I'm not sure the fix is actually in mainline yet, it's that new), but there could well be more.

Second, people are reporting negative quota numbers. Apparently this is the result of deleting snapshots or turning off qgroups on some subvolumes/snapshots but not others, and it goes away when the quotas are fully rescanned, but that's a time-intensive process on multi-terabyte spinning rust partitions. Again, there's work being done to address the issue, but it's nothing I'd want to rely on at this point.

Given the above situation, I'd strongly suggest leaving the quota feature off on btrfs at this point, certainly at the distro level. If individuals want to test it, that's fine, as long as they know the risk (which, given that btrfs itself remains officially experimental, is simply a known stronger area of the risk that continues to apply to testing btrfs as a whole), but I'd argue it's inappropriate for distro level at this point. If quotas are dictated by the use case, there are other, more stable filesystem options available with stable quota support.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Hi Chris,

thanks for taking the time.

>> The boot subvolume is then set with 'btrfs subvolume set-default' and
>> mounted without the subvol/subvolid option by Arch's default mount
>> handler.
>
> I'm unconvinced it's a good idea for it to be used behind the scenes
> for the described purpose. Consider the following:
>
> 1. btrfs subvolume set-default uses a user space tool, and in my view
> it's primarily a user domain behavior modifier for the mount command.

The only drawback I can imagine is that the initrd can have a different version of the btrfs command, as it isn't repacked on package upgrades of btrfs-utils. But this applies not only to set-default, but also to the snapshot command. Of course, if we find a better way to achieve our goal, we are willing to drop it.

> 2. It obscures which subvolume is actually mounted: cat
> /proc/self/mountinfo doesn't show what subvolume is mounted unless
> subvol= is explicitly used in /etc/fstab.

We use btrfs subvolume get-default to check which subvolume we are on. You are right, an average user will find it confusing. But if he does, he will also not look into the /proc filesystem. He will run mount without options to look for it, and have no luck because it isn't displayed there either.

> 3. Grub2 (and I'm pretty sure grubby) do not use the set-default. The
> GRUB intent is for the prefix to be an absolute path, not one
> relative to the default subvolume. There's a bug that should very
> recently (within the last week) have been fixed, where GRUB fails to
> find the prefix if set-default is changed. This maybe isn't affecting
> the particular layout you describe, where only rootfs is on btrfs,
> rather than /boot being on its own subvolume.

Actually, 2.00.5086-1, which is the latest on Arch, respects set-default and searches for the kernel on the default subvolume without prefix and rootflags. Thanks for pointing this out, so I can watch out for the next grub update and fix the hook accordingly.

> I'm sympathetic to the alternative problem, which is that you need to
> alter grub.cfg to use the proper rootflags=subvol= to explicitly use
> the proper snapshot, and it would also mean altering the /etc/fstab
> within that snapshot.

In my opinion, the whole set-default/subvolid= discussion is not relevant to my question (though I appreciate it), because /etc/fstab is only read by init after switching to the real root. So, if init itself is on a nested subvolume, /etc/fstab will not help at all.

> The fact of the matter is, we don't have the necessary metadata
> support in btrfs to understand the relationships between
> snapshots/subvolumes. There is a need for this, not least of which is
> the use case you're describing. This has come up with Fedora also, for
> the offline updates rollback they want to eventually do. And it's
> also an issue with distribution installers, which see these snapshots
> as wholly unique instances of existing installations, rather than as
> related snapshots.

[snip]

>> So here comes my question:
>> Wouldn't it be helpful to add a --recursive option to 'btrfs subvolume
>> snapshot' to snapshot child subvolumes together with their parent?
>> Or maybe it is possible to add some functionality to reference the
>> child subvolumes in the snapshot's fs-tree to allow auto-mounting?
>
> Well, and it raises the problem of nested subvolumes making the parent
> subvolume undeletable.

We could make deletion recursive, the same as snapshotting, if that's the only reason.

> So I'd question the significant benefit of
> making nested subvolumes, in particular /var. It's complicating how the
> OS is to be put back together again.

Some apps use the /var filesystem for caching, and if they go nuts, your root filesystem runs out of space. Of course you can question that. And I'm with you (at least for /usr). But this has been a common practice for years, and some of the old-school sysadmins aren't easily convinced otherwise ;-)

To answer Duncan's mail: the whole discussion on nested subvolumes is not relevant as long as quota support isn't stable. But I still wanted to bring this up, to maybe open a feature request. And maybe it could be relevant to other projects (like Fedora) too.

Regards,
Michael
On Nov 8, 2013, at 6:41 AM, Michael Göhler <visit@myjm.de> wrote:

> Hi Chris,
>
> thanks for taking the time.
>
>>> The boot subvolume is then set with 'btrfs subvolume set-default' and mounted without the subvol/subvolid option by Arch's default mount handler.
>> I'm unconvinced it's a good idea for it to be used behind the scenes
>> for the described purpose. Consider the following:
>> 1. btrfs subvolume set-default uses a user space tool, and in my view
>> it's primarily a user domain behavior modifier for the mount command.
> The only drawback I can imagine is that the initrd can have a different version of the btrfs command, as it isn't repacked on package upgrades of btrfs-utils. But this applies not only to set-default, but also to the snapshot command. Of course, if we find a better way to achieve our goal, we are willing to drop it.

Consider the multiboot scenario where I want openSUSE to boot with snapshotX persistently, and I want Fedora to boot with snapshotY persistently. If both distributions claim the right to change the default subvolume, this now breaks horribly, because neither distribution actually knows which snapshot is to be booted: that knowledge is only stored as the default subvolume, the value of which isn't preserved once changed.

>> 2. It obscures which subvolume is actually mounted: cat
>> /proc/self/mountinfo doesn't show what subvolume is mounted unless
>> subvol= is explicitly used in /etc/fstab.
> We use btrfs subvolume get-default to check which subvolume we are on.

And what if that get-default presents you with a subvolume that isn't yours? It sounds like a setup for distributions stepping on each other.

> You are right, an average user will find it confusing. But if he does, he will also not look into the /proc filesystem. He will run mount without options to look for it, and have no luck because it isn't displayed there either.
>
>> 3. Grub2 (and I'm pretty sure grubby) do not use the set-default. The
>> GRUB intent is for the prefix to be an absolute path, not one
>> relative to the default subvolume. There's a bug that should very
>> recently (within the last week) have been fixed, where GRUB fails to
>> find the prefix if set-default is changed. This maybe isn't affecting
>> the particular layout you describe, where only rootfs is on btrfs,
>> rather than /boot being on its own subvolume.
> Actually, 2.00.5086-1, which is the latest on Arch, respects set-default and searches for the kernel on the default subvolume without prefix and rootflags. Thanks for pointing this out, so I can watch out for the next grub update and fix the hook accordingly.

Maybe I'm not following what you mean by fix. If distributions were to have different GRUB behaviors in this respect, multiboot on Btrfs will be fundamentally broken: e.g. if one distribution expects GRUB to interpret prefix and rootflags as absolute paths from subvolid=5, and another distro interprets them as relative to the default subvolume.

>> I'm sympathetic to the alternative problem, which is that you need to
>> alter grub.cfg to use the proper rootflags=subvol= to explicitly use
>> the proper snapshot, and it would also mean altering the /etc/fstab
>> within that snapshot.
> In my opinion, the whole set-default/subvolid= discussion is not relevant to my question (though I appreciate it), because /etc/fstab is only read by init after switching to the real root. So, if init itself is on a nested subvolume, /etc/fstab will not help at all.

I think this emphasizes that the necessary pieces aren't actually present to enable booting from snapshots; it becomes very use-case specific and effectively proprietary to a distribution.

> Some apps use the /var filesystem for caching, and if they go nuts, your root filesystem runs out of space.

By making /var a subvolume on a file system that also hosts a root subvolume, you've not changed the risk of /var causing rootfs to run out of space, unless you also implement quotas for /var. And I think that's too risky to implement.

> Of course you can question that. And I'm with you (at least for /usr). But this has been a common practice for years, and some of the old-school sysadmins aren't easily convinced otherwise ;-)

I don't feel that implementing best practices that differ from old-school convention requires convincing old-schoolers of anything. They can retroactively implement whatever they want if they wish. But today's developers doing this new work should consider best practices, not old practices. And they should consider new-schoolers, not just old-schoolers, who will become confused by antiquated conventions that have little (or no) efficacy.

Chris Murphy