Hello,

We are using btrfs filesystems in our infrastructure and, at some point,
they start refusing to create new subvolumes.

Each filesystem has quotas enabled immediately after its creation (with
"btrfs quota enable"), and then all subfolders under the root directory
are created as subvolumes (btrfs subvolume create). Over time, these
subvolumes may also be deleted. The subvolumes contain nothing but
ordinary files and directories, which should not be related to this
problem.

After a while of using this setup, without any obvious steps to
reproduce it, the filesystem goes into a state where the following
happens:

# btrfs subvolume create btrfs_mount/test_subvolume
Create subvolume 'btrfs_mount/test_subvolume'
ERROR: cannot create subvolume - File exists

In terms of data, the filesystem is pretty much empty; it only contains
a single empty directory. I don't know about the metadata at this point.

The problem goes away if we disable and re-enable the quota. It all
seems to be some dead metadata lying around.

Here are some facts we have gathered. Since we found that it's the ioctl
call that returns EEXIST, the place to track the problem further down
was the kernel module, which assumes the userspace tools are not causing
the problem. Here is a high-level traceback:

ioctl.c:create_subvol() returns -EEXIST
qgroup.c:btrfs_qgroup_inherit() returns -EEXIST
qgroup.c:add_qgroup_item() returns -EEXIST
ctree.c:btrfs_insert_empty_item() returns -EEXIST
ctree.c:btrfs_search_slot() returns 0
ctree.c:key_search() returns 0

The problem appeared before our current kernel, which is a 3.8 version
(along with Btrfs progs v0.19), but mounting an already broken
filesystem on a 3.12 kernel (with Btrfs progs v0.20-rc1-358-g194aa4a)
doesn't do any better.

Any thoughts on this? We can provide more information if needed, even
the broken filesystem itself.

Cheers,
Alin.
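For reference, this is roughly the lifecycle that gets us into that
state; the device, mount point and subvolume names below are
placeholders rather than our real setup:

mkfs.btrfs /dev/sdX
mount /dev/sdX /mnt/btrfs_mount
btrfs quota enable /mnt/btrfs_mount

# subvolumes are created directly under the root and later removed
btrfs subvolume create /mnt/btrfs_mount/subvol_a
btrfs subvolume delete /mnt/btrfs_mount/subvol_a

# ... after enough create/delete cycles, any new name fails:
btrfs subvolume create /mnt/btrfs_mount/test_subvolume
# -> ERROR: cannot create subvolume - File exists

# the quota tree can still be inspected while the filesystem is in this state
btrfs qgroup show /mnt/btrfs_mount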
On Fri, Nov 15, 2013 at 02:33:58PM +0000, Alin Dobre wrote:
> We are using btrfs filesystems in our infrastructure and, at some
> point, they start refusing to create new subvolumes.
[...]
> After a while of using this setup, without any obvious steps to
> reproduce it, the filesystem goes into a state where the following
> happens:
>
> # btrfs subvolume create btrfs_mount/test_subvolume
> Create subvolume 'btrfs_mount/test_subvolume'
> ERROR: cannot create subvolume - File exists

We've had someone else with this kind of symptom (snapshot/subvol
creation fails unexpectedly) on IRC recently. I don't think they've got
to the bottom of it yet, but the investigation is ongoing. I've cc'd
Carey in on this, because he was the one trying to debug it.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
Alin Dobre posted on Fri, 15 Nov 2013 14:33:58 +0000 as excerpted:

> We are using btrfs filesystems in our infrastructure and, at some
> point, they start refusing to create new subvolumes.
>
> Each filesystem has quotas enabled immediately after its creation

I'd suggest staying away from quotas on btrfs at this point. There's
something going on there that just doesn't work correctly, and a lot of
folks have been reporting quota-related bugs. So if your use-case needs
quotas, try a different filesystem. If it doesn't, consider disabling
them for the time being (and reboot, since apparently some bits don't
fully disable until after a reboot). You can read the list for more
detail.

AFAIK there are some very recent patches that address at least part of
the problem, but they're recent enough that I don't believe they're in
3.12, and the 3.13 commit-window pull was only requested in the last 24
hours or so, so they're likely not even in 3.13 yet. Even then, however,
I'd test for a while before relying on btrfs quotas, because I'm not
sure whether the new patches fix all the bugs or simply fix enough for
the next batch to make their appearance.

Meanwhile, it's worth noting what both the btrfs kernel option and the
wiki[1] say about the state of btrfs: it is still an experimental
filesystem. Don't use it with data you can't afford to lose (either make
and test your backups to the point that you're comfortable with the
possibility of full btrfs loss should the worst occur, or use only
scratch data you don't care about losing in the first place), and DO
keep updated, as btrfs development is quite rapid and every kernel
includes fixes for known problems, some of which don't get ported back
to stable-series kernels.

If you're running 3.8, that means you're missing the fixes in 3.9-3.12,
four full release kernel series! There are reasons one may wish to run
an older kernel, but in general they're not compatible with the reasons
one might have for running an experimental filesystem such as btrfs.
Therefore, if you're going to test btrfs, please try to run a current
release kernel, now 3.12, if not the development kernel, now 3.13.
(FWIW, while I personally do run development kernels, I prefer not to
switch to them until rc2 or so, at which point hopefully the worst
data-risk bugs will be addressed.) If you prefer to be more conservative
and run older, more tested kernels, it's quite likely that the still
experimental btrfs doesn't fit your use-case very well either, and you
should probably be using something considered more stable.

The same applies, though to a lesser degree, to btrfs-progs. 0.19 is
OLD. Even 0.20-rc1, the latest rc, is about a year old! Btrfs-progs
development happens in branches that are only integrated into master
once they're considered stable enough for release. As a result, and due
to continued development, a build from a recent btrfs-progs git-master
pull is always going to be the most stable and ideal testing version you
can run.

FWIW, here's the version string from my btrfs-progs, updated a couple of
days ago (though there's one known bug in it, related to doing a
mkfs.btrfs on a sub-GiB filesystem; a patch was posted to the list, but
as I've not updated in a couple of days I don't know whether it's in
master yet):

Btrfs v0.20-rc1-591-gc652e4e
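For anyone who wants to do the same, the build is basically a clone and
a make. Treat the following as a sketch: the repository URL is the one I
have in my notes and may have moved, and the exact version subcommand
may differ on older progs.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
make            # assumes the usual build deps (libuuid, libblkid, lzo, zlib headers)
make install    # installs over the distro copy; adjust prefix/DESTDIR if you prefer
./btrfs version # should report a git-describe string like the one above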
[1] https://btrfs.wiki.kernel.org

-- 
Duncan - List replies preferred. No HTML msgs.
On Fri, Nov 15, 2013 at 9:27 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Fri, Nov 15, 2013 at 02:33:58PM +0000, Alin Dobre wrote:
>> After a while of using this setup, without any obvious steps to
>> reproduce it, the filesystem goes into a state where the following
>> happens:
>>
>> # btrfs subvolume create btrfs_mount/test_subvolume
>> Create subvolume 'btrfs_mount/test_subvolume'
>> ERROR: cannot create subvolume - File exists
>
> We've had someone else with this kind of symptom (snapshot/subvol
> creation fails unexpectedly) on IRC recently. I don't think they've
> got to the bottom of it yet, but the investigation is ongoing. I've
> cc'd Carey in on this, because he was the one trying to debug it.
>
>> The problem goes away if we disable and re-enable the quota. It all
>> seems to be some dead metadata lying around.

And indeed, it turns out I did have quotas enabled, and disabling them
restores the ability to create subvolumes.
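For anyone else who hits this before a real fix lands, the workaround
boils down to the following (the mount point is a placeholder):

btrfs quota disable /mnt/btrfs_mount
btrfs subvolume create /mnt/btrfs_mount/test_subvolume   # succeeds again
# re-enable afterwards if you still want quota accounting:
btrfs quota enable /mnt/btrfs_mount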
Duncan posted on Fri, 15 Nov 2013 16:07:22 +0000 as excerpted:

> FWIW, here's the version string from my btrfs-progs, updated a couple
> of days ago (though there's one known bug in it, related to doing a
> mkfs.btrfs on a sub-GiB filesystem; a patch was posted to the list,
> but as I've not updated in a couple of days I don't know whether it's
> in master yet):
>
> Btrfs v0.20-rc1-591-gc652e4e

FWIW, that patch is in now.

Btrfs v0.20-rc1-596-ge9ac73b

commit e9ac73b441b1b05b57ce99be1aff02eac6929448
Author: Anand Jain <Anand.Jain@oracle.com>
Date:   Fri Nov 15 19:11:09 2013 +0800

    btrfs-progs: for mixed group check opt before default raid profile
    is enforced

    This fixes the regression introduced with the patch
        btrfs-progs: avoid write to the disk before sure to create fs
    what happened with this patch is it missed the check to see if the
    user has the option set before pushing the defaults.

-- 
Duncan - List replies preferred. No HTML msgs.
On 16/11/13 18:57, Duncan wrote:
> Duncan posted on Fri, 15 Nov 2013 16:07:22 +0000 as excerpted:
>
>> FWIW, here's the version string from my btrfs-progs, updated a couple
>> of days ago (though there's one known bug in it, related to doing a
>> mkfs.btrfs on a sub-GiB filesystem; a patch was posted to the list,
>> but as I've not updated in a couple of days I don't know whether it's
>> in master yet):
>>
>> Btrfs v0.20-rc1-591-gc652e4e
>
> FWIW, that patch is in now.

Thanks for all the info, Duncan.

Cheers,
Alin.
It seems that the problem was that we didn't delete the corresponding
qgroup when deleting the subvolume, which was filling the metadata with
unused information. Removing all the stale qgroups fixes the problem and
allows subsequent subvolume creation without any quota disable/enable
action. Also, we are now automatically deleting the corresponding qgroup
after the subvolume is removed.

Cheers,
Alin.
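In case it helps anyone else, our cleanup amounts to something like the
sketch below. The mount point, the awk field positions and the
assumption that all subvolumes live directly under the top level are
specific to our setup, so adapt before use:

mount=/mnt/btrfs_mount

# one-off cleanup: destroy level-0 qgroups whose subvolume no longer exists
live_ids=$(btrfs subvolume list "$mount" | awk '{print $2}')
for qg in $(btrfs qgroup show "$mount" | awk '/^0\//{print $1}'); do
    id=${qg#0/}
    echo "$live_ids" | grep -qx "$id" || btrfs qgroup destroy "$qg" "$mount"
done

# ongoing: when removing a subvolume, drop its qgroup along with it
name=some_subvolume
id=$(btrfs subvolume list "$mount" | awk -v n="$name" '$NF == n {print $2}')
btrfs subvolume delete "$mount/$name"
btrfs qgroup destroy "0/$id" "$mount"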
Hi Alin,

On 11/28/2013 06:01 PM, Alin Dobre wrote:
> It seems that the problem was that we didn't delete the corresponding
> qgroup when deleting the subvolume, which was filling the metadata
> with unused information. Removing all the stale qgroups fixes the
> problem and allows subsequent subvolume creation without any quota
> disable/enable action. Also, we are now automatically deleting the
> corresponding qgroup after the subvolume is removed.

So far, we don't delete a subvolume's corresponding qgroup
automatically. The main reason is that subvolume deletion is
asynchronous, and we still need to keep the qgroup accounting (the
'rfer' and 'excl' counters) correct. In theory, a qgroup can only be
removed safely once its reference count drops to zero, and that is the
main reason we don't delete the subvolume's qgroup directly.

The other point is that deleting a subvolume does not walk the whole fs
tree, which is what qgroup accounting depends on, so deleting a
subvolume may break the accounting. (A qgroup rescan can make it right
again, but we cannot rely on that too much.)

Anyway, even though we don't remove the qgroup automatically, it should
not affect subvolume creation, because a qgroup corresponds to one
subvolume and will not be reused: subvolume ids only ever grow larger.

Thanks,
Wang
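For reference, the rescan mentioned above is driven from userspace
roughly like this (needs a kernel and btrfs-progs recent enough to have
the rescan commands; the mount point is a placeholder):

btrfs quota rescan /mnt/btrfs_mount       # recompute rfer/excl for all qgroups
btrfs quota rescan -s /mnt/btrfs_mount    # show whether a rescan is still in progress
btrfs qgroup show /mnt/btrfs_mount        # check the counters once the rescan finishes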