Hello,

A few weeks ago I replaced a ZFS backup system with one backed by btrfs. A script loops over a bunch of hosts, rsyncing each to its own subvolume. After each rsync I snapshot the "host-specific" subvolume.

The "disk" is an iSCSI disk that in my benchmarks performs roughly like a local RAID of 2-3 SATA disks.

It worked fine for about a week (~150 snapshots from ~20 subvolumes) before it "suddenly" exploded in disk I/O wait. Doing anything (in particular making changes) on the file system is insanely slow; rsync basically can't complete (an rsync that should take 10-20 minutes takes 24 hours; I have a directory of 60k files I tried deleting and it's deleting one file every few minutes, that sort of thing).

I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting; it doesn't make a difference. As soon as I boot, "[btrfs-cleaner]" and "[btrfs-transacti]" get really busy.

I wonder if it's because I deleted a few snapshots at some point?

The file system is mounted with "-o compress=zlib,noatime":

# mount | grep tank
/dev/sdc on /tank type btrfs (rw,noatime,seclabel,compress=zlib,space_cache,_netdev)

I don't recall mounting it with space_cache; I don't think that's the default, so I wonder if I did do that at some point. Could that be what's messing me up?

btrfs-cleaner stack:

# cat /proc/1117/stack
[<ffffffffa022598a>] btrfs_commit_transaction+0x36a/0xa70 [btrfs]
[<ffffffffa022677f>] start_transaction+0x23f/0x460 [btrfs]
[<ffffffffa0226cb8>] btrfs_start_transaction+0x18/0x20 [btrfs]
[<ffffffffa021487f>] btrfs_drop_snapshot+0x3ef/0x5d0 [btrfs]
[<ffffffffa0226e1f>] btrfs_clean_old_snapshots+0x9f/0x120 [btrfs]
[<ffffffffa021eda9>] cleaner_kthread+0xa9/0x120 [btrfs]
[<ffffffff81081f90>] kthread+0xc0/0xd0
[<ffffffff816584ac>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

btrfs-transaction stack:

# cat /proc/1118/stack
[<ffffffffa0256b35>] btrfs_tree_read_lock+0x95/0x110 [btrfs]
[<ffffffffa020033b>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
[<ffffffffa0205649>] btrfs_search_slot+0x3f9/0x7a0 [btrfs]
[<ffffffffa020be5e>] lookup_inline_extent_backref+0x8e/0x4d0 [btrfs]
[<ffffffffa020dd38>] __btrfs_free_extent+0xc8/0x870 [btrfs]
[<ffffffffa0211f29>] run_clustered_refs+0x459/0xb50 [btrfs]
[<ffffffffa0215e48>] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
[<ffffffffa02256a6>] btrfs_commit_transaction+0x86/0xa70 [btrfs]
[<ffffffffa021e7c5>] transaction_kthread+0x1a5/0x220 [btrfs]
[<ffffffff81081f90>] kthread+0xc0/0xd0
[<ffffffff816584ac>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

Thank you for reading this far. Any suggestions would be most appreciated!

Ask
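P.S. In case the shape of the workload matters: the backup script is essentially just the loop below. The host names, paths and snapshot naming are simplified placeholders here, not the exact script.

  #!/bin/bash
  # one pre-created subvolume per host under /tank, e.g. /tank/host1
  for host in host1 host2 host3; do
      # mirror the host into its own subvolume
      rsync -aH --delete "root@${host}:/" "/tank/${host}/"

      # then snapshot that subvolume under a timestamped name
      btrfs subvolume snapshot "/tank/${host}" \
          "/tank/${host}@$(date +%Y%m%d-%H%M%S)"
  done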
On Thu, Mar 21, 2013 at 1:56 PM, Ask Bjørn Hansen <ask@develooper.com> wrote:
> Hello,
>
> A few weeks ago I replaced a ZFS backup system with one backed by btrfs.
> A script loops over a bunch of hosts, rsyncing each to its own subvolume.
> After each rsync I snapshot the "host-specific" subvolume.
>
> The "disk" is an iSCSI disk that in my benchmarks performs roughly like a
> local RAID of 2-3 SATA disks.
>
> It worked fine for about a week (~150 snapshots from ~20 subvolumes)
> before it "suddenly" exploded in disk I/O wait. Doing anything (in
> particular making changes) on the file system is insanely slow; rsync
> basically can't complete (an rsync that should take 10-20 minutes takes
> 24 hours; I have a directory of 60k files I tried deleting and it's
> deleting one file every few minutes, that sort of thing).
>
> I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting; it
> doesn't make a difference. As soon as I boot, "[btrfs-cleaner]" and
> "[btrfs-transacti]" get really busy.
>
> I wonder if it's because I deleted a few snapshots at some point?
>
> The file system is mounted with "-o compress=zlib,noatime":
>
> # mount | grep tank
> /dev/sdc on /tank type btrfs (rw,noatime,seclabel,compress=zlib,space_cache,_netdev)
>
> I don't recall mounting it with space_cache; I don't think that's the
> default, so I wonder if I did do that at some point. Could that be what's
> messing me up?
>
> btrfs-cleaner stack:
>
> # cat /proc/1117/stack
> [<ffffffffa022598a>] btrfs_commit_transaction+0x36a/0xa70 [btrfs]
> [<ffffffffa022677f>] start_transaction+0x23f/0x460 [btrfs]
> [<ffffffffa0226cb8>] btrfs_start_transaction+0x18/0x20 [btrfs]
> [<ffffffffa021487f>] btrfs_drop_snapshot+0x3ef/0x5d0 [btrfs]
> [<ffffffffa0226e1f>] btrfs_clean_old_snapshots+0x9f/0x120 [btrfs]
> [<ffffffffa021eda9>] cleaner_kthread+0xa9/0x120 [btrfs]
> [<ffffffff81081f90>] kthread+0xc0/0xd0
> [<ffffffff816584ac>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> btrfs-transaction stack:
>
> # cat /proc/1118/stack
> [<ffffffffa0256b35>] btrfs_tree_read_lock+0x95/0x110 [btrfs]
> [<ffffffffa020033b>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
> [<ffffffffa0205649>] btrfs_search_slot+0x3f9/0x7a0 [btrfs]
> [<ffffffffa020be5e>] lookup_inline_extent_backref+0x8e/0x4d0 [btrfs]
> [<ffffffffa020dd38>] __btrfs_free_extent+0xc8/0x870 [btrfs]
> [<ffffffffa0211f29>] run_clustered_refs+0x459/0xb50 [btrfs]
> [<ffffffffa0215e48>] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
> [<ffffffffa02256a6>] btrfs_commit_transaction+0x86/0xa70 [btrfs]
> [<ffffffffa021e7c5>] transaction_kthread+0x1a5/0x220 [btrfs]
> [<ffffffff81081f90>] kthread+0xc0/0xd0
> [<ffffffff816584ac>] ret_from_fork+0x7c/0xb0
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Thank you for reading this far. Any suggestions would be most appreciated!

The space_cache option is probably not the issue. As you've guessed, this gets activated by default.

The cleaner runs to remove deleted snapshots. Responsiveness while the cleaner is running has been an issue that has come up, but it is usually just an inconvenience. I can't recall hearing about a slowdown of this degree while the cleaner is running.

I haven't noticed many discussions on the Btrfs mailing list where Btrfs is used in the context of iSCSI, so you may be seeing new issues in your use case.

If you can, it would be interesting to know how well the cleaner runs across iSCSI if nothing else is running.
If you could delete a single snapshot and note the space used before and after the cleaner finishes, along with the time required, that might help isolate the issue.

As a work-around, I would suggest using a script to delete the files in the subvolume before removing the snapshot. That way you will have more control over the priority given to the deletion process. Once the subvolume is empty, the cleaner usually runs much better. :)
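To illustrate the kind of measurement I mean, something along these lines should do; the mount point and snapshot name are placeholders, adjust them to your layout:

  # record space and time, then queue one snapshot for deletion
  btrfs filesystem show tank ; date
  btrfs subvolume delete /tank/somehost@20130315

  # ...wait until btrfs-cleaner goes idle, then record both again
  btrfs filesystem show tank ; date

And the work-around would look roughly like this (it only works if the snapshot is writable, of course):

  # empty the snapshot first, at a priority you control...
  ionice -c3 rm -rf /tank/somehost@20130315/*
  # ...then deleting the now-empty subvolume leaves the cleaner little to do
  btrfs subvolume delete /tank/somehost@20130315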
On Thu, 21 Mar 2013 11:56:37 -0700 Ask Bjørn Hansen <ask@develooper.com> wrote:

> Hello,
>
> A few weeks ago I replaced a ZFS backup system with one backed by btrfs.
> A script loops over a bunch of hosts, rsyncing each to its own subvolume.
> After each rsync I snapshot the "host-specific" subvolume.
>
> The "disk" is an iSCSI disk that in my benchmarks performs roughly like a
> local RAID of 2-3 SATA disks.

I think you should re-verify that this is still the case. Maybe your block device performance suddenly plummeted due to some other, unrelated issue? The simplest test would be "hdparm -t /dev/sdc".

Personally, I use btrfs on top of an MD RAID accessed over the network via NBD (and AoE before that), without any major issues, though my workload is perhaps somewhat lighter than what you describe.

--
With respect,
Roman
On Mar 22, 2013, at 11:12 AM, Roman Mamedov <rm@romanrm.ru> wrote:

>> The "disk" is an iSCSI disk that in my benchmarks performs roughly like a
>> local RAID of 2-3 SATA disks.
>
> I think you should re-verify that this is still the case. Maybe your block
> device performance suddenly plummeted due to some other, unrelated issue?
>
> The simplest test would be "hdparm -t /dev/sdc".

If I boot without mounting the btrfs file system, then I get:

 Timing buffered disk reads: 268 MB in 3.01 seconds = 89.14 MB/sec
 Timing buffered disk reads: 268 MB in 3.02 seconds = 88.86 MB/sec
 Timing buffered disk reads: 268 MB in 3.01 seconds = 89.18 MB/sec
 Timing buffered disk reads: 266 MB in 3.00 seconds = 88.59 MB/sec
 Timing buffered disk reads: 272 MB in 3.01 seconds = 90.26 MB/sec
 Timing buffered disk reads: 268 MB in 3.01 seconds = 88.99 MB/sec
 Timing buffered disk reads: 266 MB in 3.01 seconds = 88.48 MB/sec

I also made a new LUN on the iSCSI target and tested with bonnie++ on a fresh btrfs file system on that device and got reasonable results (mounted without compression to make sure that wouldn't make it cheat on actual I/O):

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
gunnarr.bn.d 15456M   476  99 122194  13 42724   8  2327  95 113553  14 494.8  21
Latency             44109us     465ms     268ms   19309us     300ms    1161ms
Version  1.96       ------Sequential Create------ --------Random Create--------
gunnarr.bn.dev      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 14550  57 +++++ +++ 20353  67 15617  61 +++++ +++ 19165  72
Latency               295us     566us     532us     373us      29us     561us

For what it's worth, the iSCSI target (a Synology box, some sort of Linux) shows lower load running bonnie++ than when the other btrfs file system is mounted with the btrfs-cleaner process running (and nothing else happening).

Ask
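P.S. In case it matters, the fresh-LUN test was essentially the following; the device name is just whatever the new LUN showed up as, not necessarily /dev/sdd:

  mkfs.btrfs /dev/sdd
  mkdir -p /mnt/test
  mount -o noatime /dev/sdd /mnt/test   # note: no compress= option this time
  bonnie++ -d /mnt/test -u root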
On Mar 22, 2013, at 10:37, Mitch Harder <mitch.harder@sabayonlinux.org> wrote:

> If you can, it would be interesting to know how well the cleaner runs
> across iSCSI if nothing else is running. If you could delete a single
> snapshot and note the space used before and after the cleaner finishes,
> along with the time required, that might help isolate the issue.

It seems to me like the file system has gotten corrupted in some way.

I think I left out one bit of information that's important: it's been doing this for ~3 weeks and has never finished, even with the file system otherwise idle. I left it completely alone for about a week at some point thinking it'd recover, but it didn't. After a week I tried rebooting it and that didn't make a difference, either. :-)

I tried to delete a few more snapshots and the space used didn't change. I'm checking by running 'btrfs file show':

# btrfs file show tank
failed to read /dev/sr0
Label: 'tank'  uuid: 6df950f2-e3e7-4f07-913b-ee34157e28b3
	Total devices 1 FS bytes used 477.67GB
	devid    1 size 1.17TB used 499.04GB path /dev/sdc

I am using 3.8.2 from Fedora 18 (3.8.3 on next reboot), but I'm happy to recompile and try any patches that might reveal what's going on.

Ask
I have a remarkably similar problem to Ask Bjørn Hansen's.

I have a 2TB btrfs filesystem on a SATA drive, on top of an msdos partition table, partition 1, without any form of RAID. I have been using it to make backups using rsync like this:

1) Snapshot the previous backup subvolume.
2) rsync into the new snapshot.
3) If we are overwriting a previous backup "level", drop that previous subvolume and rename this one into its place.

I keep 8 levels in a tower-of-hanoi strategy. It's the same backup script I've used for the last 5 years, except I removed the rsync --link-dest and added the btrfs snapshotting, just because dropping a subvolume is so damn quick compared to deleting all the files.

After about 400 GB of backups taken all in the same day, but having already dropped probably 12 or 13 subvolumes after snapshotting, I get the behavior. The behavior is:

*) After mounting, things are usually OK for 20 seconds or so, then it starts.
*) The disk activity light burns constantly.
*) btrfs-cleaner is fairly high up in 'top'.
*) Reads/writes to the device are painfully slow.
*) I cannot unmount the device, nor cleanly shut down.
*) Once, the system froze hard: no mouse cursor movement, not even the reset button worked.

I'm running Linux 3.8.3 pulled from git, so no distribution mods or patches. Happy to run any tests.

-Mike [not on this list, so I need to be on the reply]
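P.S. In script form, the three steps above are essentially the following; the subvolume names are simplified placeholders, the real script works out which tower-of-hanoi level to overwrite:

  # 1) snapshot the previous backup subvolume as the new working copy
  btrfs subvolume snapshot /backup/previous /backup/current

  # 2) rsync into the new snapshot
  rsync -aH --delete /home/ /backup/current/

  # 3) drop the level being overwritten and rename the new one into its place
  btrfs subvolume delete /backup/level-5
  mv /backup/current /backup/level-5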
On Thu, Mar 21, 2013 at 11:56:37AM -0700, Ask Bjørn Hansen wrote:
> A few weeks ago I replaced a ZFS backup system with one backed by btrfs.
> A script loops over a bunch of hosts, rsyncing each to its own subvolume.
> After each rsync I snapshot the "host-specific" subvolume.
>
> The "disk" is an iSCSI disk that in my benchmarks performs roughly like a
> local RAID of 2-3 SATA disks.
>
> It worked fine for about a week (~150 snapshots from ~20 subvolumes)
> before it "suddenly" exploded in disk I/O wait. Doing anything (in
> particular making changes) on the file system is insanely slow; rsync
> basically can't complete (an rsync that should take 10-20 minutes takes
> 24 hours; I have a directory of 60k files I tried deleting and it's
> deleting one file every few minutes, that sort of thing).

I'm seeing a similar problem after a test that produces tons of snapshots and snapshot deletions at the same time. Accessing the directory containing the snapshots (e.g. via ls) blocks for a long time.

The contention point is a mutex of the directory entry, used for lookups on the 'ls' side; the snapshot deletion process holds the mutex as well, with obvious consequences. The contention is multiplied by the number of snapshots waiting to be deleted and eagerly grabbing the mutex, making other waiters starve.

You've observed this as deletion progressing very slowly and rsync blocked. That's really annoying and I'm working towards fixing it.

> I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting; it
> doesn't make a difference. As soon as I boot, "[btrfs-cleaner]" and
> "[btrfs-transacti]" get really busy.
>
> I wonder if it's because I deleted a few snapshots at some point?

Yes. The progress or performance impact depends on the amount of data shared among the snapshots and on used/free space fragmentation.

david
On Wed, Mar 27, 2013 at 07:03:49PM +1300, Mike Dilger wrote:
> I have been using it to make backups using rsync like this:
> 1) Snapshot the previous backup subvolume.
> 2) rsync into the new snapshot.
> 3) If we are overwriting a previous backup "level", drop that previous
>    subvolume and rename this one into its place.
>
> I keep 8 levels in a tower-of-hanoi strategy. It's the same backup
> script I've used for the last 5 years, except I removed the rsync
> --link-dest and added the btrfs snapshotting, just because dropping
> a subvolume is so damn quick compared to deleting all the files.

It is very quick to mark the snapshot deleted (compare adding an item to a todo list vs. actually doing the work).

> After about 400 GB of backups taken all in the same day, but having
> already dropped probably 12 or 13 subvolumes after snapshotting, I get
> the behavior.

This implies lots of data to process during the cleaning phase.

> The behavior is:
> *) After mounting, things are usually OK for 20 seconds or so, then it starts.

Unless the cleaner was already active, it kicks in at most 30 seconds later, when the regular transaction commit kicks in.

> *) The disk activity light burns constantly.
> *) btrfs-cleaner is fairly high up in 'top'.
> *) Reads/writes to the device are painfully slow.

That's the cleaner doing its work.

> *) I cannot unmount the device, nor cleanly shut down.

Known problem; the umount should be a bit more responsive with
https://patchwork.kernel.org/patch/2256801/

> *) Once, the system froze hard: no mouse cursor movement, not even the reset button worked.

It could be the BUG_ON in btrfs_clean_old_snapshots, which can happen when a transaction abort occurs and the filesystem turns RO -- also fixed by the patch above.

david
Hi David,

On Fri, Mar 29, 2013 at 8:12 PM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Mar 21, 2013 at 11:56:37AM -0700, Ask Bjørn Hansen wrote:
>> A few weeks ago I replaced a ZFS backup system with one backed by btrfs.
>> A script loops over a bunch of hosts, rsyncing each to its own subvolume.
>> After each rsync I snapshot the "host-specific" subvolume.
>>
>> The "disk" is an iSCSI disk that in my benchmarks performs roughly like a
>> local RAID of 2-3 SATA disks.
>>
>> It worked fine for about a week (~150 snapshots from ~20 subvolumes)
>> before it "suddenly" exploded in disk I/O wait. Doing anything (in
>> particular making changes) on the file system is insanely slow; rsync
>> basically can't complete (an rsync that should take 10-20 minutes takes
>> 24 hours; I have a directory of 60k files I tried deleting and it's
>> deleting one file every few minutes, that sort of thing).
>
> I'm seeing a similar problem after a test that produces tons of snapshots
> and snapshot deletions at the same time. Accessing the directory
> containing the snapshots (e.g. via ls) blocks for a long time.
>
> The contention point is a mutex of the directory entry, used for lookups
> on the 'ls' side; the snapshot deletion process holds the mutex as well,
> with obvious consequences. The contention is multiplied by the number of
> snapshots waiting to be deleted and eagerly grabbing the mutex, making
> other waiters starve.

Can you please clarify which mutex you mean? Do you mean the dir->i_mutex taken by btrfs_ioctl_snap_destroy()? If so, that mutex is held only while "adding a snapshot to the to-delete list", and not during the snapshot deletion itself. Otherwise, I don't see btrfs_drop_snapshot() locking any mutex, for example.

> You've observed this as deletion progressing very slowly and rsync
> blocked. That's really annoying and I'm working towards fixing it.
>
>> I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting; it
>> doesn't make a difference. As soon as I boot, "[btrfs-cleaner]" and
>> "[btrfs-transacti]" get really busy.
>>
>> I wonder if it's because I deleted a few snapshots at some point?
>
> Yes. The progress or performance impact depends on the amount of data
> shared among the snapshots and on used/free space fragmentation.
>
> david

Thanks,
Alex.