At work we have a backup server system running btrfs. The main backup
pool is a 1.3TB partition (on LVM). Every night, a series of backups
are done with rsync, with each backup onto a separate subvolume, and a
snapshot made of each backup. Compression is used to minimize disk
space used.

When the system was first provisioned, all the operations ran
impressively fast, but now it is almost unusably slow. It has taken 2
days to delete snapshots that recovered 30GB of space, and some of the
backups are taking longer than 24 hours to complete.

Is there any way to monitor the progress of the snapshot deletions?

What could be causing it to run so slow?

What could we do to speed things up?

Kernel is 2.6.39-gentoo-r3. It was last rebooted 2 days ago, and
rebooting does not seem to improve the situation. Mount options
according to /proc are "rw,nosuid,noatime,nodiratime,compress=zlib,noacl"

btrfs filesystem df shows:

  Data: total=1.15TB, used=1.10TB
  System, DUP: total=8.00MB, used=184.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=49.38GB, used=31.59GB
  Metadata: total=8.00MB, used=0.00

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
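[Editorial note: one rough way to watch deletion progress, since btrfs at this
vintage reports none directly, is to poll "btrfs filesystem df" and diff the
Data "used" figure as the cleaner frees extents. A sketch, not from the
thread; the /backup mount point and 60-second interval are assumptions:]

```shell
# Sketch: poll 'btrfs filesystem df' and report changes in the Data
# "used" figure while snapshot deletion runs in the background.
# MNT=/backup is a hypothetical mount point; adjust to the real pool.
MNT=${MNT:-/backup}
prev=""
while sleep 60; do
    cur=$(btrfs filesystem df "$MNT" | awk -F'used=' '/^Data/ {print $2}')
    [ -n "$prev" ] && [ "$cur" != "$prev" ] && echo "Data used: $prev -> $cur"
    prev=$cur
done
```

A shrinking "used" figure at least confirms the cleaner is making forward
progress, even if it gives no ETA.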
Excerpts from Bruce Guenter's message of 2011-07-28 16:04:45 -0400:
> When the system was first provisioned, all the operations ran
> impressively fast, but now it is almost unusably slow. It has taken 2
> days to delete snapshots that recovered 30GB of space, and some of the
> backups are taking longer than 24 hours to complete.
> [...]

The slow performance is probably coming from reading in the metadata
associated with the snapshot extents. The new readahead extensions from
Arne should help once we've adapted them to it. The easiest way to make
sure is to hit sysrq-w a few times while it is slow (or run a patched
latencytop).

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Excerpts from Chris Mason's message of 2011-07-28 16:28:04 -0400:
> The slow performance is probably coming from reading in the metadata
> associated with the snapshot extents. The new readahead extensions from
> Arne should help once we've adapted them to it. The easiest way to make
> sure is to hit sysrq-w a few times while it is slow (or run a patched
> latencytop).

Sorry, hit send too soon. Here is the latencytop patch; after you
recompile, run latencytop -c for a few minutes and send the output here.

http://oss.oracle.com/~mason/latencytop.patch

-chris
On Thu, Jul 28, 2011 at 04:28:04PM -0400, Chris Mason wrote:
> The slow performance is probably coming from reading in the metadata
> associated with the snapshot extents. The new readahead extensions from
> Arne should help once we've adapted them to it. The easiest way to make
> sure is to hit sysrq-w a few times while it is slow (or run a patched
> latencytop).

We shut down the rsync backups for now, and are waiting for the
snapshots to complete deletion so we can unmount the filesystem to
reboot (missing kernel options for latencytop). In the mean time I hit
SysRq-W quite a number of times, and nothing is showing up in kernel
messages. However, btrfs-cleaner and/or btrfs-transaction (sometimes
both, sometimes just one) have been running high in top, taking up all
the CPU time.

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
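[Editorial note: sysrq-w only dumps tasks in uninterruptible (D) sleep, so it
shows nothing when the worker threads are busy but runnable. Sampling their
kernel stacks directly via /proc can still reveal where the time goes. A
sketch, not from the thread; requires root, and the thread names are taken
from the top output described above:]

```shell
# Sketch: repeatedly sample the kernel stacks of the busy btrfs worker
# threads (run as root). Names match the 'top' output above; note that
# 'btrfs-transacti' is the comm name as truncated to 15 characters.
for name in btrfs-cleaner btrfs-transacti; do
    for pid in $(pgrep "$name"); do
        for i in 1 2 3; do
            echo "=== $name ($pid) sample $i ==="
            cat "/proc/$pid/stack"
            sleep 2
        done
    done
done
```

If the same few frames dominate across samples, that is a good hint at the
hot path even without latencytop.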
On Mon, Aug 1, 2011 at 4:26 PM, Bruce Guenter <bruce@untroubled.org> wrote:
> We shut down the rsync backups for now, and are waiting for the
> snapshots to complete deletion so we can unmount the filesystem to
> reboot (missing kernel options for latencytop). In the mean time I hit
> SysRq-W quite a number of times, and nothing is showing up in kernel
> messages.

The keystroke should be alt-sysrq-w.
On Mon, Aug 01, 2011 at 04:41:10PM -0600, cwillu wrote:
> The keystroke should be alt-sysrq-w.

Right. What I meant is that no backtraces are showing up. I was
actually using

  echo w > /proc/sysrq-trigger

and all I was seeing is:

  SysRq : Show Blocked State
    task                        PC stack   pid father

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
Excerpts from Bruce Guenter's message of 2011-08-01 18:59:15 -0400:
> Right. What I meant is that no backtraces are showing up. I was
> actually using
>
>   echo w > /proc/sysrq-trigger
>
> and all I was seeing is:
>
>   SysRq : Show Blocked State
>     task                        PC stack   pid father

# once per boot, run dmesg -n8
dmesg -n8
echo w > /proc/sysrq-trigger
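[Editorial note: the two commands above can be wrapped so each blocked-task
dump lands in its own file, ready to mail to the list. A sketch, not from the
thread; requires root, and the /tmp output path is illustrative:]

```shell
# Sketch: raise the console loglevel once per boot, then snapshot the
# tail of the kernel log around a sysrq-w trigger. Run as root; the
# output path and 200-line tail are arbitrary choices.
dmesg -n 8                      # once per boot
echo w > /proc/sysrq-trigger
sleep 1
dmesg | tail -n 200 > "/tmp/sysrq-w-$(date +%Y%m%d-%H%M%S).txt"
```

Raising the loglevel matters because "Show Blocked State" output is emitted
at a level that many distributions suppress on the console by default.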
On Thu, Jul 28, 2011 at 04:51:24PM -0400, Chris Mason wrote:
> Sorry, hit send too soon. Here is the latencytop patch; after you
> recompile, run latencytop -c for a few minutes and send the output here.
>
> http://oss.oracle.com/~mason/latencytop.patch

Ok, I ran it for more than a few minutes. Early on (shortly after
boot), the output shows this for a long time:

===============
Mon Aug  1 21:09:30 2011
Globals: Cause Maximum Percentage

Process details:

Process ksoftirqd/0 (3) Total: 3.9 msec
  [run_ksoftirqd]  3.9 msec  100.0 %
    run_ksoftirqd kthread kernel_thread_helper

Process kworker/1:1 (983) Total: 31.2 msec
  .  3.9 msec  100.0 %
    worker_thread kthread kernel_thread_helper

Process sshd (3948) Total: 3.9 msec
  Waiting for event (select)  3.9 msec  100.0 %
    poll_schedule_timeout do_select core_sys_select sys_select
    system_call_fastpath

Process btrfs-cleaner (4345) Total: 2433.6 msec
  [sleep_on_page]  31.3 msec  100.0 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages read_tree_block read_block_for_search
    btrfs_search_slot lookup_inline_extent_backref __btrfs_free_extent
    run_clustered_refs btrfs_run_delayed_refs __btrfs_end_transaction

Process btrfs-transacti (4346) Total: 2437.5 msec
  [sleep_on_page]  27.3 msec  100.0 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages read_tree_block read_block_for_search
    btrfs_search_slot lookup_inline_extent_backref __btrfs_free_extent
    run_clustered_refs btrfs_run_delayed_refs btrfs_commit_transaction

This repeats identically (the numbers and backtraces are all very
stable) for a long time.
Much later, I see this:

===============
Tue Aug  2 13:46:29 2011
Globals: Cause Maximum Percentage

Process details:

Process btrfs-submit-0 (4332) Total: 199.2 msec
  Creating block layer request  58.6 msec  100.0 %
    get_request_wait __make_request generic_make_request submit_bio
    run_scheduled_bios pending_bios_fn worker_loop kthread
    kernel_thread_helper

Process btrfs-cleaner (4345) Total: 3.9 msec
  Creating block layer request  3.9 msec  100.0 %
    get_request_wait __make_request generic_make_request submit_bio
    btrfs_map_bio btree_submit_bio_hook submit_one_bio
    read_extent_buffer_pages readahead_tree_block reada_walk_down
    do_walk_down walk_down_tree

Process btrfs-endio-met (4700) Total: 35.2 msec
  [worker_loop]  3.9 msec  100.0 %
    worker_loop kthread kernel_thread_helper

Process kworker/1:2 (26315) Total: 50.8 msec
  .  3.9 msec  100.0 %
    worker_thread kthread kernel_thread_helper

[...snip...]

===============
Thu Aug  4 12:34:08 2011
Globals: Cause Maximum Percentage

Process details:

Process btrfs-submit-0 (4332) Total: 199.2 msec
  Creating block layer request  58.6 msec  100.0 %
    get_request_wait __make_request generic_make_request submit_bio
    run_scheduled_bios pending_bios_fn worker_loop kthread
    kernel_thread_helper

Process btrfs-cleaner (4345) Total: 3.9 msec
  Creating block layer request  3.9 msec  100.0 %
    get_request_wait __make_request generic_make_request submit_bio
    btrfs_map_bio btree_submit_bio_hook submit_one_bio
    read_extent_buffer_pages readahead_tree_block reada_walk_down
    do_walk_down walk_down_tree

Process btrfs-endio-met (4700) Total: 35.2 msec
  [worker_loop]  3.9 msec  100.0 %
    worker_loop kthread kernel_thread_helper

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
On Thu, Aug 04, 2011 at 10:35:36AM -0600, Bruce Guenter wrote:
> Ok, I ran it for more than a few minutes. Early on (shortly after
> boot), the output shows this for a long time:

Does anybody have any ideas what might be going on?

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
On Thu, Aug 11, 2011 at 09:04:56AM -0600, Bruce Guenter wrote:
> Does anybody have any ideas what might be going on?

By chance, does "vmstat 1" or top show a high system cpu% while this is
happening, and if so, does cat /proc/<pid>/stack of the PIDs in R state
sometimes show things under btrfs_run_delayed_refs like
setup_cluster_no_bitmap() and rb_next()?

I haven't been able to build a 2.6.39 to verify yet, but 2.6.38.5 does
not show the problem for some reason. This is on a backup server, too,
with 20 3 TB btrfs volumes.

You should also be able to see a stack trace in this case (NOT the
uninterruptible sleep case) with echo l > /proc/sysrq-trigger, or use
oprofile or "perf top", which is what I am using. I see output such as:

   samples  pcnt function                 DSO
   _______ _____ ________________________ ________
  13537.00 59.2% rb_next                  [kernel]
   3539.00 15.5% _raw_spin_lock           [kernel]
   1668.00  7.3% setup_cluster_no_bitmap  [kernel]
    799.00  3.5% tree_search_offset       [kernel]
    476.00  2.1% fill_window              [kernel]

and "perf record -g; perf report --stdio", which gives me:

  47.60%  btrfs-transacti  [kernel.kallsyms]  [k] rb_next
          |
          --- rb_next
             |
             |--97.36%-- setup_cluster_no_bitmap
             |          btrfs_find_space_cluster
             |          find_free_extent
             |          btrfs_reserve_extent
             |          btrfs_alloc_free_block
             |          __btrfs_cow_block
             |          btrfs_cow_block
             |          |
             |          |--99.95%-- btrfs_search_slot
             |          |          |
             |          |          |--55.01%-- lookup_inline_extent_backref
             |          |          |          |
             |          |          |          |--97.17%-- __btrfs_free_extent
             |          |          |          |          run_clustered_refs
             |          |          |          |          btrfs_run_delayed_refs
             |          |          |          |          |
             |          |          |          |          |--88.72%-- btrfs_commit_transaction
             |          |          |          |          |          transaction_kthread

At one point, I rebooted the box without unmounting, and mounting of
the 60 TB of btrfs volumes never actually finished within 8 hours. It
seemed to get stuck on one of them about half way through, constantly
writing (slowly) and taking lots of CPU. This is just from mounting,
not any use. Booting back into 2.6.38.5 was fine. The kernel giving
problems was 3.0, and now 3.1-rc1.

Before, when it actually was mounted, "sync" (with nothing else
dirtying pages) also took 4 minutes to run and pegged the CPU with a
profile similar to the above.

I started trying to do a bisection, but tons of other things have
changed and it takes quite a while to verify each pass. It also
doesn't help that I can't seem to reproduce it on any other
non-production boxes.

Simon-
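[Editorial note: the profiling workflow Simon describes can be condensed into
a few commands. A sketch, not from the thread; requires a perf build matching
the running kernel, and the 30-second sampling window is arbitrary:]

```shell
# Sketch of the 'perf record -g' / 'perf report' workflow from the
# message above, aimed at the busy btrfs transaction thread.
pid=$(pgrep -o btrfs-transacti)       # -o: oldest matching thread
perf record -g -p "$pid" -- sleep 30  # sample that PID with call graphs
perf report --stdio | head -n 40      # top of the callgraph report
```

Attaching to the specific kernel thread keeps the profile focused, instead
of system-wide sampling that buries the btrfs frames in unrelated activity.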
On Thu, Aug 11, 2011 at 05:40:36PM -0700, Simon Kirby wrote:
> By chance, does "vmstat 1" or top show a high system cpu% while this is
> happening,

Absolutely, usually around 80-90%.

> and if so, does cat /proc/<pid>/stack of the PIDs in R state
> sometimes show things under btrfs_run_delayed_refs like
> setup_cluster_no_bitmap() and rb_next()?

No, I am not seeing what you describe. The stack traces are usually
showing what I originally quoted. I don't know if I've seen
setup_cluster_no_bitmap or rb_* at all.

--
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/
On Fri, Aug 12, 2011 at 12:21:44PM -0600, Bruce Guenter wrote:
> > By chance, does "vmstat 1" or top show a high system cpu% while this is
> > happening,
>
> Absolutely, usually around 80-90%

...which is perhaps why you had difficulty obtaining traces with
sysrq-w, since most of the time there were no D-state processes.
Instead, sysrq-l gives a backtrace of each running CPU.

> > and if so, does cat /proc/<pid>/stack of the PIDs in R state
> > sometimes show things under btrfs_run_delayed_refs like
> > setup_cluster_no_bitmap() and rb_next()?
>
> No, I am not seeing what you describe. The stack traces are usually
> showing what I originally quoted. I don't know if I've seen
> setup_cluster_no_bitmap or rb_* at all.

Ok, oh well. :)

Simon-