Hi,

I have a btrfs backup system that has been running great. It's been
backing up 6 systems with 500-750GB of data using rsync and snapshots.
Every night I snapshot the backup and overwrite it. Recently it has been
unable to complete the backups. It's running so slowly that it can barely
finish one backup a day. When I first started backing up the systems they
would all complete in at most 8 hours.

I noticed it started happening after I deleted a large number of
snapshots. This caused the cleaner process to run for a few days. After
it finished, the filesystem has still been too slow to use.

I'm running kernel 2.6.36-rc4.

The filesystem was built with -m single -d single and is mounted with the
compress option.

The block device is an Areca RAID5 array of 4 Western Digital 2TB drives.

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
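[Editor's note: a minimal sketch of the nightly rotation Dave describes,
snapshot first, then overwrite the live copy with rsync. The mount point,
subvolume layout, and host names are hypothetical, and the snapshot
command uses the btrfs-progs syntax of the era; treat this as an
illustration under those assumptions, not Dave's actual script.]

  #!/bin/sh
  BACKUP=/mnt/backup            # btrfs filesystem (assumed mount point)
  DATE=$(date +%Y-%m-%d)
  for host in web1 web2 web3; do
      # preserve last night's copy; each host's tree is its own subvolume
      btrfs subvolume snapshot "$BACKUP/$host" "$BACKUP/$host-$DATE"
      # then overwrite the live copy in place; --delete keeps it a mirror
      rsync -a --delete "root@$host:/" "$BACKUP/$host/"
  done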
Do you have any kernel processes running constantly? Any taking near 100%
of a CPU?

--rich

On Sep 20, 2010, at 14:24, Dave Cundiff <syshackmin@gmail.com> wrote:
> Hi,
>
> I have a btrfs backup system that has been running great. It's been
> backing up 6 systems with 500-750GB of data using rsync and snapshots.
> [...]
On Mon, Sep 20, 2010 at 9:19 PM, K. Richard Pixley <rich@noir.com> wrote:
> Do you have any kernel processes running constantly? Any taking near
> 100% of a CPU?
>
> --rich

Now that I look again, it seems that the btrfs-cleaner process is still
running. CPU isn't much of an issue on this box; it's a dual quad-core
E5620, so 8 cores and 16 threads. Most of my load looks like I/O wait
from the cleaner process hammering the disks. I deleted about 10
snapshots, I think. Should it take 4 days to clean up from that?

Cpu0  :  0.0%us,  0.2%sy,  0.0%ni, 85.1%id, 14.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.4%sy,  0.0%ni, 66.9%id, 32.6%wa,  0.0%hi,  0.1%si,  0.0%st
Cpu2  :  0.0%us,  0.2%sy,  0.0%ni, 88.7%id, 11.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.1%sy,  0.0%ni, 96.7%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.2%sy,  0.0%ni, 92.3%id,  7.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.1%sy,  0.0%ni, 97.6%id,  2.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.1%sy,  0.0%ni, 98.5%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.1%sy,  0.0%ni, 95.7%id,  4.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us,  1.6%sy,  0.0%ni, 30.2%id, 68.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy,  0.0%ni, 95.1%id,  4.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.4%sy,  0.0%ni, 83.8%id, 15.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni, 98.1%id,  1.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st

Time: 12:24:01 AM
Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb       16.25    6.49  127.45  330.65  1173.03  3946.46    11.18    70.55  154.01   2.13  97.71

Time: 12:24:21 AM
Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb        5.20    2.70   92.55  270.70   782.00  2795.20     9.85    70.12  189.42   2.69  97.85

Time: 12:24:41 AM
Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb        0.25   11.90    5.20  863.35    43.60  9391.60    10.86   144.84  168.01   1.15 100.00

Time: 12:25:01 AM
Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb       17.50    0.60  204.35   49.70  1776.00   524.00     9.05    18.64   74.23   3.74  95.03

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
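[Editor's note: the figures above come from standard tools, and sampling
like this is easy to reproduce. The 20-second interval matches the iostat
timestamps; the device name is of course specific to this box.]

  # per-CPU breakdown: run top, then press '1' to expand the Cpu0..CpuN lines
  top
  # extended per-device statistics for sdb, sampled every 20 seconds
  iostat -x sdb 20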
On Tue, Sep 21, 2010 at 12:27:03AM -0400, Dave Cundiff wrote:
> Now that I look again, it seems that the btrfs-cleaner process is still
> running. [...] I deleted about 10 snapshots, I think. Should it take 4
> days to clean up from that?

It shouldn't, but it depends on how much metadata we have to read in to
process the snapshots. Could you do a few sysrq-w? We'll see where you
are spending your time based on the traces.

-chris
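[Editor's note: sysrq-w dumps stack traces of all tasks in uninterruptible
(blocked) state to the kernel log. On a headless server it can be
triggered through procfs, roughly as follows; these are standard kernel
interfaces.]

  # make sure the magic SysRq interface is enabled
  echo 1 > /proc/sys/kernel/sysrq
  # 'w' = dump blocked (uninterruptible) tasks
  echo w > /proc/sysrq-trigger
  # the traces land in the kernel ring buffer
  dmesg | tail -n 100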
On 22 September 2010 01:02, Chris Mason <chris.mason@oracle.com> wrote:
> It shouldn't, but it depends on how much metadata we have to read in to
> process the snapshots. Could you do a few sysrq-w? We'll see where you
> are spending your time based on the traces.

Also, using 'perf' may give a good picture of where the time is spent, eg:

$ sudo perf record -a sleep 20
$ sudo perf report | tee profile.txt
--
Daniel J Blueman
>> It shouldn't, but it depends on how much metadata we have to read in to
>> process the snapshots. Could you do a few sysrq-w? We'll see where you
>> are spending your time based on the traces.
>
> Also, using 'perf' may give a good picture of where the time is spent, eg:
>
> $ sudo perf record -a sleep 20
> $ sudo perf report | tee profile.txt
> --
> Daniel J Blueman

The cleaner finished before I was able to get any debug; however, here is
sysrq and perf data from my very slow-running backups. I also did a test
on the same box with rsync between the ext3 and btrfs filesystems with
one large file. rsync can pretty much saturate the 100Mbit port writing
to the ext3 filesystem. The ext3 and btrfs filesystems are on the same
RAID5 container, just different partitions.

btrfs filesystem:
rsync --progress -v -e 'ssh' a2s55:/var/log/exim_mainlog .
exim_mainlog
    64019026 100%    1.76MB/s    0:00:34 (xfer#1, to-check=0/1)

ext3:
rsync --progress -v -e 'ssh' a2s55:/var/log/exim_mainlog .
exim_mainlog
    64032337 100%    8.40MB/s    0:00:07 (xfer#1, to-check=0/1)

Here's an iostat across 20 seconds while the backups were running and
actively trying to copy a 10GB file:

Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb        0.00    0.00    0.25   27.45     2.00  4000.80   144.51     0.01    0.28   0.22   0.60

The sysrq and perf data are also from the backup trying to copy a 10GB
file. It's not going to complete for almost 2 hours... :(

home/dmportal/public_html/data3m7_data_wiki.tgz
  1782775808  15%    1.57MB/s    1:40:45

The only process I could get to show blocked in the sysrq was ssh, which
is how rsync is transmitting its data.

[336155.004946]  task                 PC stack   pid father
[336155.005018] ssh           D 00000001140c8669  4424 20748  20747 0x00000080
[336155.005024]  ffff880402421628 0000000000000086 ffff8804024215d8 0000000000004000
[336155.008563]  0000000000013140 ffff8801cd458378 ffff8801cd458000 ffff880236dd16b0
[336155.008567]  ffffffff02421678 ffffffff810e23d8 000000000000000a 0000000000000001
[336155.008572] Call Trace:
[336155.008581]  [<ffffffff810e23d8>] ? free_page_list+0xe8/0x100
[336155.008589]  [<ffffffff815cde98>] schedule_timeout+0x138/0x2b0
[336155.008595]  [<ffffffff81058920>] ? process_timeout+0x0/0x10
[336155.008599]  [<ffffffff815ccee0>] io_schedule_timeout+0x80/0xe0
[336155.008606]  [<ffffffff810ed9a1>] congestion_wait+0x71/0x90
[336155.008610]  [<ffffffff8106a280>] ? autoremove_wake_function+0x0/0x40
[336155.008614]  [<ffffffff810e4171>] shrink_inactive_list+0x2e1/0x310
[336155.008619]  [<ffffffff810e449f>] shrink_zone+0x2ff/0x4e0
[336155.008625]  [<ffffffff810e16b9>] ? shrink_slab+0x149/0x180
[336155.008628]  [<ffffffff810e486e>] zone_reclaim+0x1ee/0x290
[336155.008632]  [<ffffffff810d926a>] ? zone_watermark_ok+0x2a/0x100
[336155.008637]  [<ffffffff810db55a>] get_page_from_freelist+0x5fa/0x7b0
[336155.008642]  [<ffffffff81130956>] ? free_poll_entry+0x26/0x30
[336155.008647]  [<ffffffff81037f6e>] ? select_idle_sibling+0x9e/0x120
[336155.008652]  [<ffffffff810dc860>] __alloc_pages_nodemask+0x130/0x7e0
[336155.008658]  [<ffffffff8103f7cf>] ? enqueue_task_fair+0x4f/0x60
[336155.008661]  [<ffffffff8103d8f8>] ? enqueue_task+0x68/0xa0
[336155.008665]  [<ffffffff814bd8f1>] ? __alloc_skb+0x51/0x180
[336155.008671]  [<ffffffff8110f02c>] kmalloc_large_node+0x5c/0xb0
[336155.008676]  [<ffffffff81112166>] __kmalloc_node_track_caller+0xf6/0x1f0
[336155.008682]  [<ffffffff814ba2f0>] ? sock_alloc_send_pskb+0x1a0/0x2f0
[336155.008685]  [<ffffffff814bd920>] __alloc_skb+0x80/0x180
[336155.008689]  [<ffffffff814ba2f0>] sock_alloc_send_pskb+0x1a0/0x2f0
[336155.008693]  [<ffffffff81037a6e>] ? __wake_up_sync_key+0x5e/0x80
[336155.008697]  [<ffffffff814ba455>] sock_alloc_send_skb+0x15/0x20
[336155.008704]  [<ffffffff81567c22>] unix_stream_sendmsg+0x282/0x3c0
[336155.008709]  [<ffffffff814b6b20>] sock_aio_write+0x170/0x180
[336155.008715]  [<ffffffff8111ff95>] do_sync_write+0xd5/0x120
[336155.008720]  [<ffffffff812204af>] ? security_file_permission+0x1f/0x70
[336155.008724]  [<ffffffff8112034e>] vfs_write+0x17e/0x190
[336155.008727]  [<ffffffff81120d25>] sys_write+0x55/0x90
[336155.008733]  [<ffffffff81002e1b>] system_call_fastpath+0x16/0x1b

And here is 60 seconds of perf data while the 10GB file was languishing.
I trimmed it to just the top entries; let me know if you need the entire
thing.

# Samples: 140516
#
# Overhead  Command          Shared Object               Symbol
# ........  ...............  ..........................  ......
#
    25.92%  ssh              /lib64/libcrypto.so.0.9.8e  [.] 0x000000000e0bb8
     5.19%  rsync            /usr/bin/rsync              [.] md5_process
     5.18%  [idle]           [kernel]                    [k] intel_idle
     4.35%  ssh              /usr/bin/ssh                [.] 0x000000000254ae
     4.00%  [idle]           [kernel]                    [k] read_hpet
     3.67%  ssh              /lib64/libcrypto.so.0.9.8e  [.] md5_block_asm_data_order
     2.64%  rsync            [kernel]                    [k] __etree_search [btrfs]
     2.47%  rsync            [kernel]                    [k] buffer_search [btrfs]
     2.40%  rsync            [kernel]                    [k] read_hpet
     1.30%  btrfs-endio-wri  [kernel]                    [k] buffer_search [btrfs]
     1.16%  ssh              /lib64/libcrypto.so.0.9.8e  [.] AES_encrypt
     1.02%  btrfs-endio-wri  [kernel]                    [k] __etree_search [btrfs]
     0.89%  ssh              [kernel]                    [k] copy_user_generic_string
     0.76%  rsync            [kernel]                    [k] _raw_spin_lock
     0.53%  rsync            /usr/bin/rsync              [.] read_timeout
     0.50%  rsync            [kernel]                    [k] generic_bin_search [btrfs]
     0.48%  rsync            [kernel]                    [k] copy_user_generic_string
     0.48%  btrfs-worker-0   [kernel]                    [k] crc32c_intel_le_hw [crc32c_intel]
     0.42%  rsync            [kernel]                    [k] btrfs_search_slot [btrfs]
     0.38%  rsync            [kernel]                    [k] map_private_extent_buffer [btrfs]
     0.34%  rsync            [kernel]                    [k] comp_keys [btrfs]
     0.34%  rsync            [kernel]                    [k] kmem_cache_alloc
     0.34%  ssh              /lib64/libc-2.5.so          [.] __GI_memcpy
     0.31%  rsync            [kernel]                    [k] btrfs_comp_cpu_keys [btrfs]

And lastly, the same giant file writing to the ext3 partition on the same
RAID container while the btrfs transfer was still trying to copy it. It
was able to saturate the 100Mbit port.

rsync --progress -v -e 'ssh' a2s55:/home/dmportal/public_html/data3m7_data_wiki.tgz .
data3m7_data_wiki.tgz    8.10MB/s    0:22:19

Thanks for any input!

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
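[Editor's note: both transfers above go through ssh, so a purely local
copy would take the network and cipher out of the picture and isolate the
filesystem write path. A rough sketch, with hypothetical mount points and
file names:]

  # copy a large existing file disk-to-disk, no ssh or network involved
  time sh -c 'cp /mnt/ext3/bigfile /mnt/btrfs/bigfile && sync'
  # and the reverse direction for comparison
  time sh -c 'cp /mnt/btrfs/bigfile /mnt/ext3/bigfile.copy && sync'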
Hi Dave,

On 24 September 2010 20:39, Dave Cundiff <syshackmin@gmail.com> wrote:
> The cleaner finished before I was able to get any debug; however, here
> is sysrq and perf data from my very slow-running backups.
> [...]
> And lastly, the same giant file writing to the ext3 partition on the
> same RAID container while the btrfs transfer was still trying to copy
> it. It was able to saturate the 100Mbit port.
>
> rsync --progress -v -e 'ssh' a2s55:/home/dmportal/public_html/data3m7_data_wiki.tgz .
> data3m7_data_wiki.tgz    8.10MB/s    0:22:19

The profile here shows relatively little time spent in BTRFS, so most
likely the high system load is due to processes waiting for disk (which
is counted as busy even though the process isn't scheduled). The 25% in
libcrypto is saturating one core, I guess, making it the limiting factor
here. I would suggest using:

$ rsync --progress -v -e 'ssh -o ciphers=arcfour -o compression=no' a2s55:/home/dmportal/public_html/data3m7_data_wiki.tgz .

SSH compression destroys throughput, and arcfour is really much faster
than AES (until you get Intel Westmere + OS support). I was finding that
BTRFS compression was really hurting writeback, due to a known
regression. Are you using it here on your backup system?

Anyway, I guess you need to grab profile information when you are
getting the slow snapshot deletion...

Dan
--
Daniel J Blueman
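[Editor's note: one way to act on the compression question above would be
to remount the backup filesystem without the compress option and re-run
the single-file test; only new writes are affected, since existing
extents stay compressed on disk. The device and mount point below are
hypothetical:]

  umount /mnt/backup
  mount /dev/sdb1 /mnt/backup   # default options, i.e. without 'compress'
  # re-run the earlier single-file transfer and compare throughput
  rsync --progress -v -e 'ssh -o ciphers=arcfour -o compression=no' \
      a2s55:/var/log/exim_mainlog /mnt/backup/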