Marc MERLIN
2014-Mar-12 15:18 UTC
3.14.0-rc3: btrfs send/receive blocks btrfs IO on other devices (near deadlocks)
I have a file server with 4 CPU cores and 5 btrfs devices:

Label: btrfs_boot  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
        Total devices 1 FS bytes used 48.92GiB
        devid    1 size 79.93GiB used 73.04GiB path /dev/mapper/cryptroot

Label: varlocalspace  uuid: 9f46dbe2-1344-44c3-b0fb-af2888c34f18
        Total devices 1 FS bytes used 1.10TiB
        devid    1 size 1.63TiB used 1.50TiB path /dev/mapper/cryptraid0

Label: btrfs_pool1  uuid: 6358304a-2234-4243-b02d-4944c9af47d7
        Total devices 1 FS bytes used 7.16TiB
        devid    1 size 14.55TiB used 7.50TiB path /dev/mapper/dshelf1

Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
        Total devices 1 FS bytes used 3.34TiB
        devid    1 size 7.28TiB used 3.42TiB path /dev/mapper/dshelf2

Label: bigbackup  uuid: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
        Total devices 5 FS bytes used 6.02TiB
        devid    1 size 1.82TiB used 1.43TiB path /dev/dm-9
        devid    2 size 1.82TiB used 1.43TiB path /dev/dm-6
        devid    3 size 1.82TiB used 1.43TiB path /dev/dm-5
        devid    4 size 1.82TiB used 1.43TiB path /dev/dm-7
        devid    5 size 1.82TiB used 1.43TiB path /dev/dm-8

I have a very long-running btrfs send/receive from btrfs_pool1 to bigbackup
(long-running meaning it has been slowly copying for over 5 days).

The problem is that this is blocking IO to btrfs_pool2, which uses entirely
different drives. By "blocking IO" I mean that IO to pool2 sometimes works
and at other times hangs for very long periods. It looks as if one rsync to
btrfs_pool2, or one piece of IO, hangs on a shared lock, and once that
happens, all IO to btrfs_pool2 stops for a long time. It does recover
eventually without a reboot, but the wait times are ridiculous (1 hour or
more). As I write this, I have a killall -9 rsync that waited for over 10
minutes before these processes finally died:

23555       07:36 wait_current_trans.isra.15 rsync -av -SH --delete (...)
23556       07:36 exit                       [rsync] <defunct>
25387  2-04:41:22 wait_current_trans.isra.15 rsync --password-file (...)
27481       31:26 wait_current_trans.isra.15 rsync --password-file (...)
29268    04:41:34 wait_current_trans.isra.15 rsync --password-file (...)
29343    04:41:31 exit                       [rsync] <defunct>
29492    04:41:27 wait_current_trans.isra.15 rsync --password-file (...)
14559    07:14:49 wait_current_trans.isra.15 cp -i -al current 20140312-feisty

This is all stuck in btrfs kernel code. If someone wants the sysrq-w
output, here it is: http://marc.merlins.org/tmp/btrfs_full.txt

A quick summary:

SysRq : Show Blocked State
  task                        PC stack   pid father
btrfs-cleaner   D ffff8802126b0840     0  3332      2 0x00000000
 ffff8800c5dc9d00 0000000000000046 ffff8800c5dc9fd8 ffff8800c69f6310
 00000000000141c0 ffff8800c69f6310 ffff88017574c170 ffff880211e671e8
 0000000000000000 ffff880211e67000 ffff8801e5936e20 ffff8800c5dc9d10
Call Trace:
 [<ffffffff8160b0d9>] schedule+0x73/0x75
 [<ffffffff8122a3c7>] wait_current_trans.isra.15+0x98/0xf4
 [<ffffffff81085062>] ? finish_wait+0x65/0x65
 [<ffffffff8122b86c>] start_transaction+0x48e/0x4f2
 [<ffffffff8122bc4f>] ? __btrfs_end_transaction+0x2a1/0x2c6
 [<ffffffff8122b8eb>] btrfs_start_transaction+0x1b/0x1d
 [<ffffffff8121c5cd>] btrfs_drop_snapshot+0x443/0x610
 [<ffffffff8160d7b3>] ? _raw_spin_unlock+0x17/0x2a
 [<ffffffff81074efb>] ? finish_task_switch+0x51/0xdb
 [<ffffffff8160afbf>] ? __schedule+0x537/0x5de
 [<ffffffff8122c08d>] btrfs_clean_one_deleted_snapshot+0x103/0x10f
 [<ffffffff81224859>] cleaner_kthread+0x103/0x136
 [<ffffffff81224756>] ? btrfs_alloc_root+0x26/0x26
 [<ffffffff8106bc1b>] kthread+0xae/0xb6
 [<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
 [<ffffffff816141bc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
btrfs-transacti D ffff88021387eb00     0  3333      2 0x00000000
 ffff8800c5dcb890 0000000000000046 ffff8800c5dcbfd8 ffff88021387e5d0
 00000000000141c0 ffff88021387e5d0 ffff88021f2141c0 ffff88021387e5d0
 ffff8800c5dcb930 ffffffff810fe574 0000000000000002 ffff8800c5dcb8a0
Call Trace:
 [<ffffffff810fe574>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8160b0d9>] schedule+0x73/0x75
 [<ffffffff8160b27e>] io_schedule+0x60/0x7a
 [<ffffffff810fe582>] sleep_on_page+0xe/0x12
 [<ffffffff8160b510>] __wait_on_bit+0x48/0x7a
 [<ffffffff810fe522>] wait_on_page_bit+0x7a/0x7c
 [<ffffffff81085096>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81245c70>] read_extent_buffer_pages+0x1bf/0x204
 [<ffffffff81223710>] ? free_root_pointers+0x5b/0x5b
 [<ffffffff81224412>] btree_read_extent_buffer_pages.constprop.45+0x66/0x100
 [<ffffffff81225367>] read_tree_block+0x2f/0x47
 [<ffffffff8120e4b6>] read_block_for_search.isra.26+0x24a/0x287
 [<ffffffff8120fcf7>] btrfs_search_slot+0x4f4/0x6bb
 [<ffffffff81214c3d>] lookup_inline_extent_backref+0xda/0x3fb
 [<ffffffff812167e1>] __btrfs_free_extent+0xf4/0x712
 [<ffffffff8121ba57>] __btrfs_run_delayed_refs+0x939/0xbdf
 [<ffffffff8121d896>] btrfs_run_delayed_refs+0x81/0x18f
 [<ffffffff8122af3e>] btrfs_commit_transaction+0x3a9/0x849
 [<ffffffff81085062>] ? finish_wait+0x65/0x65
 [<ffffffff81227598>] transaction_kthread+0xf8/0x1ab
 [<ffffffff812274a0>] ? btrfs_cleanup_transaction+0x43f/0x43f
 [<ffffffff8106bc1b>] kthread+0xae/0xb6
 [<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
 [<ffffffff816141bc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61

Worse, taking that dump gave me:

gargamel:/etc/udev/rules.d# echo w > /proc/sysrq-trigger

Message from syslogd@gargamel at Mar 12 07:13:16 ...
 kernel:[1234536.531251] BUG: soft lockup - CPU#1 stuck for 22s! [mysqld:12552]

Message from syslogd@gargamel at Mar 12 07:13:16 ...
 kernel:[1234536.559276] BUG: soft lockup - CPU#2 stuck for 22s! [mysqld:4955]

Message from syslogd@gargamel at Mar 12 07:13:16 ...
 kernel:[1234540.538636] BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:48]

(there are more in the attached logs showing what those lockups are)

Thankfully my btrfs send/receive is still working and should finish today,
but the amount of time it has been taking is painful, and its effect on the
rest of the system, hanging it or rendering its devices unusable for long
periods of time, has been punishing. Can someone figure out from the kernel
logs what is causing those near-deadlocks?

Actually, this was apparently so bad that the sysrq-w output didn't even
all make it to syslog/disk (also on btrfs), but thankfully I captured it on
the serial console.

I also found this during sysrq. Should it be reported to someone else?

INFO: rcu_preempt detected stalls on CPUs/tasks:
 3: (1 GPs behind) idle=395/140000000000000/0 softirq=284540927/284540928 last_accelerate: ed62/2821, nonlazy_posted: 1, ..
 (detected by 0, t=15002 jiffies, g=100566635, c=100566634, q=87438)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 21730 Comm: bash Not tainted 3.14.0-rc3-amd64-i915-preempt-20140216 #2
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
task: ffff88001cf3a710 ti: ffff880037f78000 task.ti: ffff880037f78000
RIP: 0010:[<ffffffff81309a80>]  [<ffffffff81309a80>] paravirt_read_tsc+0x0/0xd
RSP: 0018:ffff880037f79ac0  EFLAGS: 00000046
RAX: 0000000000000003 RBX: 0000000094b043ff RCX: 0000000000000000
RDX: 0000000000000004 RSI: 00000000000003fd RDI: 0000000000000001
RBP: ffff880037f79ae8 R08: ffffffff81cf24d0 R09: 00000000fffffffe
R10: 0000000000001a18 R11: 0000000000000000 R12: 00000000000009fb
R13: 0000000000000003 R14: 0000000094b047d7 R15: 0000000000000036
FS:  0000000000000000(0000) GS:ffff88021f380000(0063) knlGS:00000000f754b6c0
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000ffc13412 CR3: 0000000145622000 CR4: 00000000000407e0
Stack:
 ffffffff81309b59 ffffffff81f27560 00000000000026f0 0000000000000020
 ffffffff813c9e1b ffff880037f79af8 ffffffff81309ac9 ffff880037f79b08
 ffffffff81309aef ffff880037f79b30 ffffffff813c9cc6 ffffffff81f27560
Call Trace:
 [<ffffffff81309b59>] ? delay_tsc+0x3d/0xa4
 [<ffffffff813c9e1b>] ? serial8250_console_write+0x10d/0x10d
 [<ffffffff81309ac9>] __delay+0xf/0x11
 [<ffffffff81309aef>] __const_udelay+0x24/0x26
 [<ffffffff813c9cc6>] wait_for_xmitr+0x49/0x91
 [<ffffffff813c9e37>] serial8250_console_putchar+0x1c/0x2e
 [<ffffffff813c5d97>] uart_console_write+0x3f/0x54
 [<ffffffff813c9dc8>] serial8250_console_write+0xba/0x10d
 [<ffffffff8109363b>] call_console_drivers.constprop.6+0xbc/0xf0
 [<ffffffff81093bf7>] console_unlock+0x269/0x302
 [<ffffffff8109405e>] vprintk_emit+0x3ce/0x3ff
 [<ffffffff81604702>] printk+0x54/0x56
 [<ffffffff81089799>] ? arch_local_irq_save+0x15/0x1b
 [<ffffffff8108752e>] print_cfs_rq+0x4fc/0xd71
 [<ffffffff81080fff>] print_cfs_stats+0x5a/0x9e
 [<ffffffff81086c65>] print_cpu+0x538/0x8e3
 [<ffffffff81087f7e>] sysrq_sched_debug_show+0x1f/0x42
 [<ffffffff81078874>] show_state_filter+0x92/0x9f
 [<ffffffff813b7c7a>] sysrq_handle_showstate_blocked+0x13/0x15
 [<ffffffff813b82c3>] __handle_sysrq+0xa0/0x138
 [<ffffffff813b8630>] write_sysrq_trigger+0x28/0x37
 [<ffffffff811a565a>] proc_reg_write+0x5a/0x7c
 [<ffffffff81155417>] vfs_write+0xab/0x107
 [<ffffffff81155b19>] SyS_write+0x46/0x79
 [<ffffffff81615f6c>] sysenter_dispatch+0x7/0x21
Code: 89 e5 e8 a2 fe ff ff 89 c2 66 31 c0 c1 e2 10 01 d0 15 ff ff 00 00 f7 d0 c1 e8 10 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 e8 9c c4 d0 ff 66 90 5d c3 66 66 66 66 90 55 48 89

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
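
[Editor's note] The per-process listing earlier in the mail (pid, elapsed
time, kernel wait channel such as wait_current_trans.isra.15, command) can
be reproduced with standard procps tools; the exact command Marc used is
not shown, so this is only a sketch that filters for tasks in
uninterruptible sleep (state D), the state sysrq-w also reports:

```shell
# Show pid, elapsed time, state, kernel wait channel (WCHAN, 32 chars
# wide), and command line for every task, keeping the header row and any
# task currently in uninterruptible sleep (state "D").
ps -eo pid,etime,state,wchan:32,args | awk 'NR==1 || $3=="D"'
```

Tasks whose WCHAN shows a btrfs transaction function (as in the listing
above) are blocked inside the filesystem rather than on the disk itself.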