Marc MERLIN
2014-Mar-12 15:18 UTC
3.14.0-rc3: btrfs send/receive blocks btrfs IO on other devices (near deadlocks)
I have a file server with 4 cpu cores and 5 btrfs devices:
Label: btrfs_boot  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
        Total devices 1 FS bytes used 48.92GiB
        devid 1 size 79.93GiB used 73.04GiB path /dev/mapper/cryptroot

Label: varlocalspace  uuid: 9f46dbe2-1344-44c3-b0fb-af2888c34f18
        Total devices 1 FS bytes used 1.10TiB
        devid 1 size 1.63TiB used 1.50TiB path /dev/mapper/cryptraid0

Label: btrfs_pool1  uuid: 6358304a-2234-4243-b02d-4944c9af47d7
        Total devices 1 FS bytes used 7.16TiB
        devid 1 size 14.55TiB used 7.50TiB path /dev/mapper/dshelf1

Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
        Total devices 1 FS bytes used 3.34TiB
        devid 1 size 3.42TiB used 3.42TiB path /dev/mapper/dshelf2

Label: bigbackup  uuid: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
        Total devices 5 FS bytes used 6.02TiB
        devid 1 size 1.82TiB used 1.43TiB path /dev/dm-9
        devid 2 size 1.82TiB used 1.43TiB path /dev/dm-6
        devid 3 size 1.82TiB used 1.43TiB path /dev/dm-5
        devid 4 size 1.82TiB used 1.43TiB path /dev/dm-7
        devid 5 size 1.82TiB used 1.43TiB path /dev/dm-8
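(That listing is standard btrfs filesystem show output, i.e. from
something like the following, run with no arguments so it covers every
filesystem:

  btrfs filesystem show
)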
I have a very long running btrfs send/receive from btrfs_pool1 to bigbackup
(long running meaning that it's been slowly copying over 5 days)
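For context, the transfer itself is just a full send/receive pipe along
these lines (the snapshot and destination paths are placeholders, not my
exact ones):

  btrfs send /mnt/btrfs_pool1/<snapshot> | btrfs receive /mnt/bigbackup/<destdir>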
The problem is that this is blocking IO to btrfs_pool2, which uses
completely different drives.
By blocking IO I mean that IO to btrfs_pool2 works some of the time and
hangs for very long stretches the rest of the time.
It looks as if one rsync to btrfs_pool2, or some other piece of IO, hangs
on a shared lock, and once that happens all IO to btrfs_pool2 stops for a
long time.
It does recover eventually without a reboot, but the wait times are
ridiculous (an hour or more).
As I write this, I have a killall -9 rsync that waited for over 10 minutes
before these processes would finally die:
23555 07:36 wait_current_trans.isra.15 rsync -av -SH --delete (...)
23556 07:36 exit [rsync] <defunct>
25387 2-04:41:22 wait_current_trans.isra.15 rsync --password-file (...)
27481 31:26 wait_current_trans.isra.15 rsync --password-file (...)
29268 04:41:34 wait_current_trans.isra.15 rsync --password-file (...)
29343 04:41:31 exit [rsync] <defunct>
29492 04:41:27 wait_current_trans.isra.15 rsync --password-file (...)
14559 07:14:49 wait_current_trans.isra.15 cp -i -al current 20140312-feisty
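(Columns are pid, elapsed time, kernel wait channel and command; a ps
invocation roughly like this one produces that listing, though my exact
command line may have differed:

  ps -eo pid,etime,wchan:30,cmd | grep -E 'rsync|cp'
)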
This is all stuck in btrfs kernel code.
If someone wants the sysrq-w output, here it is:
http://marc.merlins.org/tmp/btrfs_full.txt
A quick summary:
SysRq : Show Blocked State
task PC stack pid father
btrfs-cleaner D ffff8802126b0840 0 3332 2 0x00000000
ffff8800c5dc9d00 0000000000000046 ffff8800c5dc9fd8 ffff8800c69f6310
00000000000141c0 ffff8800c69f6310 ffff88017574c170 ffff880211e671e8
0000000000000000 ffff880211e67000 ffff8801e5936e20 ffff8800c5dc9d10
Call Trace:
[<ffffffff8160b0d9>] schedule+0x73/0x75
[<ffffffff8122a3c7>] wait_current_trans.isra.15+0x98/0xf4
[<ffffffff81085062>] ? finish_wait+0x65/0x65
[<ffffffff8122b86c>] start_transaction+0x48e/0x4f2
[<ffffffff8122bc4f>] ? __btrfs_end_transaction+0x2a1/0x2c6
[<ffffffff8122b8eb>] btrfs_start_transaction+0x1b/0x1d
[<ffffffff8121c5cd>] btrfs_drop_snapshot+0x443/0x610
[<ffffffff8160d7b3>] ? _raw_spin_unlock+0x17/0x2a
[<ffffffff81074efb>] ? finish_task_switch+0x51/0xdb
[<ffffffff8160afbf>] ? __schedule+0x537/0x5de
[<ffffffff8122c08d>] btrfs_clean_one_deleted_snapshot+0x103/0x10f
[<ffffffff81224859>] cleaner_kthread+0x103/0x136
[<ffffffff81224756>] ? btrfs_alloc_root+0x26/0x26
[<ffffffff8106bc1b>] kthread+0xae/0xb6
[<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
[<ffffffff816141bc>] ret_from_fork+0x7c/0xb0
[<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
btrfs-transacti D ffff88021387eb00 0 3333 2 0x00000000
ffff8800c5dcb890 0000000000000046 ffff8800c5dcbfd8 ffff88021387e5d0
00000000000141c0 ffff88021387e5d0 ffff88021f2141c0 ffff88021387e5d0
ffff8800c5dcb930 ffffffff810fe574 0000000000000002 ffff8800c5dcb8a0
Call Trace:
[<ffffffff810fe574>] ? wait_on_page_read+0x3c/0x3c
[<ffffffff8160b0d9>] schedule+0x73/0x75
[<ffffffff8160b27e>] io_schedule+0x60/0x7a
[<ffffffff810fe582>] sleep_on_page+0xe/0x12
[<ffffffff8160b510>] __wait_on_bit+0x48/0x7a
[<ffffffff810fe522>] wait_on_page_bit+0x7a/0x7c
[<ffffffff81085096>] ? autoremove_wake_function+0x34/0x34
[<ffffffff81245c70>] read_extent_buffer_pages+0x1bf/0x204
[<ffffffff81223710>] ? free_root_pointers+0x5b/0x5b
[<ffffffff81224412>] btree_read_extent_buffer_pages.constprop.45+0x66/0x100
[<ffffffff81225367>] read_tree_block+0x2f/0x47
[<ffffffff8120e4b6>] read_block_for_search.isra.26+0x24a/0x287
[<ffffffff8120fcf7>] btrfs_search_slot+0x4f4/0x6bb
[<ffffffff81214c3d>] lookup_inline_extent_backref+0xda/0x3fb
[<ffffffff812167e1>] __btrfs_free_extent+0xf4/0x712
[<ffffffff8121ba57>] __btrfs_run_delayed_refs+0x939/0xbdf
[<ffffffff8121d896>] btrfs_run_delayed_refs+0x81/0x18f
[<ffffffff8122af3e>] btrfs_commit_transaction+0x3a9/0x849
[<ffffffff81085062>] ? finish_wait+0x65/0x65
[<ffffffff81227598>] transaction_kthread+0xf8/0x1ab
[<ffffffff812274a0>] ? btrfs_cleanup_transaction+0x43f/0x43f
[<ffffffff8106bc1b>] kthread+0xae/0xb6
[<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
[<ffffffff816141bc>] ret_from_fork+0x7c/0xb0
[<ffffffff8106bb6d>] ? __kthread_parkme+0x61/0x61
Worse, taking that dump gave me:
gargamel:/etc/udev/rules.d# echo w > /proc/sysrq-trigger
Message from syslogd@gargamel at Mar 12 07:13:16 ...
kernel:[1234536.531251] BUG: soft lockup - CPU#1 stuck for 22s! [mysqld:12552]
Message from syslogd@gargamel at Mar 12 07:13:16 ...
kernel:[1234536.559276] BUG: soft lockup - CPU#2 stuck for 22s! [mysqld:4955]
Message from syslogd@gargamel at Mar 12 07:13:16 ...
kernel:[1234540.538636] BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:48]
(there is more detail in the attached logs on what those lockups are)
Thankfully my btrfs send/receive is still working and should finish
today, but the amount of time it has been taking is painful. The effect
it has had on the rest of my system, hanging it or rendering its devices
unusable for long periods of time, has been punishing.
Can someone figure out from the kernel logs what is causing those near
deadlocks?
Actually, things were apparently so bad that the sysrq-w output didn't
even all make it to syslog/disk (also on btrfs), but thankfully I got it
on the serial console.
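(By serial console I just mean the usual kernel command line setup,
something like the following; the exact port and speed here are
assumptions:

  console=ttyS0,115200n8 console=tty0
)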
I also found this during the sysrq dump. Should it be reported to someone else?
INFO: rcu_preempt detected stalls on CPUs/tasks:
3: (1 GPs behind) idle=395/140000000000000/0 softirq=284540927/284540928
last_accelerate: ed62/2821, nonlazy_posted: 1, ..
(detected by 0, t=15002 jiffies, g=100566635, c=100566634, q=87438)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 21730 Comm: bash Not tainted 3.14.0-rc3-amd64-i915-preempt-20140216 #2
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
task: ffff88001cf3a710 ti: ffff880037f78000 task.ti: ffff880037f78000
RIP: 0010:[<ffffffff81309a80>] [<ffffffff81309a80>] paravirt_read_tsc+0x0/0xd
RSP: 0018:ffff880037f79ac0 EFLAGS: 00000046
RAX: 0000000000000003 RBX: 0000000094b043ff RCX: 0000000000000000
RDX: 0000000000000004 RSI: 00000000000003fd RDI: 0000000000000001
RBP: ffff880037f79ae8 R08: ffffffff81cf24d0 R09: 00000000fffffffe
R10: 0000000000001a18 R11: 0000000000000000 R12: 00000000000009fb
R13: 0000000000000003 R14: 0000000094b047d7 R15: 0000000000000036
FS: 0000000000000000(0000) GS:ffff88021f380000(0063) knlGS:00000000f754b6c0
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000ffc13412 CR3: 0000000145622000 CR4: 00000000000407e0
Stack:
ffffffff81309b59 ffffffff81f27560 00000000000026f0 0000000000000020
ffffffff813c9e1b ffff880037f79af8 ffffffff81309ac9 ffff880037f79b08
ffffffff81309aef ffff880037f79b30 ffffffff813c9cc6 ffffffff81f27560
Call Trace:
[<ffffffff81309b59>] ? delay_tsc+0x3d/0xa4
[<ffffffff813c9e1b>] ? serial8250_console_write+0x10d/0x10d
[<ffffffff81309ac9>] __delay+0xf/0x11
[<ffffffff81309aef>] __const_udelay+0x24/0x26
[<ffffffff813c9cc6>] wait_for_xmitr+0x49/0x91
[<ffffffff813c9e37>] serial8250_console_putchar+0x1c/0x2e
[<ffffffff813c5d97>] uart_console_write+0x3f/0x54
[<ffffffff813c9dc8>] serial8250_console_write+0xba/0x10d
[<ffffffff8109363b>] call_console_drivers.constprop.6+0xbc/0xf0
[<ffffffff81093bf7>] console_unlock+0x269/0x302
[<ffffffff8109405e>] vprintk_emit+0x3ce/0x3ff
[<ffffffff81604702>] printk+0x54/0x56
[<ffffffff81089799>] ? arch_local_irq_save+0x15/0x1b
[<ffffffff8108752e>] print_cfs_rq+0x4fc/0xd71
[<ffffffff81080fff>] print_cfs_stats+0x5a/0x9e
[<ffffffff81086c65>] print_cpu+0x538/0x8e3
[<ffffffff81087f7e>] sysrq_sched_debug_show+0x1f/0x42
[<ffffffff81078874>] show_state_filter+0x92/0x9f
[<ffffffff813b7c7a>] sysrq_handle_showstate_blocked+0x13/0x15
[<ffffffff813b82c3>] __handle_sysrq+0xa0/0x138
[<ffffffff813b8630>] write_sysrq_trigger+0x28/0x37
[<ffffffff811a565a>] proc_reg_write+0x5a/0x7c
[<ffffffff81155417>] vfs_write+0xab/0x107
[<ffffffff81155b19>] SyS_write+0x46/0x79
[<ffffffff81615f6c>] sysenter_dispatch+0x7/0x21
Code: 89 e5 e8 a2 fe ff ff 89 c2 66 31 c0 c1 e2 10 01 d0 15 ff ff 00 00 f7 d0 c1 e8 10 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 e8 9c c4 d0 ff 66 90 5d c3 66 66 66 66 90 55 48 89
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" -
A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901