I''ve observed this hanging task now several times. Not sure when this started, but 2.6.32 is affected too, IIRC. I don''t have a test pattern for this. Dovecot imap triggers this from time to time. I''ve enabled CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang: INFO: task imap:2958 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. imap D 0000000000000000 0 2958 2653 0x00000000 ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530 000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530 Call Trace: [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff81071262>] ? sync_page+0x0/0x48 [<ffffffff8138b46a>] io_schedule+0x38/0x4d [<ffffffff810712a6>] sync_page+0x44/0x48 [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a [<ffffffff8107124e>] __lock_page+0x61/0x68 [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e [<ffffffff810718c6>] filemap_fault+0xea/0x345 [<ffffffff81084022>] __do_fault+0x50/0x3d3 [<ffffffff81086077>] handle_mm_fault+0x32f/0x65d [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f [<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e [<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49 [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f [<ffffffff81020500>] do_page_fault+0x257/0x26f [<ffffffff8138e2ff>] page_fault+0x1f/0x30 [<ffffffff81083a13>] ? might_fault+0x57/0xa7 [<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113 [<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113 [<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe [<ffffffff810a5c34>] vfs_write+0xad/0x14e [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b 2 locks held by imap/2958: #0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<ffffffff81174a00>] btrfs_file_write+0x169/0x6fe #1: (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>] do_page_fault+0xf4/0x26f INFO: task flush-btrfs-2:2783 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-btrfs-2 D 0000000000000000 0 2783 2 0x00000000 ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290 000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290 Call Trace: [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff81071262>] ? sync_page+0x0/0x48 [<ffffffff8138b46a>] io_schedule+0x38/0x4d [<ffffffff810712a6>] sync_page+0x44/0x48 [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a [<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130 [<ffffffff8107124e>] __lock_page+0x61/0x68 [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e [<ffffffff811871f6>] T.858+0xf1/0x2cd [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4 [<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82 [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4 [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf [<ffffffff811874d3>] extent_writepages+0x3f/0x54 [<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee [<ffffffff8116da7e>] btrfs_writepages+0x22/0x24 [<ffffffff81077d12>] do_writepages+0x1f/0x28 [<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0 [<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2 [<ffffffff810c118b>] wb_writeback+0x12b/0x1af [<ffffffff810c1436>] wb_do_writeback+0x17f/0x195 [<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195 [<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84 [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf [<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf [<ffffffff81047197>] kthread+0x7a/0x82 [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10 [<ffffffff8138e0fc>] ? restore_args+0x0/0x30 [<ffffffff8104711d>] ? kthread+0x0/0x82 [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10 1 lock held by flush-btrfs-2/2783: #0: (&type->s_umount_key#20){++++..}, at: [<ffffffff810c0e82>] writeback_inodes_wb+0x2d4/0x4b2 regards, Johannes
On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:> I''ve observed this hanging task now several times. Not sure when this started, > but 2.6.32 is affected too, IIRC. I don''t have a test pattern for this. Dovecot > imap triggers this from time to time. I''ve enabled CONFIG_DETECT_HUNG_TASK now > and got this two tasks which hang:You''re stuck on a read, could you please do a sysrq-w when this happens? Also, do you eventually recover or are you stuck forever? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Adrian von Bidder
2010-Jan-11 07:34 UTC
Re: task imap:2958 blocked for more than 120 seconds
On Sunday 10 January 2010 21.05:46 Johannes Hirte wrote:> I''ve observed this hanging task now several times. Not sure when this > started, but 2.6.32 is affected too, IIRC. I don''t have a test pattern > for this. Dovecot imap triggers this from time to time. I''ve enabled > CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang:I''ve had vey similar hangs occasionally; don''t remember if they were the exact same code path or not. "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm CPU has triggered it several times, so it seems a reliable way to reproduce this. Unfortunately log rotation has already killed the messages. Do you want me to re-run? 2.6.32 kernel, btrfs-progs from git about 4 weeks ago. cheers -- vbi -- Place stamp here.
Adrian von Bidder
2010-Jan-11 08:45 UTC
Re: task imap:2958 blocked for more than 120 seconds
On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:> "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm > CPU has triggered it several times, so it seems a reliable way to > reproduce this. >Found it (Debian kernel 2.6.32 on ARM): [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds. [78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001 [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244) [78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178) [78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) [78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) [78260.387258] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) [78260.387564] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) [78260.387856] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) [78260.388148] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) [78260.388319] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from [<c00d8788>] (vfs_ioctl+0x2c/0x70) [78260.388357] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) [78260.388390] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from [<c00d8f44>] (sys_ioctl+0x50/0x74) [78260.388423] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28) [78380.381159] INFO: task btrfs-vol:10979 blocked for more than 120 seconds. [78380.381194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [78380.381219] btrfs-vol D c02b080c 0 10979 1 0x00000001 [78380.381262] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244) [78380.381297] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178) [78380.381501] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) [78380.381830] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) [78380.382232] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) [78380.382545] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) [78380.382839] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) [78380.383131] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) [78380.383302] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from [<c00d8788>] (vfs_ioctl+0x2c/0x70) [78380.383341] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) [78380.383374] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from [<c00d8f44>] (sys_ioctl+0x50/0x74) [78380.383408] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28) umount right after some big fs action (not sure, it was either lots of file deletions, a big rsync of some tree, or right after the btrfs-vol stuff) manages to trigger a btrfs related hang, too: [97460.345446] INFO: task umount:12765 blocked for more than 120 seconds. [97460.345481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [97460.345505] umount D c02b080c 0 12765 12681 0x00000000 [97460.345554] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>] (bdi_sched_wait+0xc/0x18) [97460.345592] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) [97460.345625] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) [97460.345661] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from [<c00e7210>] (sync_inodes_sb+0x68/0x100) [97460.345699] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from [<c00eb340>] (__sync_filesystem+0x64/0x94) [97460.345737] [<c00eb340>] (__sync_filesystem+0x64/0x94) from [<c00cdc74>] (generic_shutdown_super+0x28/0x110) [97460.345776] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from [<c00cdda8>] (kill_anon_super+0x14/0x3c) [97460.345813] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from [<c00ce46c>] (deactivate_super+0x6c/0x90) [97460.345849] [<c00ce46c>] (deactivate_super+0x6c/0x90) from [<c00e2310>] (sys_umount+0x2bc/0x2e8) [97460.345883] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28) [97580.340641] INFO: task umount:12765 blocked for more than 120 seconds. [97580.340674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [97580.340699] umount D c02b080c 0 12765 12681 0x00000000 [97580.340749] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>] (bdi_sched_wait+0xc/0x18) [97580.340787] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) [97580.340821] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) [97580.340857] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from [<c00e7210>] (sync_inodes_sb+0x68/0x100) [97580.340894] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from [<c00eb340>] (__sync_filesystem+0x64/0x94) [97580.340932] [<c00eb340>] (__sync_filesystem+0x64/0x94) from [<c00cdc74>] (generic_shutdown_super+0x28/0x110) [97580.340970] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from [<c00cdda8>] (kill_anon_super+0x14/0x3c) [97580.341008] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from [<c00ce46c>] (deactivate_super+0x6c/0x90) [97580.341044] [<c00ce46c>] (deactivate_super+0x6c/0x90) from [<c00e2310>] (sys_umount+0x2bc/0x2e8) [97580.341079] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28) I''ve never had the system or even the affected processes die on me, the end result was always ok. Just took ages. (Ok, btrfs-vol -b taking ages on a big fs is ok. umount taking 10min is a bit over the top, especially since the machine only has 1G ram, so there can''t be that many dirty caches in any case... cheers -- vbi -- featured product: PostgreSQL - http://postgresql.org
Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason:> On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote: > > I''ve observed this hanging task now several times. Not sure when this > > started, but 2.6.32 is affected too, IIRC. I don''t have a test pattern > > for this. Dovecot imap triggers this from time to time. I''ve enabled > > CONFIG_DETECT_HUNG_TASK now > > > and got this two tasks which hang: > You''re stuck on a read, could you please do a sysrq-w when this happens?Will do so when it happens again.> Also, do you eventually recover or are you stuck forever?I didn''t wait too long when it happened, so I''m not sure. The longest time I''ve waited was 20-30min, until reboot without recover. So either it''s stuck forever or it takes really long to recover. And one question I have: How do you identify a read in this call trace? regards, Johannes -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 13, 2010 at 11:37:33PM +0100, Johannes Hirte wrote:> Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason: > > On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote: > > > I''ve observed this hanging task now several times. Not sure when this > > > started, but 2.6.32 is affected too, IIRC. I don''t have a test pattern > > > for this. Dovecot imap triggers this from time to time. I''ve enabled > > > CONFIG_DETECT_HUNG_TASK now > > > > > and got this two tasks which hang: > > You''re stuck on a read, could you please do a sysrq-w when this happens? > > Will do so when it happens again.No need, I reread the trace, you''re stuck in copy_from_user and probably hitting a deadlock in the btrfs page faulting path. This is pretty rare, but I''ll put it into the queue to fix up.> > > Also, do you eventually recover or are you stuck forever? > > I didn''t wait too long when it happened, so I''m not sure. The longest time > I''ve waited was 20-30min, until reboot without recover. So either it''s stuck > forever or it takes really long to recover. > > And one question I have: How do you identify a read in this call trace?Lock page followed by sync page is usually for reading. -chris
On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder wrote:> On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote: > > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm > > CPU has triggered it several times, so it seems a reliable way to > > reproduce this. > > > > Found it (Debian kernel 2.6.32 on ARM): > > [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds. > [78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001 > [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244) > [78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178) > [78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) > [78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])Blocking here isn''t a huge surprise, relocation can involve some long held locks. Does this recover? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Adrian von Bidder
2010-Jan-14 07:03 UTC
Re: task imap:2958 blocked for more than 120 seconds
Heyho! On Thursday 14 January 2010 02.08:29 Chris Mason wrote:> On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder wrote: > > On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote: > > > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm > > > CPU has triggered it several times, so it seems a reliable way to > > > reproduce this. > > > > Found it (Debian kernel 2.6.32 on ARM): > > > > [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 > > seconds. [78260.386306] "echo 0 > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001 > > [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] > > (schedule_timeout+0x1c/0x244) [78260.386408] [<c02b0c9c>] > > (schedule_timeout+0x1c/0x244) from [<c02b0b10>] > > (wait_for_common+0xdc/0x178) [78260.386611] [<c02b0b10>] > > (wait_for_common+0xdc/0x178) from [<bf29b880>] > > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) [78260.386940] [<bf29b880>] > > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] > > (relocate_block_group+0x548/0x5c8 [btrfs]) > > Blocking here isn''t a huge surprise, relocation can involve some long > held locks. Does this recover?Yes, always so far. cheers -- vbi -- Tomorrow we have important delivery and this afternoon out Japanese colleagues decided that 2000 pages describing our software solution is not enough. They need at least 500 extra pages. Fortunately it could be any existing documentation, so we randomly choose POSIX standard Part 1 (535 pages) and the Eclipse basic tutorial (592 pages). -- Michal Čihař
Am Sonntag 10 Januar 2010 21:05:46 schrieb Johannes Hirte:> I''ve observed this hanging task now several times. Not sure when this > started, but 2.6.32 is affected too, IIRC. I don''t have a test pattern for > this. Dovecot imap triggers this from time to time. I''ve enabled > CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang: > > INFO: task imap:2958 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > imap D 0000000000000000 0 2958 2653 0x00000000 > ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf > ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530 > 000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530 > Call Trace: > [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf > [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 > [<ffffffff81071262>] ? sync_page+0x0/0x48 > [<ffffffff8138b46a>] io_schedule+0x38/0x4d > [<ffffffff810712a6>] sync_page+0x44/0x48 > [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a > [<ffffffff8107124e>] __lock_page+0x61/0x68 > [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e > [<ffffffff810718c6>] filemap_fault+0xea/0x345 > [<ffffffff81084022>] __do_fault+0x50/0x3d3 > [<ffffffff81086077>] handle_mm_fault+0x32f/0x65d > [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f > [<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e > [<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49 > [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f > [<ffffffff81020500>] do_page_fault+0x257/0x26f > [<ffffffff8138e2ff>] page_fault+0x1f/0x30 > [<ffffffff81083a13>] ? might_fault+0x57/0xa7 > [<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113 > [<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113 > [<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe > [<ffffffff810a5c34>] vfs_write+0xad/0x14e > [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 > [<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74 > [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b > 2 locks held by imap/2958: > #0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<ffffffff81174a00>] > btrfs_file_write+0x169/0x6fe > #1: (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>] > do_page_fault+0xf4/0x26f INFO: task flush-btrfs-2:2783 blocked for more > than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > flush-btrfs-2 D 0000000000000000 0 2783 2 0x00000000 > ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf > ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290 > 000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290 > Call Trace: > [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf > [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130 > [<ffffffff81071262>] ? sync_page+0x0/0x48 > [<ffffffff8138b46a>] io_schedule+0x38/0x4d > [<ffffffff810712a6>] sync_page+0x44/0x48 > [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a > [<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130 > [<ffffffff8107124e>] __lock_page+0x61/0x68 > [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e > [<ffffffff811871f6>] T.858+0xf1/0x2cd > [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4 > [<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82 > [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4 > [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf > [<ffffffff811874d3>] extent_writepages+0x3f/0x54 > [<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee > [<ffffffff8116da7e>] btrfs_writepages+0x22/0x24 > [<ffffffff81077d12>] do_writepages+0x1f/0x28 > [<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0 > [<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2 > [<ffffffff810c118b>] wb_writeback+0x12b/0x1af > [<ffffffff810c1436>] wb_do_writeback+0x17f/0x195 > [<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195 > [<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84 > [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf > [<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf > [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf > [<ffffffff81047197>] kthread+0x7a/0x82 > [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10 > [<ffffffff8138e0fc>] ? restore_args+0x0/0x30 > [<ffffffff8104711d>] ? kthread+0x0/0x82 > [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10 > 1 lock held by flush-btrfs-2/2783: > #0: (&type->s_umount_key#20){++++..}, at: [<ffffffff810c0e82>] > writeback_inodes_wb+0x2d4/0x4b2 > > regards, > JohannesIt happend again, today I found in the logs: INFO: task imap:2590 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. imap D ffff88002828ddf8 0 2590 2472 0x00000000 ffff88011fa53000 0000000000000086 0000000000000000 ffff8800282923a0 ffff88011b7cc480 0000000000000000 ffff880028294c58 ffff880117a5e000 00000000fffdcf81 ffff880117a5e280 0000000000012340 0000000000012340 Call Trace: [<ffffffff81061de2>] ? sync_page+0x0/0x46 [<ffffffff8134acaa>] ? io_schedule+0x35/0x48 [<ffffffff81061e23>] ? sync_page+0x41/0x46 [<ffffffff8134b185>] ? __wait_on_bit_lock+0x3c/0x85 [<ffffffff81061dcf>] ? __lock_page+0x5d/0x63 [<ffffffff81042877>] ? wake_bit_function+0x0/0x2e [<ffffffff810623da>] ? filemap_fault+0xcc/0x305 [<ffffffff810733e6>] ? __do_fault+0x52/0x3f0 [<ffffffff810753fa>] ? handle_mm_fault+0x346/0x6b6 [<ffffffff8101ecff>] ? do_page_fault+0x264/0x280 [<ffffffff8134c91f>] ? page_fault+0x1f/0x30 [<ffffffff81157b12>] ? btrfs_copy_from_user+0x50/0x109 [<ffffffff81157b94>] ? btrfs_copy_from_user+0xd2/0x109 [<ffffffff811580f5>] ? btrfs_file_write+0x448/0x6db [<ffffffff81093aff>] ? vfs_write+0xa8/0x14c [<ffffffff81093bf6>] ? sys_pwrite64+0x53/0x71 [<ffffffff81001eab>] ? system_call_fastpath+0x16/0x1b and btrfsck said: datengrab ~ # btrfsck /dev/mapper/sdb root 5 inode 842742 errors 400 found 186575216640 bytes used err is 1 total csum bytes: 181076596 total tree bytes: 1147568128 total fs tree bytes: 884981760 btree space waste bytes: 289677575 file data blocks allocated: 191211343872 referenced 185262080000 Btrfs v0.19-4-gab8fb4c I was able to "fix" this by copying the affected directory to another location on the same disc, delete the old directory and rename the copy to the old one. After this the deadlock and the FS corruption was gone. regards, Johannes