thr3ads.net - Btrfs devel - task imap:2958 blocked for more than 120 seconds [Jan 2010]

If this information is useful, please help other people find it:
Share via:

Johannes Hirte

2010-Jan-10 20:05 UTC

task imap:2958 blocked for more than 120 seconds

I''ve observed this hanging task now several times. Not sure when this
started,
but 2.6.32 is affected too, IIRC. I don''t have a test pattern for this.
Dovecot
imap triggers this from time to time. I''ve enabled
CONFIG_DETECT_HUNG_TASK now
and got this two tasks which hang:

INFO: task imap:2958 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
imap          D 0000000000000000     0  2958   2653 0x00000000
 ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf
 ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530
 000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530
Call Trace:
 [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
 [<ffffffff81071262>] ? sync_page+0x0/0x48
 [<ffffffff8138b46a>] io_schedule+0x38/0x4d
 [<ffffffff810712a6>] sync_page+0x44/0x48
 [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
 [<ffffffff8107124e>] __lock_page+0x61/0x68
 [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
 [<ffffffff810718c6>] filemap_fault+0xea/0x345
 [<ffffffff81084022>] __do_fault+0x50/0x3d3
 [<ffffffff81086077>] handle_mm_fault+0x32f/0x65d
 [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
 [<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e
 [<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49
 [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
 [<ffffffff81020500>] do_page_fault+0x257/0x26f
 [<ffffffff8138e2ff>] page_fault+0x1f/0x30
 [<ffffffff81083a13>] ? might_fault+0x57/0xa7
 [<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113
 [<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113
 [<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe
 [<ffffffff810a5c34>] vfs_write+0xad/0x14e
 [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
 [<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
2 locks held by imap/2958:
 #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at:
[<ffffffff81174a00>]
btrfs_file_write+0x169/0x6fe                                          
 #1:  (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>]
do_page_fault+0xf4/0x26f
INFO: task flush-btrfs-2:2783 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
flush-btrfs-2 D 0000000000000000     0  2783      2 0x00000000
 ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf
 ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290
 000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290
Call Trace:
 [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
 [<ffffffff81071262>] ? sync_page+0x0/0x48
 [<ffffffff8138b46a>] io_schedule+0x38/0x4d
 [<ffffffff810712a6>] sync_page+0x44/0x48
 [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
 [<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130
 [<ffffffff8107124e>] __lock_page+0x61/0x68
 [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
 [<ffffffff811871f6>] T.858+0xf1/0x2cd
 [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
 [<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
 [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff811874d3>] extent_writepages+0x3f/0x54
 [<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee
 [<ffffffff8116da7e>] btrfs_writepages+0x22/0x24
 [<ffffffff81077d12>] do_writepages+0x1f/0x28
 [<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0
 [<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2
 [<ffffffff810c118b>] wb_writeback+0x12b/0x1af
 [<ffffffff810c1436>] wb_do_writeback+0x17f/0x195
 [<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195
 [<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84
 [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
 [<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf
 [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
 [<ffffffff81047197>] kthread+0x7a/0x82
 [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10
 [<ffffffff8138e0fc>] ? restore_args+0x0/0x30
 [<ffffffff8104711d>] ? kthread+0x0/0x82
 [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
1 lock held by flush-btrfs-2/2783:
 #0:  (&type->s_umount_key#20){++++..}, at: [<ffffffff810c0e82>] 
writeback_inodes_wb+0x2d4/0x4b2

regards,
  Johannes

Chris Mason

2010-Jan-10 20:19 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte
wrote:> I''ve observed this hanging task now several times. Not sure when
this started,
> but 2.6.32 is affected too, IIRC. I don''t have a test pattern for
this. Dovecot
> imap triggers this from time to time. I''ve enabled
CONFIG_DETECT_HUNG_TASK now
> and got this two tasks which hang:
You''re stuck on a read, could you please do a sysrq-w when this
happens?
Also, do you eventually recover or are you stuck forever?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian von Bidder

2010-Jan-11 07:34 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

On Sunday 10 January 2010 21.05:46 Johannes Hirte wrote:> I''ve observed this hanging task now several times. Not sure when
this
>  started, but 2.6.32 is affected too, IIRC. I don''t have a test
pattern
>  for this. Dovecot imap triggers this from time to time. I''ve
enabled
>  CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang:
I''ve had vey similar hangs occasionally; don''t remember if
they were the
exact same code path or not.

"btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm
CPU
has triggered it several times, so it seems a reliable way to reproduce 
this.

Unfortunately log rotation has already killed the messages.  Do you want me 
to re-run?  2.6.32 kernel, btrfs-progs from git about 4 weeks ago.

cheers
-- vbi

-- 
Place stamp here.

Adrian von Bidder

2010-Jan-11 08:45 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

On Monday 11 January 2010 08.34:36 Adrian von Bidder
wrote:> "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an
arm
>  CPU  has triggered it several times, so it seems a reliable way to
>  reproduce this.
> 
Found it (Debian kernel 2.6.32 on ARM):

[78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[78260.386331] btrfs-vol     D c02b080c     0 10979      1 0x00000001
[78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>]
(schedule_timeout+0x1c/0x244)
[78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from
[<c02b0b10>] (wait_for_common+0xdc/0x178)
[78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from
[<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from
[<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
[78260.387258] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78260.387564] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4
[btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78260.387856] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from
[<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs])
[78260.388148] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from
[<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs])
[78260.388319] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from
[<c00d8788>] (vfs_ioctl+0x2c/0x70)
[78260.388357] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>]
(do_vfs_ioctl+0x4f4/0x55c)
[78260.388390] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from
[<c00d8f44>] (sys_ioctl+0x50/0x74)
[78260.388423] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>]
(ret_fast_syscall+0x0/0x28)
[78380.381159] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78380.381194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[78380.381219] btrfs-vol     D c02b080c     0 10979      1 0x00000001
[78380.381262] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>]
(schedule_timeout+0x1c/0x244)
[78380.381297] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from
[<c02b0b10>] (wait_for_common+0xdc/0x178)
[78380.381501] [<c02b0b10>] (wait_for_common+0xdc/0x178) from
[<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78380.381830] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from
[<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
[78380.382232] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78380.382545] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4
[btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78380.382839] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from
[<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs])
[78380.383131] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from
[<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs])
[78380.383302] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from
[<c00d8788>] (vfs_ioctl+0x2c/0x70)
[78380.383341] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>]
(do_vfs_ioctl+0x4f4/0x55c)
[78380.383374] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from
[<c00d8f44>] (sys_ioctl+0x50/0x74)
[78380.383408] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>]
(ret_fast_syscall+0x0/0x28)

umount right after some big fs action (not sure, it was either lots of 
file deletions, a big rsync of some tree, or right after the btrfs-vol
stuff) manages to trigger a btrfs related hang, too:

[97460.345446] INFO: task umount:12765 blocked for more than 120 seconds.
[97460.345481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[97460.345505] umount        D c02b080c     0 12765  12681 0x00000000
[97460.345554] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>]
(bdi_sched_wait+0xc/0x18)
[97460.345592] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from
[<c02b0f68>] (__wait_on_bit+0x5c/0xa8)
[97460.345625] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from
[<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4)
[97460.345661] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from
[<c00e7210>] (sync_inodes_sb+0x68/0x100)
[97460.345699] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from
[<c00eb340>] (__sync_filesystem+0x64/0x94)
[97460.345737] [<c00eb340>] (__sync_filesystem+0x64/0x94) from
[<c00cdc74>] (generic_shutdown_super+0x28/0x110)
[97460.345776] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from
[<c00cdda8>] (kill_anon_super+0x14/0x3c)
[97460.345813] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from
[<c00ce46c>] (deactivate_super+0x6c/0x90)
[97460.345849] [<c00ce46c>] (deactivate_super+0x6c/0x90) from
[<c00e2310>] (sys_umount+0x2bc/0x2e8)
[97460.345883] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from
[<c0026ea0>] (ret_fast_syscall+0x0/0x28)
[97580.340641] INFO: task umount:12765 blocked for more than 120 seconds.
[97580.340674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[97580.340699] umount        D c02b080c     0 12765  12681 0x00000000
[97580.340749] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>]
(bdi_sched_wait+0xc/0x18)
[97580.340787] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from
[<c02b0f68>] (__wait_on_bit+0x5c/0xa8)
[97580.340821] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from
[<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4)
[97580.340857] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from
[<c00e7210>] (sync_inodes_sb+0x68/0x100)
[97580.340894] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from
[<c00eb340>] (__sync_filesystem+0x64/0x94)
[97580.340932] [<c00eb340>] (__sync_filesystem+0x64/0x94) from
[<c00cdc74>] (generic_shutdown_super+0x28/0x110)
[97580.340970] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from
[<c00cdda8>] (kill_anon_super+0x14/0x3c)
[97580.341008] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from
[<c00ce46c>] (deactivate_super+0x6c/0x90)
[97580.341044] [<c00ce46c>] (deactivate_super+0x6c/0x90) from
[<c00e2310>] (sys_umount+0x2bc/0x2e8)
[97580.341079] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from
[<c0026ea0>] (ret_fast_syscall+0x0/0x28)


I''ve never had the system or even the affected processes die on me, the
end result was always ok.  Just took ages.  (Ok, btrfs-vol -b taking ages
on a big fs is ok.  umount taking 10min is a bit over the top, especially
since the machine only has 1G ram, so there can''t be that many dirty
caches
in any case...

cheers
-- vbi



-- 
featured product: PostgreSQL - http://postgresql.org

Johannes Hirte

2010-Jan-13 22:37 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason:> On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:
> > I''ve observed this hanging task now several times. Not sure
when this
> > started, but 2.6.32 is affected too, IIRC. I don''t have a
test pattern
> > for this. Dovecot imap triggers this from time to time. I''ve
enabled
> > CONFIG_DETECT_HUNG_TASK now
> 
> > and got this two tasks which hang:
> You''re stuck on a read, could you please do a sysrq-w when this
happens?
Will do so when it happens again.
> Also, do you eventually recover or are you stuck forever?
I didn''t wait too long when it happened, so I''m not sure.  The
longest time
I''ve waited was 20-30min, until reboot without recover. So either
it''s stuck
forever or it takes really long to recover.

And one question I have: How do you identify a read in this call trace? 

regards,
  Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2010-Jan-14 01:07 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

On Wed, Jan 13, 2010 at 11:37:33PM +0100, Johannes Hirte
wrote:> Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason:
> > On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:
> > > I''ve observed this hanging task now several times. Not
sure when this
> > > started, but 2.6.32 is affected too, IIRC. I don''t have
a test pattern
> > > for this. Dovecot imap triggers this from time to time.
I''ve enabled
> > > CONFIG_DETECT_HUNG_TASK now
> > 
> > > and got this two tasks which hang:
> > You''re stuck on a read, could you please do a sysrq-w when
this happens?
> 
> Will do so when it happens again.
No need, I reread the trace, you''re stuck in copy_from_user and
probably
hitting a deadlock in the btrfs page faulting path.  This is pretty
rare, but I''ll put it into the queue to fix up.> 
> > Also, do you eventually recover or are you stuck forever?
> 
> I didn''t wait too long when it happened, so I''m not sure.
The longest time
> I''ve waited was 20-30min, until reboot without recover. So either
it''s stuck
> forever or it takes really long to recover.
> 
> And one question I have: How do you identify a read in this call trace? 
Lock page followed by sync page is usually for reading.

-chris

Chris Mason

2010-Jan-14 01:08 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder
wrote:> On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks)
on an arm
> >  CPU  has triggered it several times, so it seems a reliable way to
> >  reproduce this.
> > 
> 
> Found it (Debian kernel 2.6.32 on ARM):
> 
> [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120
seconds.
> [78260.386306] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [78260.386331] btrfs-vol     D c02b080c     0 10979      1 0x00000001
> [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from
[<c02b0c9c>] (schedule_timeout+0x1c/0x244)
> [78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from
[<c02b0b10>] (wait_for_common+0xdc/0x178)
> [78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from
[<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
> [78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
Blocking here isn''t a huge surprise, relocation can involve some long
held locks.  Does this recover?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian von Bidder

2010-Jan-14 07:03 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

Heyho!

On Thursday 14 January 2010 02.08:29 Chris Mason wrote:> On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder wrote:
> > On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> > > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4
disks) on an arm
> > >  CPU  has triggered it several times, so it seems a reliable way
to
> > >  reproduce this.
> >
> > Found it (Debian kernel 2.6.32 on ARM):
> >
> > [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120
> > seconds. [78260.386306] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [78260.386331] btrfs-vol     D c02b080c     0 10979      1 0x00000001
> > [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from
[<c02b0c9c>]
> > (schedule_timeout+0x1c/0x244) [78260.386408] [<c02b0c9c>]
> > (schedule_timeout+0x1c/0x244) from [<c02b0b10>]
> > (wait_for_common+0xdc/0x178) [78260.386611] [<c02b0b10>]
> > (wait_for_common+0xdc/0x178) from [<bf29b880>]
> > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) [78260.386940]
[<bf29b880>]
> > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>]
> > (relocate_block_group+0x548/0x5c8 [btrfs])
> 
> Blocking here isn''t a huge surprise, relocation can involve some
long
> held locks.  Does this recover?
Yes, always so far.

cheers
-- vbi

-- 
Tomorrow we have important delivery and this afternoon out Japanese
colleagues decided that 2000 pages describing our software solution is
not enough. They need at least 500 extra pages. Fortunately it could be
any existing documentation, so we randomly choose POSIX standard Part 1
(535 pages) and the Eclipse basic tutorial (592 pages).
        -- Michal Čihař

Johannes Hirte

2010-Mar-07 15:52 UTC

head link

Re: task imap:2958 blocked for more than 120 seconds

Am Sonntag 10 Januar 2010 21:05:46 schrieb Johannes
Hirte:> I''ve observed this hanging task now several times. Not sure when
this
> started, but 2.6.32 is affected too, IIRC. I don''t have a test
pattern for
> this. Dovecot imap triggers this from time to time. I''ve enabled
> CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang:
> 
> INFO: task imap:2958 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
> imap          D 0000000000000000     0  2958   2653 0x00000000
>  ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf
>  ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530
>  000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530
> Call Trace:
>  [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
>  [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
>  [<ffffffff81071262>] ? sync_page+0x0/0x48
>  [<ffffffff8138b46a>] io_schedule+0x38/0x4d
>  [<ffffffff810712a6>] sync_page+0x44/0x48
>  [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
>  [<ffffffff8107124e>] __lock_page+0x61/0x68
>  [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
>  [<ffffffff810718c6>] filemap_fault+0xea/0x345
>  [<ffffffff81084022>] __do_fault+0x50/0x3d3
>  [<ffffffff81086077>] handle_mm_fault+0x32f/0x65d
>  [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
>  [<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e
>  [<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49
>  [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
>  [<ffffffff81020500>] do_page_fault+0x257/0x26f
>  [<ffffffff8138e2ff>] page_fault+0x1f/0x30
>  [<ffffffff81083a13>] ? might_fault+0x57/0xa7
>  [<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113
>  [<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113
>  [<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe
>  [<ffffffff810a5c34>] vfs_write+0xad/0x14e
>  [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
>  [<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74
>  [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
> 2 locks held by imap/2958:
>  #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at:
[<ffffffff81174a00>]
> btrfs_file_write+0x169/0x6fe
>  #1:  (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>]
> do_page_fault+0xf4/0x26f INFO: task flush-btrfs-2:2783 blocked for more
> than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
> flush-btrfs-2 D 0000000000000000     0  2783      2 0x00000000
>  ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf
>  ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290
>  000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290
> Call Trace:
>  [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
>  [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
>  [<ffffffff81071262>] ? sync_page+0x0/0x48
>  [<ffffffff8138b46a>] io_schedule+0x38/0x4d
>  [<ffffffff810712a6>] sync_page+0x44/0x48
>  [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
>  [<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130
>  [<ffffffff8107124e>] __lock_page+0x61/0x68
>  [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
>  [<ffffffff811871f6>] T.858+0xf1/0x2cd
>  [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
>  [<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82
>  [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
>  [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
>  [<ffffffff811874d3>] extent_writepages+0x3f/0x54
>  [<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee
>  [<ffffffff8116da7e>] btrfs_writepages+0x22/0x24
>  [<ffffffff81077d12>] do_writepages+0x1f/0x28
>  [<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0
>  [<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2
>  [<ffffffff810c118b>] wb_writeback+0x12b/0x1af
>  [<ffffffff810c1436>] wb_do_writeback+0x17f/0x195
>  [<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195
>  [<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84
>  [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
>  [<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf
>  [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
>  [<ffffffff81047197>] kthread+0x7a/0x82
>  [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8138e0fc>] ? restore_args+0x0/0x30
>  [<ffffffff8104711d>] ? kthread+0x0/0x82
>  [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
> 1 lock held by flush-btrfs-2/2783:
>  #0:  (&type->s_umount_key#20){++++..}, at:
[<ffffffff810c0e82>]
> writeback_inodes_wb+0x2d4/0x4b2
> 
> regards,
>   Johannes
It happend again, today I found in the logs:

INFO: task imap:2590 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
imap          D ffff88002828ddf8     0  2590   2472 0x00000000
 ffff88011fa53000 0000000000000086 0000000000000000 ffff8800282923a0
 ffff88011b7cc480 0000000000000000 ffff880028294c58 ffff880117a5e000
 00000000fffdcf81 ffff880117a5e280 0000000000012340 0000000000012340
Call Trace:
 [<ffffffff81061de2>] ? sync_page+0x0/0x46
 [<ffffffff8134acaa>] ? io_schedule+0x35/0x48
 [<ffffffff81061e23>] ? sync_page+0x41/0x46
 [<ffffffff8134b185>] ? __wait_on_bit_lock+0x3c/0x85
 [<ffffffff81061dcf>] ? __lock_page+0x5d/0x63
 [<ffffffff81042877>] ? wake_bit_function+0x0/0x2e
 [<ffffffff810623da>] ? filemap_fault+0xcc/0x305
 [<ffffffff810733e6>] ? __do_fault+0x52/0x3f0
 [<ffffffff810753fa>] ? handle_mm_fault+0x346/0x6b6
 [<ffffffff8101ecff>] ? do_page_fault+0x264/0x280
 [<ffffffff8134c91f>] ? page_fault+0x1f/0x30
 [<ffffffff81157b12>] ? btrfs_copy_from_user+0x50/0x109
 [<ffffffff81157b94>] ? btrfs_copy_from_user+0xd2/0x109
 [<ffffffff811580f5>] ? btrfs_file_write+0x448/0x6db
 [<ffffffff81093aff>] ? vfs_write+0xa8/0x14c
 [<ffffffff81093bf6>] ? sys_pwrite64+0x53/0x71
 [<ffffffff81001eab>] ? system_call_fastpath+0x16/0x1b

and btrfsck said:

datengrab ~ # btrfsck /dev/mapper/sdb 
root 5 inode 842742 errors 400
found 186575216640 bytes used err is 1
total csum bytes: 181076596
total tree bytes: 1147568128
total fs tree bytes: 884981760
btree space waste bytes: 289677575
file data blocks allocated: 191211343872
 referenced 185262080000
Btrfs v0.19-4-gab8fb4c

I was able to "fix" this by copying the affected directory to another
location
on the same disc, delete the old directory and rename the copy to the old one. 
After this the deadlock and the FS corruption was gone.

regards,
  Johannes

Btrfs devel - Jan 2010 - task imap:2958 blocked for more than 120 seconds

task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds

Re: task imap:2958 blocked for more than 120 seconds