List,

I ran into a hang issue (race condition: CPU usage is high while the server is
idle, meaning that btrfs is hanging, and iowait is high as well) running
2.6.34 on Debian lenny on an x86_64 server (dual Opteron 275 w/ 16GB RAM).
The btrfs filesystem lives on 18x300GB SCSI spindles, configured as RAID-0,
as shown below:

Label: none  uuid: bc6442c6-2fe2-4236-a5aa-6b7841234c52
	Total devices 18 FS bytes used 2.94TB
	devid    5 size 279.39GB used 208.33GB path /dev/cciss/c1d0
	devid   17 size 279.39GB used 208.34GB path /dev/cciss/c1d8
	devid   16 size 279.39GB used 209.33GB path /dev/cciss/c1d7
	devid    4 size 279.39GB used 208.33GB path /dev/cciss/c0d4
	devid    1 size 279.39GB used 233.72GB path /dev/cciss/c0d1
	devid   13 size 279.39GB used 208.33GB path /dev/cciss/c1d4
	devid    8 size 279.39GB used 208.33GB path /dev/cciss/c1d11
	devid   12 size 279.39GB used 208.33GB path /dev/cciss/c1d3
	devid    3 size 279.39GB used 208.33GB path /dev/cciss/c0d3
	devid    9 size 279.39GB used 208.33GB path /dev/cciss/c1d12
	devid    6 size 279.39GB used 208.33GB path /dev/cciss/c1d1
	devid   11 size 279.39GB used 208.33GB path /dev/cciss/c1d2
	devid   14 size 279.39GB used 208.33GB path /dev/cciss/c1d5
	devid    2 size 279.39GB used 233.70GB path /dev/cciss/c0d2
	devid   15 size 279.39GB used 209.33GB path /dev/cciss/c1d6
	devid   10 size 279.39GB used 208.33GB path /dev/cciss/c1d13
	devid    7 size 279.39GB used 208.33GB path /dev/cciss/c1d10
	devid   18 size 279.39GB used 208.34GB path /dev/cciss/c1d9
Btrfs v0.19-16-g075587c-dirty

The filesystem, mounted on /mnt/btrfs, is hanging: no existing or new process
can access it, although 'df' still displays the disk usage (3TB out of 5).
The disks appear to be physically healthy. Please note that a significant
number of files were placed on this filesystem, between 20 and 30 million.

The relevant kernel messages are shown below; the hung-task watchdog logged
the same three traces repeatedly, so one copy of each is included:

INFO: task btrfs-submit-0:4220 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-submit- D 000000010042e12f     0  4220      2 0x00000000
 ffff8803e584ac70 0000000000000046 0000000000004000 0000000000011680
 ffff8803f7349fd8 ffff8803f7349fd8 ffff8803e584ac70 0000000000011680
 0000000000000001 ffff8803ff99d250 ffffffff8149f020 0000000081150ab0
Call Trace:
 [<ffffffff813089f3>] ? io_schedule+0x71/0xb1
 [<ffffffff811470be>] ? get_request_wait+0xab/0x140
 [<ffffffff810406f4>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81143a4d>] ? elv_rq_merge_ok+0x89/0x97
 [<ffffffff8114a245>] ? blk_recount_segments+0x17/0x27
 [<ffffffff81147429>] ? __make_request+0x2d6/0x3fc
 [<ffffffff81145b16>] ? generic_make_request+0x207/0x268
 [<ffffffff81145c12>] ? submit_bio+0x9b/0xa2
 [<ffffffffa01aa081>] ? btrfs_requeue_work+0xd7/0xe1 [btrfs]
 [<ffffffffa01a5365>] ? run_scheduled_bios+0x297/0x48f [btrfs]
 [<ffffffffa01aa687>] ? worker_loop+0x17c/0x452 [btrfs]
 [<ffffffffa01aa50b>] ? worker_loop+0x0/0x452 [btrfs]
 [<ffffffff81040331>] ? kthread+0x79/0x81
 [<ffffffff81003674>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff810402b8>] ? kthread+0x0/0x81
 [<ffffffff81003670>] ? kernel_thread_helper+0x0/0x10

INFO: task btrfs-transacti:4230 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transac D 000000010042e1cc     0  4230      2 0x00000000
 ffff8803e544d300 0000000000000046 0000000000004000 0000000000011680
 ffff8803f3531fd8 ffff8803f3531fd8 ffff8803e544d300 0000000000011680
 ffff8803fe488240 00000000000004c1 ffff8803ff8d7340 0000000381147502
Call Trace:
 [<ffffffff810b4153>] ? sync_buffer+0x0/0x3f
 [<ffffffff813089f3>] ? io_schedule+0x71/0xb1
 [<ffffffff810b418e>] ? sync_buffer+0x3b/0x3f
 [<ffffffff81308fba>] ? __wait_on_bit+0x41/0x70
 [<ffffffff810b4153>] ? sync_buffer+0x0/0x3f
 [<ffffffff81309054>] ? out_of_line_wait_on_bit+0x6b/0x77
 [<ffffffff81040722>] ? wake_bit_function+0x0/0x23
 [<ffffffffa0186635>] ? write_dev_supers+0xf3/0x225 [btrfs]
 [<ffffffffa018693b>] ? write_all_supers+0x1d4/0x22c [btrfs]
 [<ffffffffa01898a1>] ? btrfs_commit_transaction+0x4fe/0x5e1 [btrfs]
 [<ffffffff810406f4>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffffa0185628>] ? transaction_kthread+0x16b/0x1fd [btrfs]
 [<ffffffffa01854bd>] ? transaction_kthread+0x0/0x1fd [btrfs]
 [<ffffffff81040331>] ? kthread+0x79/0x81
 [<ffffffff81003674>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff810402b8>] ? kthread+0x0/0x81
 [<ffffffff81003670>] ? kernel_thread_helper+0x0/0x10

INFO: task tar:31615 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tar           D 000000010042dee1     0 31615   4269 0x00000000
 ffff8803ffa74d70 0000000000000082 0000000000004000 0000000000011680
 ffff88010046dfd8 ffff88010046dfd8 ffff8803ffa74d70 0000000000011680
 ffff880361cdd480 0000000161cdd480 ffff8803ff8becf0 0000000200005fff
Call Trace:
 [<ffffffff8106d2af>] ? sync_page+0x0/0x45
 [<ffffffff813089f3>] ? io_schedule+0x71/0xb1
 [<ffffffff8106d2f0>] ? sync_page+0x41/0x45
 [<ffffffff81308fba>] ? __wait_on_bit+0x41/0x70
 [<ffffffff8106d483>] ? wait_on_page_bit+0x6b/0x71
 [<ffffffff81040722>] ? wake_bit_function+0x0/0x23
 [<ffffffffa0192917>] ? prepare_pages+0xe0/0x244 [btrfs]
 [<ffffffffa017e85a>] ? btrfs_check_data_free_space+0x69/0x206 [btrfs]
 [<ffffffffa0192f6a>] ? btrfs_file_write+0x405/0x711 [btrfs]
 [<ffffffff811ab83d>] ? tty_write+0x213/0x22e
 [<ffffffff810951b8>] ? vfs_write+0xad/0x149
 [<ffffffff81095310>] ? sys_write+0x45/0x6e
 [<ffffffff810028eb>] ? system_call_fastpath+0x16/0x1b

Jerome J. Ibanes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 11, 2010 at 01:41:41AM +0800, Jerome Ibanes wrote:
> List,
>
> I ran into a hang issue (race condition: cpu is high when the server is
> idle, meaning that btrfs is hanging, and IOwait is high as well) running
> 2.6.34 on debian/lenny on a x86_64 server (dual Opteron 275 w/ 16GB ram).
> [device list and hung-task traces snipped]

This looks like the issue we saw too, http://lkml.org/lkml/2010/6/8/375.
This is reproducible in our setup.

Thanks,
Shaohua
On Fri, Jun 11, 2010 at 9:12 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Fri, Jun 11, 2010 at 01:41:41AM +0800, Jerome Ibanes wrote:
>> [original report snipped]
>
> This looks like the issue we saw too, http://lkml.org/lkml/2010/6/8/375.
> This is reproducible in our setup.

I think I know the cause of http://lkml.org/lkml/2010/6/8/375.
The code in the first do-while loop in btrfs_commit_transaction
sets the current process to the TASK_UNINTERRUPTIBLE state, then calls
btrfs_start_delalloc_inodes, btrfs_wait_ordered_extents and
btrfs_run_ordered_operations(). All of these functions may call
cond_resched().
On Fri, Jun 11, 2010 at 10:32:07AM +0800, Yan, Zheng wrote:
> On Fri, Jun 11, 2010 at 9:12 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > [original report snipped]
>
> I think I know the cause of http://lkml.org/lkml/2010/6/8/375.
> The code in the first do-while loop in btrfs_commit_transaction
> sets the current process to the TASK_UNINTERRUPTIBLE state, then calls
> btrfs_start_delalloc_inodes, btrfs_wait_ordered_extents and
> btrfs_run_ordered_operations(). All of these functions may call
> cond_resched().

Hi,
When I test random write, I see a lot of threads jump into btree_writepages()
and do nothing, and I/O throughput is zero for some time. Looks like there is
a livelock. See the code of btree_writepages():

	if (wbc->sync_mode == WB_SYNC_NONE) {
		struct btrfs_root *root = BTRFS_I(mapping->host)->root;
		u64 num_dirty;
		unsigned long thresh = 32 * 1024 * 1024;

		if (wbc->for_kupdate)
			return 0;

		/* this is a bit racy, but that's ok */
		num_dirty = root->fs_info->dirty_metadata_bytes;
>>>>>>		if (num_dirty < thresh)
			return 0;
	}

The marked line is quite intrusive. In my test, the livelock is caused by the
thresh check: dirty_metadata_bytes stays below 32M. Without it, I can't see
the livelock. Not sure if this is related to the hang.

Thanks,
Shaohua
On Fri, Jun 11, 2010 at 10:32:07AM +0800, Yan, Zheng wrote:
> [quoted report snipped]
>
> I think I know the cause of http://lkml.org/lkml/2010/6/8/375.
> The code in the first do-while loop in btrfs_commit_transaction
> sets the current process to the TASK_UNINTERRUPTIBLE state, then calls
> btrfs_start_delalloc_inodes, btrfs_wait_ordered_extents and
> btrfs_run_ordered_operations(). All of these functions may call
> cond_resched().

The TASK_UNINTERRUPTIBLE problem was fixed with 2.6.35-rc1. You can find the
changes in the master branch of the btrfs-unstable repo.

-chris
On Sun, Jun 13, 2010 at 02:50:06PM +0800, Shaohua Li wrote:
> On Fri, Jun 11, 2010 at 10:32:07AM +0800, Yan, Zheng wrote:
> > On Fri, Jun 11, 2010 at 9:12 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > > On Fri, Jun 11, 2010 at 01:41:41AM +0800, Jerome Ibanes wrote:
> > >> [original report and call trace snipped -- see the start of the thread]
> > > This looks like the issue we saw too, http://lkml.org/lkml/2010/6/8/375.
> > > This is reproducible in our setup.
> >
> > I think I know the cause of http://lkml.org/lkml/2010/6/8/375.
> > The code in the first do-while loop in btrfs_commit_transaction
> > sets the current process to TASK_UNINTERRUPTIBLE state, then calls
> > btrfs_start_delalloc_inodes, btrfs_wait_ordered_extents and
> > btrfs_run_ordered_operations(). All of these functions may call
> > cond_resched().
>
> Hi,
> When I test random write, I saw a lot of threads jump into btree_writepages()
> and do nothing, and IO throughput is zero for some time. Looks like there is
> a live lock. See the code of btree_writepages():
>
>	if (wbc->sync_mode == WB_SYNC_NONE) {
>		struct btrfs_root *root = BTRFS_I(mapping->host)->root;
>		u64 num_dirty;
>		unsigned long thresh = 32 * 1024 * 1024;
>
>		if (wbc->for_kupdate)
>			return 0;
>
>		/* this is a bit racy, but that's ok */
>		num_dirty = root->fs_info->dirty_metadata_bytes;
> >>>>>>	if (num_dirty < thresh)
>			return 0;
>	}
>
> The marked line is quite intrusive. In my test, the live lock is caused by
> the thresh check. The dirty_metadata_bytes < 32M. Without it, I can't see
> the live lock. Not sure if this is related to the hang.

How much ram do you have?  The goal of the check is to avoid writing
metadata blocks because once we write them we have to do more IO to cow
them again if they are changed later.

It shouldn't be looping hard in btrfs there, what was the workload?

-chris
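To make the semantics of the quoted check concrete, here is a small userspace model of the same gating logic (a sketch only, with the kernel types stubbed out): background writeback of btree pages is skipped while dirty metadata stays under the 32MB threshold, so hot metadata is not written out only to be COWed again on the next change.

```c
#include <assert.h>
#include <stdint.h>

/* Stub of the writeback sync mode; in the kernel this lives in
 * struct writeback_control. */
enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Returns nonzero when background writeback should be skipped,
 * mirroring the early "return 0" paths in the quoted code. */
static int should_skip_writeback(enum sync_mode mode, int for_kupdate,
                                 uint64_t dirty_metadata_bytes)
{
	const uint64_t thresh = 32ULL * 1024 * 1024;

	if (mode != WB_SYNC_NONE)
		return 0;	/* sync (WB_SYNC_ALL) writeback always proceeds */
	if (for_kupdate)
		return 1;	/* periodic kupdate flushes are skipped */
	return dirty_metadata_bytes < thresh;	/* below 32MB: keep it dirty */
}
```

The live lock Shaohua describes follows from the last line: with RAM pressure forcing writeback but `dirty_metadata_bytes` stuck below 32MB, every caller bails out early and no progress is made.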
On Mon, 14 Jun 2010, Chris Mason wrote:
> On Sun, Jun 13, 2010 at 02:50:06PM +0800, Shaohua Li wrote:
>> [quoted report and btree_writepages() discussion snipped]
>
> How much ram do you have?  The goal of the check is to avoid writing
> metadata blocks because once we write them we have to do more IO to cow
> them again if they are changed later.

This server has 16GB of ram on a x86_64 (dual opteron 275, ecc memory).

> It shouldn't be looping hard in btrfs there, what was the workload?

The workload was the extraction of large tarballs (one at a time, about
300+ files extracted per second from a single tarball, which is pretty
good). As you might expect, the disks were tested (read and write) for
physical errors before I reported this bug.

Jerome J. Ibanes
On Mon, Jun 14, 2010 at 11:12:53AM -0700, Jerome Ibanes wrote:
> On Mon, 14 Jun 2010, Chris Mason wrote:
>> [quoted report and earlier discussion snipped]
>
> This server has 16GB of ram on a x86_64 (dual opteron 275, ecc memory).
>
> The workload was the extraction of large tarballs (one at a time, about
> 300+ files extracted per second from a single tarball, which is pretty
> good). As you might expect, the disks were tested (read and write) for
> physical errors before I reported this bug.

I think Zheng is right and this one will get fixed by the latest code.
The spinning writepage part should be a different problem.

-chris
> On Mon, Jun 14, 2010 at 11:12:53AM -0700, Jerome Ibanes wrote:
>> [quoted report and earlier discussion snipped]
>
> I think Zheng is right and this one will get fixed by the latest code.
> The spinning writepage part should be a different problem.

I'm trying to repro with 2.6.35-rc3, expect results within 24 hours.

Jerome J. Ibanes
On Mon, 14 Jun 2010, Jerome Ibanes wrote:
>> On Mon, Jun 14, 2010 at 11:12:53AM -0700, Jerome Ibanes wrote:
>>> [quoted report and earlier discussion snipped]
>>
>> I think Zheng is right and this one will get fixed by the latest code.
>> The spinning writepage part should be a different problem.
>
> I'm trying to repro with 2.6.35-rc3, expect results within 24 hours.

I can no longer repro this issue (after 48 hours of filesystem stress)
under 2.6.35-rc3 on a filesystem with over 50 million files. I will
reopen this thread should this reoccur.

Jerome J. Ibanes
On Mon, Jun 14, 2010 at 09:28:29PM +0800, Chris Mason wrote:
> On Sun, Jun 13, 2010 at 02:50:06PM +0800, Shaohua Li wrote:
>> [quoted report and btree_writepages() discussion snipped]
>
> How much ram do you have?  The goal of the check is to avoid writing
> metadata blocks because once we write them we have to do more IO to cow
> them again if they are changed later.
>
> It shouldn't be looping hard in btrfs there, what was the workload?

This is a fio randomwrite. Yep, I limited memory to a small size (~500M),
because it makes it easy for me to reproduce a 'xxx blocked for more than
120 seconds' issue. I can understand small memory could be an issue, but
this still looks intrusive, right?

The issue Yanmin reported is under 2.6.35-rc1, so it might not be the
'TASK_UNINTERRUPTIBLE' issue, but we will try -rc3 too.
Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 17, 2010 at 09:41:18AM +0800, Shaohua Li wrote:
> On Mon, Jun 14, 2010 at 09:28:29PM +0800, Chris Mason wrote:
> > On Sun, Jun 13, 2010 at 02:50:06PM +0800, Shaohua Li wrote:
> > > On Fri, Jun 11, 2010 at 10:32:07AM +0800, Yan, Zheng wrote:
> > > > On Fri, Jun 11, 2010 at 9:12 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > > > > On Fri, Jun 11, 2010 at 01:41:41AM +0800, Jerome Ibanes wrote:
> > > > > > [ original hang report, device listing and traces snipped;
> > > > > >   quoted in full earlier in the thread ]
>
> [ earlier discussion of the btree_writepages() thresh check and the
>   TASK_UNINTERRUPTIBLE/cond_resched() issue snipped ]
>
> The issue Yanmin reported is under 2.6.35-rc1, so it might not be the
> TASK_UNINTERRUPTIBLE issue, but we will try -rc3 too.

I still get the message below with 2.6.35-rc3. The system is still running,
because my fio test finished even with the message.

INFO: task flush-btrfs-134:14144 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-btrfs-1 D 0000000100346d5c  4480 14144      2 0x00000000
 ffff88016fd51530 0000000000000046 0000000000000001 ffff880100000000
 ffff88023ef0f100 0000000000013ac0 0000000000013ac0 0000000000004000
 0000000000013ac0 ffff88018be2dfd8 ffff88016fd51530 ffff88018be2dfd8
Call Trace:
 [<ffffffff8124be61>] ? wait_block_group_cache_progress+0xc0/0xe4
 [<ffffffff81052977>] ? autoremove_wake_function+0x0/0x2a
 [<ffffffff81052977>] ? autoremove_wake_function+0x0/0x2a
 [<ffffffff8124f956>] ? find_free_extent+0x694/0x9c4
 [<ffffffff8124fd53>] ? btrfs_reserve_extent+0xcd/0x189
 [<ffffffff81262803>] ? cow_file_range+0x19e/0x2fc
 [<ffffffff81262fe0>] ? run_delalloc_range+0xa7/0x393
 [<ffffffff8127832e>] ? test_range_bit+0x2b/0x127
 [<ffffffff8127b44e>] ? find_lock_delalloc_range+0x1af/0x1d1
 [<ffffffff8127b655>] ? __extent_writepage+0x1e5/0x61f
 [<ffffffff812c1fac>] ? prio_tree_next+0x1c0/0x221
 [<ffffffff812bf3c8>] ? cpumask_any_but+0x28/0x37
 [<ffffffff810b4af7>] ? page_mkclean+0x120/0x148
 [<ffffffff8127bed2>] ? extent_write_cache_pages.clone.0+0x15e/0x26c
 [<ffffffff8127c0db>] ? extent_writepages+0x41/0x5a
 [<ffffffff81264319>] ? btrfs_get_extent+0x0/0x798
 [<ffffffff810e6fcb>] ? writeback_single_inode+0xd1/0x2e8
 [<ffffffff810e7d02>] ? writeback_inodes_wb+0x40d/0x51f
 [<ffffffff810e7f47>] ? wb_writeback+0x133/0x1b2
 [<ffffffff810e81ae>] ? wb_do_writeback+0x148/0x15e
 [<ffffffff810e81fe>] ? bdi_writeback_task+0x3a/0x113
 [<ffffffff81052897>] ? bit_waitqueue+0x14/0xa4
 [<ffffffff810a9b19>] ? bdi_start_fn+0x0/0xc2
 [<ffffffff810a9b7c>] ? bdi_start_fn+0x63/0xc2
 [<ffffffff810a9b19>] ? bdi_start_fn+0x0/0xc2
 [<ffffffff81052525>] ? kthread+0x75/0x7d
 [<ffffffff81003654>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff810524b0>] ? kthread+0x0/0x7d
 [<ffffffff81003650>] ? kernel_thread_helper+0x0/0x10