Hello list, i''ve seen this deadlock today and can reproduce it while using ceph. Is this a known bug? [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds. [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 0x00000000 [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 0000000000012280 [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 0000000000012280 [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 ffff880e78f4a180 [ 599.580814] Call Trace: [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 seconds. [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 0x00000000 [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 0000000000012280 [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 0000000000012280 [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 ffff880e46cce380 [ 599.580877] Call Trace: [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 seconds. [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 0x00000000 [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 0000000000012280 [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 0000000000012280 [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 ffff880e46d36600 [ 599.580920] Call Trace: [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 0x00000000 [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 0000000000012280 [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 0000000000012280 [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 ffff880e3efc4480 [ 599.580971] Call Trace: [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580978] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581004] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 0x00000000 [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 0000000000012280 [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 0000000000012280 [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 ffff880e412283c0 [ 599.581040] Call Trace: [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.581048] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581071] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 0x00000000 [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 0000000000012280 [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 0000000000012280 [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 ffff880bc3bd4740 [ 599.581099] Call Trace: [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.581106] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581129] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds. [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 0x00000000 [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 0000000000012280 [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 0000000000012280 [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 ffff880e78f4a180 [ 719.530071] Call Trace: [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds. [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 0x00000000 [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 0000000000012280 [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 0000000000012280 [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 ffff880e7890e080 [ 719.552787] Call Trace: [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 seconds. [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 0x00000000 [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 0000000000012280 [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 0000000000012280 [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 ffff880e46cce380 [ 719.552832] Call Trace: [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 seconds. [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 0x00000000 [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 0000000000012280 [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 0000000000012280 [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 ffff880e46d36600 [ 719.552874] Call Trace: [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Stefan, Could you please give us details about your setup, especially what kernel/btrfs version. If you aren''t on the most recent, it would be great if you could update and see if you can still reproduce, as there was a major change in btrfs recently for Ceph. I''m not sure what version it made it into. Regards, Michael On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:> > Hello list, > > i''ve seen this deadlock today and can reproduce it while using ceph. > > Is this a known bug? > > [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 > seconds. > [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 599.580814] Call Trace: > [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 599.580877] Call Trace: > [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 599.580920] Call Trace: > [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. > [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 > 0x00000000 > [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 > 0000000000012280 > [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 > 0000000000012280 > [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 > ffff880e3efc4480 > [ 599.580971] Call Trace: > [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580978] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581004] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. > [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 > 0x00000000 > [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 > 0000000000012280 > [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 > 0000000000012280 > [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 > ffff880e412283c0 > [ 599.581040] Call Trace: > [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581048] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581071] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. > [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 > 0x00000000 > [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 > 0000000000012280 > [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 > 0000000000012280 > [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 > ffff880bc3bd4740 > [ 599.581099] Call Trace: > [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581106] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581129] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 > seconds. > [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 719.530071] Call Trace: > [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 > seconds. > [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 > 0x00000000 > [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 > 0000000000012280 > [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 > 0000000000012280 > [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 > ffff880e7890e080 > [ 719.552787] Call Trace: > [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 719.552832] Call Trace: > [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 719.552874] Call Trace: > [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > > Stefan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I''m on linux kernel 3.5.0-rc3 like I mentioned in subject ;-). Kernel.org says this is the latest. Greets Stefan Am 23.06.2012 um 15:46 schrieb Michael <mike@draftx.net>:> Hi Stefan, > Could you please give us details about your setup, especially what > kernel/btrfs version. If you aren''t on the most recent, it would be > great if you could update and see if you can still reproduce, as there > was a major change in btrfs recently for Ceph. I''m not sure what > version it made it into. > Regards, > Michael > > On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: >> >> Hello list, >> >> i''ve seen this deadlock today and can reproduce it while using ceph. >> >> Is this a known bug? >> >> [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 >> seconds. >> [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 >> 0x00000000 >> [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 >> 0000000000012280 >> [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 >> 0000000000012280 >> [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 >> ffff880e78f4a180 >> [ 599.580814] Call Trace: >> [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 >> seconds. >> [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 >> 0x00000000 >> [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 >> 0000000000012280 >> [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 >> 0000000000012280 >> [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 >> ffff880e46cce380 >> [ 599.580877] Call Trace: >> [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 >> seconds. >> [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 >> 0x00000000 >> [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 >> 0000000000012280 >> [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 >> 0000000000012280 >> [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 >> ffff880e46d36600 >> [ 599.580920] Call Trace: >> [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. >> [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 >> 0x00000000 >> [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 >> 0000000000012280 >> [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 >> 0000000000012280 >> [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 >> ffff880e3efc4480 >> [ 599.580971] Call Trace: >> [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580978] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581004] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. >> [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 >> 0x00000000 >> [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 >> 0000000000012280 >> [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 >> 0000000000012280 >> [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 >> ffff880e412283c0 >> [ 599.581040] Call Trace: >> [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.581048] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581071] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. >> [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 >> 0x00000000 >> [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 >> 0000000000012280 >> [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 >> 0000000000012280 >> [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 >> ffff880bc3bd4740 >> [ 599.581099] Call Trace: >> [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.581106] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581129] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 >> seconds. >> [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 >> 0x00000000 >> [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 >> 0000000000012280 >> [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 >> 0000000000012280 >> [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 >> ffff880e78f4a180 >> [ 719.530071] Call Trace: >> [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 >> seconds. >> [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 >> 0x00000000 >> [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 >> 0000000000012280 >> [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 >> 0000000000012280 >> [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 >> ffff880e7890e080 >> [ 719.552787] Call Trace: >> [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 >> seconds. >> [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 >> 0x00000000 >> [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 >> 0000000000012280 >> [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 >> 0000000000012280 >> [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 >> ffff880e46cce380 >> [ 719.552832] Call Trace: >> [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 >> seconds. >> [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 >> 0x00000000 >> [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 >> 0000000000012280 >> [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 >> 0000000000012280 >> [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 >> ffff880e46d36600 >> [ 719.552874] Call Trace: >> [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> >> Stefan >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/23/2012 04:50 AM, Stefan Priebe wrote:> Hello list, > > i''ve seen this deadlock today and can reproduce it while using ceph. > > Is this a known bug? > > [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds. > [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 599.580814] Call Trace: > [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580844] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 599.580877] Call Trace: > [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580891] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 599.580920] Call Trace: > [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580934] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. > [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 > 0x00000000 > [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 > 0000000000012280 > [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 > 0000000000012280 > [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 > ffff880e3efc4480 > [ 599.580971] Call Trace: > [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580978] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581004] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581010] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. > [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 > 0x00000000 > [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 > 0000000000012280 > [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 > 0000000000012280 > [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 > ffff880e412283c0 > [ 599.581040] Call Trace: > [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581048] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581071] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581077] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. > [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 > 0x00000000 > [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 > 0000000000012280 > [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 > 0000000000012280 > [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 > ffff880bc3bd4740 > [ 599.581099] Call Trace: > [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581106] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581129] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581134] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds. > [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 719.530071] Call Trace: > [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552752] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds. > [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 > 0x00000000 > [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 > 0000000000012280 > [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 > 0000000000012280 > [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 > ffff880e7890e080 > [ 719.552787] Call Trace: > [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552802] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 719.552832] Call Trace: > [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552846] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 719.552874] Call Trace: > [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552888] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >This isn''t showing the guy who''s actually trying to commit the transaction. Next time this happens can you do a sysrq+w and capture the output and post it here so we can see what everybody is doing? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 15:08, schrieb Josef Bacik:> > This isn''t showing the guy who''s actually trying to commit the > transaction. Next time this happens can you do a sysrq+w and capture > the output and post it here so we can see what everybody is doing? Thanks, > > Josef No problem. Kernel trace: http://pastebin.com/raw.php?i=puZkCRCn # sys rq w trigger: # echo w > /proc/sysrq-trigger http://pastebin.com/raw.php?i=AQA8xxCX Hope that helps. I''m able to trigger this pretty easily with ceph. So i can produce as much info as you want. Thanks! Greets, Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 25, 2012 at 08:08:31AM -0600, Stefan Priebe - Profihost AG wrote:> Am 25.06.2012 15:08, schrieb Josef Bacik:> > > This isn''t showing the guy who''s actually trying to commit the > > transaction. Next time this happens can you do a sysrq+w and capture > > the output and post it here so we can see what everybody is doing? > Thanks, > > > > Josef > > No problem. > > Kernel trace: > http://pastebin.com/raw.php?i=puZkCRCn > > # sys rq w trigger: > # echo w > /proc/sysrq-trigger > http://pastebin.com/raw.php?i=AQA8xxCXThats weird, sysrq+w should have a bunch of stacktraces but it''s empty, so unless theres a bug theres nothing blocked. Is the box actually hung or is it just taking forever? Maybe try sysrq+w again to see if the one you pasted was just a fluke? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Thats weird, sysrq+w should have a bunch of stacktraces but it''s empty, so > unless theres a bug theres nothing blocked. Is the box actually hung or is it > just taking forever? Maybe try sysrq+w again to see if the one you pasted was > just a fluke? Thanks,This one looks better: http://pastebin.com/raw.php?i=R4pztDRt Sorry. Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote:> > > > Thats weird, sysrq+w should have a bunch of stacktraces but it''s empty, so > > unless theres a bug theres nothing blocked. Is the box actually hung or is it > > just taking forever? Maybe try sysrq+w again to see if the one you pasted was > > just a fluke? Thanks, > > This one looks better: > http://pastebin.com/raw.php?i=R4pztDRt >Ok looks like you have discard turned on. Can you turn that off and see if you can still reproduce the deadlock? If so sysrq+w again, if not then I know where to look ;). Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 16:48, schrieb Josef Bacik:> On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote: >>> >>> Thats weird, sysrq+w should have a bunch of stacktraces but it''s empty, so >>> unless theres a bug theres nothing blocked. Is the box actually hung or is it >>> just taking forever? Maybe try sysrq+w again to see if the one you pasted was >>> just a fluke? Thanks, >> >> This one looks better: >> http://pastebin.com/raw.php?i=R4pztDRt >> > > Ok looks like you have discard turned on.Yes > Can you turn that off and see if you> can still reproduce the deadlock? If so sysrq+w again, if not then I know where > to look ;). Thanks,without discard i can''t reproduce but random write speed with ceph without discard is a LOT slower (around 8000 iops/s instead of 13000iops/s). So i don''t know if it is discard or if i''m just not able to trigger it. Thanks! Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 25, 2012 at 11:38:38AM -0600, Stefan Priebe wrote:> > Am 25.06.2012 16:48, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote: > >>> > >>> Thats weird, sysrq+w should have a bunch of stacktraces but it''s empty, so > >>> unless theres a bug theres nothing blocked. Is the box actually hung or is it > >>> just taking forever? Maybe try sysrq+w again to see if the one you pasted was > >>> just a fluke? Thanks, > >> > >> This one looks better: > >> http://pastebin.com/raw.php?i=R4pztDRt > >> > > > > Ok looks like you have discard turned on. > Yes > > > Can you turn that off and see if you > > can still reproduce the deadlock? If so sysrq+w again, if not then I know where > > to look ;). Thanks, > without discard i can''t reproduce but random write speed with ceph > without discard is a LOT slower (around 8000 iops/s instead of > 13000iops/s). So i don''t know if it is discard or if i''m just not able > to trigger it. >Ouch, what kind of drive goes faster with discard _on_? Anyway it looks like we''re waiting for the discard to come back, so either its your drive or theres a bug in the block layer. Maybe try an older kernel and see if it''s broken there, and then bisect it down? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 20:02, schrieb Josef Bacik:>> > Can you turn that off and see if you >>> can still reproduce the deadlock? If so sysrq+w again, if not then I know where >>> to look ;). Thanks, >> without discard i can''t reproduce but random write speed with ceph >> without discard is a LOT slower (around 8000 iops/s instead of >> 13000iops/s). So i don''t know if it is discard or if i''m just not able >> to trigger it. >> > > Ouch, what kind of drive goes faster with discard _on_? Anyway it looks like > we''re waiting for the discard to come back, so either its your drive or theres a > bug in the block layer. Maybe try an older kernel and see if it''s broken there, > and then bisect it down? Thanks,Intel 520 series SSD. I can''t try an older kernel version as ceph crashes with them pretty fast due to other bugs which where fixed with 3.5-rc1. Greets Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
With v3.4 the same. Can''t go back more as this really results in very fast corruption. Any ideas how to debug? Stefan Am 25.06.2012 20:28, schrieb Stefan Priebe:> Am 25.06.2012 20:02, schrieb Josef Bacik: >>> > Can you turn that off and see if you >>>> can still reproduce the deadlock? If so sysrq+w again, if not then >>>> I know where >>>> to look ;). Thanks, >>> without discard i can''t reproduce but random write speed with ceph >>> without discard is a LOT slower (around 8000 iops/s instead of >>> 13000iops/s). So i don''t know if it is discard or if i''m just not able >>> to trigger it. >>> >> >> Ouch, what kind of drive goes faster with discard _on_? Anyway it >> looks like >> we''re waiting for the discard to come back, so either its your drive >> or theres a >> bug in the block layer. Maybe try an older kernel and see if it''s >> broken there, >> and then bisect it down? Thanks, > > Intel 520 series SSD. I can''t try an older kernel version as ceph > crashes with them pretty fast due to other bugs which where fixed with > 3.5-rc1. > > Greets > Stefan-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:> With v3.4 the same. Can''t go back more as this really results in very > fast corruption. Any ideas how to debug? >What workload are you running? I have a ssd here with discard support I can try and reproduce on. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 22:11, schrieb Josef Bacik:> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >> With v3.4 the same. Can''t go back more as this really results in very >> fast corruption. Any ideas how to debug? >> > > What workload are you running? I have a ssd here with discard support I can try > and reproduce on. Thanks,i''m using fio with 50 jobs and randwrite of 4k blocks in ceph but i don''t know which load ceph then exactly generates. ;-( Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote:> Am 25.06.2012 22:11, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: > >> With v3.4 the same. Can''t go back more as this really results in very > >> fast corruption. Any ideas how to debug? > >> > > > > What workload are you running? I have a ssd here with discard support I can try > > and reproduce on. Thanks, > > i''m using fio with 50 jobs and randwrite of 4k blocks in ceph but i > don''t know which load ceph then exactly generates. ;-( >Thats fine, I have this handy "create a local ceph cluster" script from an earlier problem, just send me your fio job and I''ll run it locally. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 22:23, schrieb Josef Bacik:> On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: >> Am 25.06.2012 22:11, schrieb Josef Bacik: >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >>>> With v3.4 the same. Can''t go back more as this really results in very >>>> fast corruption. Any ideas how to debug? >>>> >>> >>> What workload are you running? I have a ssd here with discard support I can try >>> and reproduce on. Thanks, >> >> i''m using fio with 50 jobs and randwrite of 4k blocks in ceph but i >> don''t know which load ceph then exactly generates. ;-( >> > > Thats fine, I have this handy "create a local ceph cluster" script from an > earlier problem, just send me your fio job and I''ll run it locally. Thanks,OK my fio job is running in a KVM using a RBD: fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=300 --group_reporting --name=file1 Backed by 3 server with 4 OSDs (all intel SSDs) each running btrfs on top. THANKS! Greets Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 25.06.2012 22:23, schrieb Josef Bacik:> On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: >> Am 25.06.2012 22:11, schrieb Josef Bacik: >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >>>> With v3.4 the same. Can''t go back more as this really results in very >>>> fast corruption. Any ideas how to debug? >>>> >>> >>> What workload are you running? I have a ssd here with discard support I can try >>> and reproduce on. Thanks, >> >> i''m using fio with 50 jobs and randwrite of 4k blocks in ceph but i >> don''t know which load ceph then exactly generates. ;-( >> > > Thats fine, I have this handy "create a local ceph cluster" script from an > earlier problem, just send me your fio job and I''ll run it locally. Thanks,Where you able to find anything? Can i do more or different testing? Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 26, 2012 at 10:47:56AM -0600, Stefan Priebe wrote:> Am 25.06.2012 22:23, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: > >> Am 25.06.2012 22:11, schrieb Josef Bacik: > >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: > >>>> With v3.4 the same. Can''t go back more as this really results in very > >>>> fast corruption. Any ideas how to debug? > >>>> > >>> > >>> What workload are you running? I have a ssd here with discard support I can try > >>> and reproduce on. Thanks, > >> > >> i''m using fio with 50 jobs and randwrite of 4k blocks in ceph but i > >> don''t know which load ceph then exactly generates. ;-( > >> > > > > Thats fine, I have this handy "create a local ceph cluster" script from an > > earlier problem, just send me your fio job and I''ll run it locally. Thanks, > > Where you able to find anything? Can i do more or different testing? >I can''t reproduce so I''m going to have to figure out a way to debug it through you, as soon as I think of something I will let you know. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Am 26.06.2012 22:14, schrieb Josef Bacik:> I can''t reproduce so I''m going to have to figure out a way to debug it through > you, as soon as I think of something I will let you know. Thanks, >Thanks. You mentioned that discard shouldn''t have any positive effects on a SSD. May i see a sideffect? I mean with discard 13.000 IOPs in ceph without discard just 6000-9000 IOPs could this be real or might this just happen due to the bug i see? Stefan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote:> Am 26.06.2012 22:14, schrieb Josef Bacik: > > I can''t reproduce so I''m going to have to figure out a way to debug it through > > you, as soon as I think of something I will let you know. Thanks, > > > > Thanks. You mentioned that discard shouldn''t have any positive effects > on a SSD. > > May i see a sideffect? I mean with discard 13.000 IOPs in ceph without > discard just 6000-9000 IOPs could this be real or might this just happen > due to the bug i see? >Beats me, it would seem to me that discard would make things slower since we have to wait for the commit to discard everybody, but who knows, stranger things have happened. Can you reproduce 2 more times and get sysrq+w each time so I have a few different outputs to compare, maybe I''m missing something. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Yes i will do so. Right now i was trying to compare discard with non discard with this simple command: for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm t_$i; done; But i hit a new bug: [39577.660228] BUG: unable to handle kernel paging request at fffffffffffffe50 [39577.686517] IP: [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.713417] PGD 1c0d067 PUD 1c0e067 PMD 0 [39577.740039] Oops: 0000 [#1] SMP [39577.766401] CPU 6 [39577.792540] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 i2c_i801 coretemp i2c_core ixgbe(O) [last unloaded: scsi_wait_scan] [39577.847511] [39577.847513] Pid: 3447, comm: btrfs-endio-wri Tainted: G O 3.5.0-rc4intel #15 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F [39577.847516] RIP: 0010:[<ffffffff8131b4f3>] [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.847516] RSP: 0018:ffff880e3b861d90 EFLAGS: 00010287 [39577.847517] RAX: ffff880e3b861e90 RBX: ffff880e3a8fb100 RCX: ffff880e3b861e90 [39577.847517] RDX: ffff880e3b861e90 RSI: ffff880e3a8fb190 RDI: ffff880e3a8fb100 [39577.847518] RBP: ffff880e3b861e10 R08: dead000000100100 R09: dead000000200200 [39577.847518] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880e3a624770 [39577.847518] R13: 0000000000000000 R14: ffff880e3a8fb1b8 R15: ffff880e3b861e80 [39577.847519] FS: 0000000000000000(0000) GS:ffff880e7fd80000(0000) knlGS:0000000000000000 [39577.847520] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [39577.847520] CR2: fffffffffffffe50 CR3: 0000000001c0b000 CR4: 00000000000407e0 [39577.847521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [39577.847521] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [39577.847522] Process btrfs-endio-wri (pid: 3447, threadinfo ffff880e3b860000, task ffff880e40e58000) [39577.847522] Stack: [39577.847524] 0000000000000000 dead000000200200 0000000100965b86 ffff880e40e94000 [39577.847525] ffffffff8104dc20 ffff880e40e58000 ffffffffffffffff 0000000000000000 [39577.847526] 0000000000000000 0000000000000000 ffff880e40e58000 ffff880e3a624720 [39577.847527] Call Trace: [39577.847530] [<ffffffff8104dc20>] ? lock_timer_base+0x70/0x70 [39577.847531] [<ffffffff8131b8d0>] finish_ordered_fn+0x10/0x20 [39577.847533] [<ffffffff8133f38e>] worker_loop+0x14e/0x530 [39577.847534] [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310 [39577.847535] [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310 [39577.847538] [<ffffffff8105ffd6>] kthread+0x96/0xa0 [39577.847541] [<ffffffff816dc594>] kernel_thread_helper+0x4/0x10 [39577.847543] [<ffffffff8105ff40>] ? kthread_worker_fn+0x130/0x130 [39577.847544] [<ffffffff816dc590>] ? gs_change+0xb/0xb [39577.847555] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 c4 80 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 89 fb 4c 8b 6f 38 <4d> 8b a5 50 fe ff ff 4d 8d 95 50 fe ff ff 48 c7 45 c8 00 00 00 [39577.847556] RIP [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.847557] RSP <ffff880e3b861d90> [39577.847557] CR2: fffffffffffffe50 [39577.847558] ---[ end trace 27bdc0b318ad6463 ]--- Am 26.06.2012 22:48, schrieb Josef Bacik:> On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote: >> Am 26.06.2012 22:14, schrieb Josef Bacik: >>> I can''t reproduce so I''m going to have to figure out a way to debug it through >>> you, as soon as I think of something I will let you know. Thanks, >>> >> >> Thanks. You mentioned that discard shouldn''t have any positive effects >> on a SSD. >> >> May i see a sideffect? I mean with discard 13.000 IOPs in ceph without >> discard just 6000-9000 IOPs could this be real or might this just happen >> due to the bug i see? >> > > Beats me, it would seem to me that discard would make things slower since we > have to wait for the commit to discard everybody, but who knows, stranger things > have happened. Can you reproduce 2 more times and get sysrq+w each time so I > have a few different outputs to compare, maybe I''m missing something. Thanks, > > Josef >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote:> Yes i will do so. Right now i was trying to compare discard with non > discard with this simple command: > for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm > t_$i; done; > > But i hit a new bug: >Can you gdb btrfs.ko and do list *(btrfs_finish_ordered_io+0x23) and what are your mount options? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote:> Yes i will do so. Right now i was trying to compare discard with non > discard with this simple command: > for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm > t_$i; done; > > But i hit a new bug: >Fixed this one with Btrfs: hold a ref on the inode during writepages Apply that and carry on with your tests :). Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html