Hi everyone,

I've got a nice set of fixes from Josef, Jan, Ilya and others in my
for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

Some of the changes are fixes for the tree logging code, so I ran some
extra crash runs against them Friday night.

I ended up with a new crash in the tree log directory deletion replay
code, so I didn't send out the pull request to Linus.

It isn't clear yet if the new crash is because I was testing differently
or if it is a regression. I'm nailing it down this weekend, but please
give my for-linus a shot.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression. I'm nailing it down this weekend, but please
> give my for-linus a shot.

With this branch (3.4.0), my test has consistently been hitting the
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
insert_inline_extent_backref [1]. This is followed by a string of
other issues [2] and a hard lockup, so I used netconsole to collect
this.

I'm preparing my btrfs test for xfstests integration, but can slip you
it if interested. It hits this case in ~30s.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/extent-tree.c:1769!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: brd netconsole dm_crypt dm_mod kvm_intel kvm coretemp microcode uvcvideo videobuf2_core iwlwifi videodev videobuf2_vmalloc videobuf2_memops btrfs i915 cfbcopyarea video cfbimgblt cfbfillrect

Pid: 3219, comm: btrfs Not tainted 3.4.0-debug+ #1 Dell Inc. Latitude E5420/0H5TG2
RIP: 0010:[<ffffffffa009b867>] [<ffffffffa009b867>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
RSP: 0018:ffff8801924df8c8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8801ea7ae3f0 RCX: ffff8801924df910
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff8801924df948 R08: 0000000000000f4c R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801101e0000
R13: ffff8801e6ed30f0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f1b3bf80740(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000042c430 CR3: 0000000195a05000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 3219, threadinfo ffff8801924de000, task ffff880223e23dc0)
Stack:
 0000000000000000 0000000000000005 0000000000000000 0000000000000000
 ffff880200000001 0000000000000005 ffff8801924df938 ffffffff81110457
 ffff8801101e1800 0000000000000f43 ffff8801101e1800 ffff8801ea7ae3f0
Call Trace:
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b
Code: 89 e6 4c 89 ef 48 8b 4d c8 4c 89 3c 24 48 89 44 24 18 8b 45 28 89 44 24 10 48 8b 45 20 48 89 44 24 08 e8 1d fa ff ff 31 c0 eb a4 <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 c4 80 48 89 5d d8
RIP [<ffffffffa009b867>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 RSP <ffff8801924df8c8>

--- [2]

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic(): 1, irqs_disabled(): 0, pid: 3219, name: btrfs
INFO: lockdep is turned off.
Pid: 3219, comm: btrfs Tainted: G D 3.4.0-debug+ #1
Call Trace:
 [<ffffffff81069ae2>] __might_sleep+0x142/0x240
 [<ffffffff815b940f>] down_read+0x1f/0x5c
 [<ffffffff8105143f>] exit_signals+0x1f/0x130
 [<ffffffff81042956>] do_exit+0xb6/0x480
 [<ffffffff81005677>] oops_end+0x77/0xb0
 [<ffffffff810057f3>] die+0x53/0x80
 [<ffffffff81002354>] do_trap+0xc4/0x170
 [<ffffffff81002630>] do_invalid_op+0x90/0xb0
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009466b>] ? btrfs_search_slot+0x67b/0x760 [btrfs]
 [<ffffffffa00923ff>] ? btrfs_leaf_free_space+0x5f/0xb0 [btrfs]
 [<ffffffff8122b85d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815bbf09>] ? restore_args+0x30/0x30
 [<ffffffff815bd695>] invalid_op+0x15/0x20
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009b7de>] ? insert_inline_extent_backref+0x5e/0xf0 [btrfs]
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b

BUG: scheduling while atomic: btrfs/3219/0x10000002
INFO: lockdep is turned off.
Modules linked in: brd netconsole dm_crypt dm_mod kvm_intel kvm coretemp microcode uvcvideo videobuf2_core iwlwifi videodev videobuf2_vmalloc videobuf2_memops btrfs i915 cfbcopyarea video cfbimgblt cfbfillrect
Pid: 3219, comm: btrfs Tainted: G D 3.4.0-debug+ #1
Call Trace:
 [<ffffffff815a674a>] __schedule_bug+0x5d/0x61
 [<ffffffff815ba0fb>] __schedule+0x8fb/0x9a0
 [<ffffffff810055a7>] ? show_trace_log_lvl+0x57/0x70
 [<ffffffff810055d0>] ? show_trace+0x10/0x20
 [<ffffffff815a469f>] ? dump_stack+0x72/0x7b
 [<ffffffff8106c4e5>] __cond_resched+0x25/0x40
 [<ffffffff815ba21d>] _cond_resched+0x2d/0x40
 [<ffffffff815b9414>] down_read+0x24/0x5c
 [<ffffffff8105143f>] exit_signals+0x1f/0x130
 [<ffffffff81042956>] do_exit+0xb6/0x480
 [<ffffffff81005677>] oops_end+0x77/0xb0
 [<ffffffff810057f3>] die+0x53/0x80
 [<ffffffff81002354>] do_trap+0xc4/0x170
 [<ffffffff81002630>] do_invalid_op+0x90/0xb0
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009466b>] ? btrfs_search_slot+0x67b/0x760 [btrfs]
 [<ffffffffa00923ff>] ? btrfs_leaf_free_space+0x5f/0xb0 [btrfs]
 [<ffffffff8122b85d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815bbf09>] ? restore_args+0x30/0x30
 [<ffffffff815bd695>] invalid_op+0x15/0x20
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009b7de>] ? insert_inline_extent_backref+0x5e/0xf0 [btrfs]
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b
note: btrfs[3219] exited with preempt_count 1
--
Daniel J Blueman
On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>> Hi everyone,
>>
>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>> for-linus branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>
>> Some of the changes are fixes for the tree logging code, so I ran some
>> extra crash runs against them Friday night.
>>
>> I ended up with a new crash in the tree log directory deletion replay
>> code, so I didn't send out the pull request to Linus.
>>
>> It isn't clear yet if the new crash is because I was testing differently
>> or if it is a regression. I'm nailing it down this weekend, but please
>> give my for-linus a shot.
>
> With this branch (3.4.0), my test has consistently been hitting the
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
> insert_inline_extent_backref [1]. This is followed by a string of
> other issues [2] and a hard lockup, so I used netconsole to collect
> this.
>
> I'm preparing my btrfs test for xfstests integration, but can slip you
> it if interested. It hits this case in ~30s.
>

IMO the BUG_ON is meant to avoid mixing the 'log tree' in, so it should be:

	BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
	       root_objectid == BTRFS_TREE_LOG_OBJECTID);

This should help you. Can you give it a try?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4b5a1e1..a006017 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1766,7 +1766,8 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   bytenr, num_bytes, parent,
 					   root_objectid, owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
+		       root_objectid == BTRFS_TREE_LOG_OBJECTID);
 		update_inline_extent_backref(trans, root, path, iref,
 					     refs_to_add, extent_op);
 	} else if (ret == -ENOENT) {

thanks,
liubo

> Thanks,
>   Daniel
On Sun, Jul 01, 2012 at 09:35:01PM -0600, Daniel J Blueman wrote:
> > Hi everyone,
> >
> > I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> > for-linus branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> >
> > Some of the changes are fixes for the tree logging code, so I ran some
> > extra crash runs against them Friday night.
> >
> > I ended up with a new crash in the tree log directory deletion replay
> > code, so I didn't send out the pull request to Linus.
> >
> > It isn't clear yet if the new crash is because I was testing differently
> > or if it is a regression. I'm nailing it down this weekend, but please
> > give my for-linus a shot.
>
> With this branch (3.4.0), my test has consistently been hitting the
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
> insert_inline_extent_backref [1]. This is followed by a string of
> other issues [2] and a hard lockup, so I used netconsole to collect
> this.
>
> I'm preparing my btrfs test for xfstests integration, but can slip you
> it if interested. It hits this case in ~30s.
>

Can you apply this and capture the output? I have a feeling I know what
this is. Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5775dc4..917ea70 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1766,7 +1766,13 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   bytenr, num_bytes, parent,
 					   root_objectid, owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+			printk(KERN_ERR "bad inline extent, bytenr=%Lu, "
+			       "num_bytes=%Lu, parent=%Lu, root=%Lu, owner=%Lu"
+			       ", offset=%Lu\n", bytenr, num_bytes, parent,
+			       root_objectid, owner, offset);
+			BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		}
 		update_inline_extent_backref(trans, root, path, iref,
 					     refs_to_add, extent_op);
 	} else if (ret == -ENOENT) {
David Sterba
2012-Jul-02 14:10 UTC
Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
Hi,

I'm seeing a machine lockup in xfstests/224, logs attached. Friday's
xfstests round with 3.5-rc4 was ok, all tests passed.

The 'dd' processes are in D-state with these stack traces:

 5597 pts/0 D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.8 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001cd64>] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs]
[<ffffffffa001d16b>] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
[<ffffffffa004132b>] __btrfs_buffered_write+0x17b/0x380 [btrfs]
[<ffffffffa0041783>] btrfs_file_aio_write+0x253/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

and (not sure if there are more)

 5666 pts/0 D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.6 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001c56a>] btrfs_block_rsv_add+0x3a/0x60 [btrfs]
[<ffffffffa003155e>] start_transaction+0x26e/0x330 [btrfs]
[<ffffffffa0031903>] btrfs_start_transaction+0x13/0x20 [btrfs]
[<ffffffffa003cae0>] btrfs_dirty_inode+0xb0/0xe0 [btrfs]
[<ffffffffa003cdad>] btrfs_update_time+0xcd/0x180 [btrfs]
[<ffffffffa00416f8>] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b

All btrfs kernel threads are idle.

Mount options: -o space_cache
Mkfs: fresh, default options

# btrfs fi df /mnt/a2
System: total=4.00MiB, used=4.00KiB
Data+Metadata: total=1020.00MiB, used=987.32MiB

[meanwhile]

While grabbing lockdep stats the test respawned

224 236s ... [14:57:42] [15:46:56] 2954s

but there was no disk activity. I wonder if touching /proc/lockdep or
/proc/lock_stat is affecting this.

Finishing this report anyway, and will redo the tests again.

Looking again into the logs, the first process snapshot (only D-state
processes) is much longer than the process snapshot containing all
processes. Unfortunately I don't have timestamps recorded, but this suggests
that it is progressing very slowly, so slowly that I considered it stalled
when looking at the io graphs.

david
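The process lines in the report above carry the state letter D (uninterruptible sleep), and the stack listings have the shape of /proc/<pid>/stack output. As a minimal sketch of how such processes could be picked out programmatically, the helper below extracts the state field from one line of /proc/<pid>/stat. The function names are hypothetical and not part of xfstests or btrfs; the only assumption about the format is that the state letter follows the last ')' that closes the command name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the state letter from one /proc/<pid>/stat line, or '\0' if
 * the line does not look like "pid (comm) state ...".
 * The command name may itself contain spaces or ')', so search for the
 * last closing parenthesis instead of splitting on whitespace.
 */
static char proc_stat_state(const char *stat_line)
{
	const char *close;

	assert(stat_line != NULL);
	close = strrchr(stat_line, ')');
	if (close == NULL || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/* True when the process is in uninterruptible sleep ("D" in ps). */
static bool is_uninterruptible(const char *stat_line)
{
	return proc_stat_state(stat_line) == 'D';
}
```

A caller would read /proc/<pid>/stat for each PID of interest, keep those for which is_uninterruptible() returns true, and then dump /proc/<pid>/stack for them (which normally requires root).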
David Sterba
2012-Jul-02 14:34 UTC
Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
On Mon, Jul 02, 2012 at 04:10:52PM +0200, David Sterba wrote:
> Finishing this report anyway, and will redo the tests again.
>
> Looking again into the logs, the first process snapshot (only D-state
> processes) is much longer than the process snapshot containing all
> processes. Unfortunately I don't have timestamps recorded, but this suggests
> that it is progressing very slowly, so slowly that I considered it stalled
> when looking at the io graphs.

Fresh build, reboot, and single xfstests/224 run:

during the first ~20 seconds there's high write activity, i.e. file setup,
then it drops to "a few tens-to-hundreds of KB every 4 seconds". CPU is
idle; sample output from dstat:

----total-cpu-usage---- --dsk/sda9- ---system--
usr sys idl wai hiq siq| read  writ| int   csw
  1   1  99   0   0   0|   0     0 | 923  1856
  0   1  98   0   1   0|   0  8192B| 904  2796
  0   1  99   0   0   0|   0     0 | 945  1914
  1   1  98   0   0   0|   0     0 | 899  1849
  1   1  98   0   0   1|   0     0 | 906  1848
  0   3  97   0   0   0|   0    20k| 901  3740
  0   0 100   0   0   0|   0     0 | 905  1851
  1   1  98   0   0   1|   0     0 | 946  1917
  0   1  99   0   0   0|   0     0 | 904  1858
  0   1  99   0   0   0|   0  8192B| 907  2805
  1   1  98   0   0   1|   0     0 | 891  1836
  0   1  99   0   0   0|   0     0 | 900  1847
  0   1  99   0   0   0|   0     0 | 940  1905
  1   4  95   0   0   0|   0    32k| 904  5153
  1   2  97   0   0   0|   0    36k| 913  4240
  0   1  99   0   0   0|   0     0 | 907  1849
  0   1  99   0   0   0|   0     0 | 908  1852
  1   1  98   0   0   1|   0     0 | 933  1901
  1   2  98   0   0   0|   0  8192B| 916  2808
  0   1  99   0   0   0|   0     0 | 917  1843
  0   1  99   0   0   1|   0     0 | 908  1844
  1   1  99   0   0   0|   0     0 | 905  1860
  0   5  95   0   0   0|   0    36k| 943  7565
  1   1  99   0   0   0|   0     0 | 911  1861
  0   1  99   0   0   0|   0     0 | 910  1852
  1   1  98   0   0   0|   0     0 | 944  1878
  1   2  97   0   0   1|   0    16k| 898  3753
  0   9  87   4   0   1|   0  1020k|1035   11k
  0  19  74   7   0   1|   0  2092k|3052   24k
  0   1  99   0   0   0|   0     0 | 909  1851
  1   1  98   0   0   1|   0     0 | 915  1856
  1   1  99   0   0   0|   0     0 | 896  1847
  0   2  98   0   0   0|   0  8192B| 931  2847
  0   1  99   0   0   0|   0     0 | 899  1850
  1   1  98   0   0   1|   0     0 | 896  1861
  0   1  99   0   0   0|   0     0 | 911  1855
  1   5  94   0   0   0|   0    28k| 891  6521
  0   9  87   3   0   1|   0  1100k| 963   11k
  0   1  99   0   0   0|   0     0 | 905  1857
  1   1  99   0   0   0|   0     0 | 895  1851
  1   1  98   0   0   0|   0     0 | 911  1852
  0   7  88   4   0   1|   0   700k| 911  8533
  0   1  99   0   0   0|   0     0 | 940  1905
  1   1  99   0   0   0|   0     0 | 912  1851
  1   1  99   0   0   0|   0     0 | 895  1851
  0  10  89   0   0   1|   0   100k| 912   13k

and repeats more or less the same.

Bisection in progress.

david
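The write column in the dstat sample mixes bare zeros, byte counts such as 8192B, and k-suffixed values, which makes the trickle hard to total by eye. The following is a minimal sketch for turning those cells into byte counts and averaging them. It assumes dstat's k and M suffixes are 1024-based and that each cell carries at most one suffix character; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Convert one dstat size cell ("0", "8192B", "20k", "1020k") to bytes.
 * Assumption: k and M are 1024-based. Returns -1 for anything else.
 */
static long long dstat_cell_to_bytes(const char *cell)
{
	char *end;
	long long value;

	assert(cell != NULL);
	value = strtoll(cell, &end, 10);
	if (end == cell || value < 0)
		return -1;

	switch (*end) {
	case '\0':
		return value;		/* bare number, e.g. "0" */
	case 'B':
		break;			/* already in bytes */
	case 'k':
		value *= 1024;
		break;
	case 'M':
		value *= 1024 * 1024;
		break;
	default:
		return -1;
	}
	/* Exactly one suffix character is allowed. */
	return end[1] == '\0' ? value : -1;
}

/*
 * Average bytes written per sample over n cells.
 * Returns -1 if n is zero or any cell fails to parse.
 */
static long long average_write_bytes(const char **cells, size_t n)
{
	long long total = 0;
	size_t i;

	assert(cells != NULL);
	if (n == 0)
		return -1;
	for (i = 0; i < n; i++) {
		long long bytes = dstat_cell_to_bytes(cells[i]);

		if (bytes < 0)
			return -1;
		total += bytes;
	}
	return total / (long long)n;
}
```

Fed with the writ column of the sample, average_write_bytes() gives a per-sample figure that can be compared against the initial file-setup phase; the sampling interval is not stated in the report, so converting that figure to a rate is left to the reader.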
David Sterba
2012-Jul-02 16:10 UTC
Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
On Mon, Jul 02, 2012 at 04:34:53PM +0200, David Sterba wrote:
> Bisection in progress.

commit cae76522b19735c576803bec273f49062aa418ab
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Thu Jun 21 14:05:49 2012 -0400

    Btrfs: flush delayed inodes if we're short on space

    Those crazy gentoo guys have been complaining about ENOSPC errors on their
    portage volumes. This is because doing things like untar tends to create
    lots of new files which will soak up all the reservation space in the
    delayed inodes. Usually this gets papered over by the fact that we will
    try and commit the transaction, however if this happens in the wrong
    spot or we choose not to commit the transaction you will be screwed. So
    add the ability to explicitly flush delayed inodes to free up space.
    Please test this out guys to make sure it works since as usual I cannot
    reproduce. Thanks,
On Sat, Jun 30, 2012 at 09:22:59PM -0400, Chris Mason wrote:
> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression. I'm nailing it down this weekend, but please
> give my for-linus a shot.

Ok, I've just rebased for-linus. I've dropped Josef's enospc patch,
which should fix the regression Dave hit.

I've also added a fix for my log replay crash, which was definitely an
old bug. The delayed directory operations were queuing up the changes
made during replay, and it was confusing the replay code.

Looks like there's a fix pending from Liu Bo, but I'll let Daniel test
that before pulling it in as well.

Thanks everyone.

-chris
On 2 July 2012 21:34, Josef Bacik <jbacik@fusionio.com> wrote:> On Sun, Jul 01, 2012 at 09:35:01PM -0600, Daniel J Blueman wrote: >> > Hi everyone, >> > >> > I''ve got a nice set of fixes from Josef, Jan, Ilya and others in my >> > for-linus branch: >> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus >> > >> > Some of the changes are fixes for the tree logging code, so I ran some >> > extra crash runs against them Friday night. >> > >> > I ended up with a new crash in the tree log directory deletion replay >> > code, so I didn''t send out the pull request to Linus. >> > >> > It isn''t clear yet if the new crash is because I was testing differently >> > or if it is a regression. I''m nailing it down this weekend, but please >> > give my for-linus a shot. >> >> With this branch (3.4.0), my test has consistently been hitting the >> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in >> insert_inline_extent_backref [1]. This is followed by a string of >> other issues [2] and a hard lockup, so I used netconsole to collect >> this. >> >> I''m preparing my btrfs test for xfstests integration, but can slip you >> it if interested. It hits this case in ~30s. >> > > Can you apply this and capture the output, I have a feeling I know what this is. 
> Thanks, > > Josef > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 5775dc4..917ea70 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -1766,7 +1766,13 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans, > bytenr, num_bytes, parent, > root_objectid, owner, offset, 1); > if (ret == 0) { > - BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); > + if (owner < BTRFS_FIRST_FREE_OBJECTID) { > + printk(KERN_ERR "bad inline extent, bytenr=%Lu, " > + "num_bytes=%Lu, parent=%Lu, root=%Lu, owner=%Lu" > + ", offset=%Lu\n", bytenr, num_bytes, parent, > + root_objectid, owner, offset); > + BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID); > + } > update_inline_extent_backref(trans, root, path, iref, > refs_to_add, extent_op); > } else if (ret == -ENOENT) {Bo''s additional condition ''root_objectid == BTRFS_TREE_LOG_OBJECTID'' seemed to hold it off. Here is the debug you asked for [1]. After we''ve determined the right fix for this issue, I''ll post the other issues I was seeing. Thanks! Daniel --- [1] device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 3 /dev/ram3 device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 2 transid 3 /dev/ram0 device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 4 /dev/ram3 btrfs: allowing degraded mounts btrfs: force zlib compression btrfs: disabling disk space caching btrfs: enabling auto defrag btrfs: enabling auto recovery btrfs: no dev_stats entry found for device /dev/ram0 (devid 2) (OK on first mount after mkfs) btrfs: no dev_stats entry found for device /dev/ram3 (devid 1) (OK on first mount after mkfs) btrfs: relocating block group 512425984 flags 20 btrfs: found 2 extents btrfs: relocating block group 190382080 flags 9 btrfs: found 4756 extents btrfs: found 4756 extents bad inline extent, bytenr=36909056, num_bytes=4096, parent=0, root=5, owner=0, offset=0 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent-tree.c:1774! 
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: brd dm_crypt dm_mod kvm_intel kvm uvcvideo videobuf2_core videodev videobuf2_vmalloc videobuf2_memops coretemp microcode iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video

Pid: 8055, comm: btrfs-endio-wri Not tainted 3.4.0-debug+ #5 Dell Inc. Latitude E5420/0H5TG2
RIP: 0010:[<ffffffffa009685b>] [<ffffffffa009685b>] insert_inline_extent_backref+0x11b/0x120 [btrfs]
RSP: 0018:ffff880200415a40 EFLAGS: 00010282
RAX: 000000000000006d RBX: ffff88020e99c1b0 RCX: 0000000000000000
RDX: ffffffff8103cde5 RSI: 0000000000000001 RDI: ffffffff8103d170
RBP: ffff880200415ac0 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020e7f7000
R13: ffff88020ef80e60 R14: 0000000000000000 R15: 0000000000001000
FS: 0000000000000000(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f6916262000 CR3: 0000000221e24000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-endio-wri (pid: 8055, threadinfo ffff880200414000, task ffff880222029ee0)
Stack:
 0000000000000000 0000000000000005 0000000000000000 0000000000000000
 ffff880200000001 ffffffffa0089be5 ffff880200415aa0 0000000002333000
 ffff8801f6f15000 0000000000000ea1 ffff8801f6f15000 ffff88020e99c1b0
Call Trace:
 [<ffffffffa0089be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [<ffffffffa00968fa>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa0098097>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa009bf1e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa009c00d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa009c3e9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00adcb7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00adfd0>] btrfs_end_transaction+0x10/0x20 [btrfs]
 [<ffffffffa00b4985>] btrfs_finish_ordered_io+0x185/0x3b0 [btrfs]
 [<ffffffff81095bad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffa00b4bc0>] finish_ordered_fn+0x10/0x20 [btrfs]
 [<ffffffffa00de6d6>] worker_loop+0x86/0x330 [btrfs]
 [<ffffffffa00de650>] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs]
 [<ffffffff8105e2ee>] kthread+0x8e/0xa0
 [<ffffffff815b9314>] kernel_thread_helper+0x4/0x10
 [<ffffffff81069a77>] ? finish_task_switch+0x77/0x100
 [<ffffffff815b754b>] ? _raw_spin_unlock_irq+0x2b/0x50
 [<ffffffff815b79d9>] ? retint_restore_args+0xe/0xe
 [<ffffffff8105e260>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff815b9310>] ? gs_change+0xb/0xb
Code: c0 eb a4 48 8b 45 20 4c 89 f1 48 c7 c7 58 d4 10 a0 4c 8b 4d 18 4c 89 fa 4c 8b 45 10 48 8b 75 b8 48 89 04 24 31 c0 e8 32 b3 50 e1 <0f> 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 89 5d d8 4c 89 65 e0
RIP [<ffffffffa009685b>] insert_inline_extent_backref+0x11b/0x120 [btrfs]
 RSP <ffff880200415a40>
--
Daniel J Blueman
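The values in the "bad inline extent" line can be read against the two checks discussed in this thread. The following is only a minimal sketch, not kernel code: the helper names are hypothetical, and the constant values (5, 256, -6) are the ones I understand the btrfs headers to define. It also assumes, as the run_delayed_tree_ref frame in the trace suggests, that for tree block refs the owner field carries the tree level rather than an inode number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constant values as I understand them from the btrfs on-disk format headers. */
#define BTRFS_FS_TREE_OBJECTID     5ULL
#define BTRFS_TREE_LOG_OBJECTID    ((uint64_t)-6)
#define BTRFS_FIRST_FREE_OBJECTID  256ULL

/* Hypothetical helper: the unconditional check, true for any owner below 256. */
static bool original_check_fires(uint64_t owner)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID;
}

/* Hypothetical helper: the same check with the extra log-tree condition. */
static bool relaxed_check_fires(uint64_t root_objectid, uint64_t owner)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID &&
	       root_objectid == BTRFS_TREE_LOG_OBJECTID;
}

/* Assumption of this sketch: a tree block ref stores the level in owner. */
static int tree_block_level(uint64_t owner)
{
	assert(owner < BTRFS_FIRST_FREE_OBJECTID);
	return (int)owner;
}
```

With the logged values (root=5, owner=0), the unconditional check fires while the variant with the log-tree condition does not. That is consistent with the extra condition holding the assertion off, but it says nothing about why a second backref was being added for a level-0 tree block in the first place.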
On Mon, Jul 02, 2012 at 04:17:37PM -0400, Chris Mason wrote:
> Ok, I've just rebased for-linus. I've dropped Josef's enospc patch,
> which should fix the regression Dave hit.

JFYI, fixed. No other problems observed so far.

david
> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression. I'm nailing it down this weekend, but please
> give my for-linus a shot.

I consistently run into this assertion [1] while running a fio
workload on a fresh RAID10 filesystem with a balance running.

Let me know if you need steps to reproduce, debug etc.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/extent-tree.c:1728!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 1
Modules linked in: brd dm_crypt dm_mod kvm_intel kvm binfmt_misc coretemp microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc videobuf2_memops iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video

Pid: 31436, comm: btrfs Tainted: G W 3.4.0-debug+ #6 Dell Inc. Latitude E5420/0H5TG2
RIP: 0010:[<ffffffffa00ad739>] [<ffffffffa00ad739>] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
RSP: 0018:ffff88021dfab858 EFLAGS: 00010213
RAX: 00000000000000b0 RBX: ffff8802061555a0 RCX: ffff88021cf1d000
RDX: 0000000000000000 RSI: 0000000000000f3a RDI: ffff8800c4e5bc20
RBP: ffff88021dfab8b8 R08: 00000000000000b0 R09: ffff88021dfab808
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c4e5bc20
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000f10
FS: 00007fdb25012740(0000) GS:ffff88022ec40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f882b763f70 CR3: 000000021d547000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 31436, threadinfo ffff88021dfaa000, task ffff88021fdc5ca0)
Stack:
 ffff8801ff3ed000 000000098123124f ffff8801ff3ed000 ffff8802083270a0
 0000000000000000 0000000000000f3a ffff880206156000 ffff8802061555a0
 ffff8801ff3ed000 ffff8802083270a0 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa00ad7c8>] insert_inline_extent_backref+0x88/0x100 [btrfs]
 [<ffffffffa00a0be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [<ffffffffa00ad8da>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa00af077>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00b2efe>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00b2fed>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00b33c9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00c4c97>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00c4f93>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa01114e9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa01117d4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00eea4a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00e9722>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00f1db4>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00f21a3>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00f7f30>] btrfs_ioctl_balance+0x140/0x440 [btrfs]
 [<ffffffffa00fbd67>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f1616>] ? do_brk+0x246/0x360
 [<ffffffff8112f607>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122a434>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff8112f90a>] sys_ioctl+0x4a/0x80
 [<ffffffff815b8122>] system_call_fastpath+0x16/0x1b
Code: e8 5d f6 02 00 45 31 c9 48 8b 4d a0 89 c2 44 8b 45 a8 eb b5 66 0f 1f 44 00 00 41 bd 0d 00 00 00 00 41 be 0d 00 00 00 00 e9 6b fe ff ff <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 8b 45 20 4c 89
RIP [<ffffffffa00ad739>] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
 RSP <ffff88021dfab858>
--
Daniel J Blueman
On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
>> Hi everyone,
>>
>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>> for-linus branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>
>> Some of the changes are fixes for the tree logging code, so I ran some
>> extra crash runs against them Friday night.
>>
>> I ended up with a new crash in the tree log directory deletion replay
>> code, so I didn't send out the pull request to Linus.
>>
>> It isn't clear yet if the new crash is because I was testing differently
>> or if it is a regression. I'm nailing it down this weekend, but please
>> give my for-linus a shot.
>
> I consistently run into this assertion [1] while running a fio
> workload on a fresh RAID10 filesystem with a balance running.
>
> Let me know if you need steps to reproduce, debug etc.
>

It seems that the additional condition does not catch the bug.

Please show us the steps to reproduce; I'll try to reproduce it locally and nail it down.

thanks,
liubo

> Thanks,
> Daniel
>
> --- [1]
>
> kernel BUG at fs/btrfs/extent-tree.c:1728!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 1
>
> Modules linked in: brd dm_crypt dm_mod kvm_intel kvm binfmt_misc
> coretemp microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc
> videobuf2_memops iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt
> cfbfillrect video
>
> Pid: 31436, comm: btrfs Tainted: G W 3.4.0-debug+ #6 Dell
> Inc. Latitude E5420/0H5TG2
> RIP: 0010:[<ffffffffa00ad739>] [<ffffffffa00ad739>]
> update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
> RSP: 0018:ffff88021dfab858 EFLAGS: 00010213
> RAX: 00000000000000b0 RBX: ffff8802061555a0 RCX: ffff88021cf1d000
> RDX: 0000000000000000 RSI: 0000000000000f3a RDI: ffff8800c4e5bc20
> RBP: ffff88021dfab8b8 R08: 00000000000000b0 R09: ffff88021dfab808
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c4e5bc20
> R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000f10
> FS: 00007fdb25012740(0000) GS:ffff88022ec40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f882b763f70 CR3: 000000021d547000 CR4: 00000000000407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btrfs (pid: 31436, threadinfo ffff88021dfaa000, task ffff88021fdc5ca0)
> Stack:
> ffff8801ff3ed000 000000098123124f ffff8801ff3ed000 ffff8802083270a0
> 0000000000000000 0000000000000f3a ffff880206156000 ffff8802061555a0
> ffff8801ff3ed000 ffff8802083270a0 0000000000000000 0000000000000000
> Call Trace:
> [<ffffffffa00ad7c8>] insert_inline_extent_backref+0x88/0x100 [btrfs]
> [<ffffffffa00a0be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
> [<ffffffffa00ad8da>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
> [<ffffffffa00af077>] run_delayed_tree_ref+0x167/0x190 [btrfs]
> [<ffffffffa00b2efe>] run_one_delayed_ref+0xde/0xf0 [btrfs]
> [<ffffffffa00b2fed>] run_clustered_refs+0xdd/0x370 [btrfs]
> [<ffffffffa00b33c9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
> [<ffffffffa00c4c97>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
> [<ffffffffa00c4f93>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
> [<ffffffffa01114e9>] relocate_block_group+0x439/0x560 [btrfs]
> [<ffffffffa01117d4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
> [<ffffffffa00eea4a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
> [<ffffffffa00e9722>] ? free_extent_buffer+0x32/0x90 [btrfs]
> [<ffffffffa00f1db4>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
> [<ffffffffa00f21a3>] btrfs_balance+0x2f3/0x4d0 [btrfs]
> [<ffffffffa00f7f30>] btrfs_ioctl_balance+0x140/0x440 [btrfs]
> [<ffffffffa00fbd67>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
> [<ffffffff810f1616>] ? do_brk+0x246/0x360
> [<ffffffff8112f607>] do_vfs_ioctl+0x87/0x340
> [<ffffffff8122a434>] ? lockdep_sys_exit_thunk+0x35/0x67
> [<ffffffff8112f90a>] sys_ioctl+0x4a/0x80
> [<ffffffff815b8122>] system_call_fastpath+0x16/0x1b
> Code: e8 5d f6 02 00 45 31 c9 48 8b 4d a0 89 c2 44 8b 45 a8 eb b5 66 0f
> 1f 44 00 00 41 bd 0d 00 00 00 00 41 be 0d 00 00 00 00 e9 6b fe ff ff
> <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 8b 45 20 4c 89
> RIP [<ffffffffa00ad739>] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
> RSP <ffff88021dfab858>
On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
>>> Hi everyone,
>>>
>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>> for-linus branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>
>>> Some of the changes are fixes for the tree logging code, so I ran some
>>> extra crash runs against them Friday night.
>>>
>>> I ended up with a new crash in the tree log directory deletion replay
>>> code, so I didn't send out the pull request to Linus.
>>>
>>> It isn't clear yet if the new crash is because I was testing differently
>>> or if it is a regression. I'm nailing it down this weekend, but please
>>> give my for-linus a shot.
>>
>> I consistently run into this assertion [1] while running a fio
>> workload on a fresh RAID10 filesystem with a balance running.
>>
>> Let me know if you need steps to reproduce, debug etc.
>
> Seems that additional condition does not catch the bug.
>
> Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.

The reproducer auto-generated from my test [1] consistently hits the
spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
the fio workload file [2] in the same dir.

Thanks,
  Daniel

--- [1]

#!/bin/bash -ex

modprobe brd rd_size=1572864 rd_nr=4 # or use kernel param: ramdisk_size=1572864
mkdir -p /tmp/btrfsathon
sync
mkfs.btrfs -m raid1 -d raid1 -l 4096 -n 4096 /dev/ram2 /dev/ram3 /dev/ram1
mount /dev/ram1 /tmp/btrfsathon -o nodatacow,autodefrag,ssd,flushoncommit

btrfs filesystem defragment /tmp/btrfsathon ||: &
sleep 0.017
fio --timeout=60 ./workload ||: &
sleep 0.000
btrfs filesystem defragment /tmp/btrfsathon ||: &
sleep 0.012
btrfs filesystem defragment /tmp/btrfsathon ||: &
sleep 0.010
btrfs filesystem defragment /tmp/btrfsathon ||: &
sleep 0.003
btrfs filesystem defragment /tmp/btrfsathon ||: &
sleep 0.003
btrfs filesystem balance /tmp/btrfsathon ||: &
sleep 0.003
fio --timeout=60 ./workload ||: &
sleep 0.000

wait
umount /tmp/btrfsathon

--- [2] 'workload'

[global]
directory=/tmp/btrfsathon
rw=randread
size=128m
ioengine=libaio
iodepth=32
invalidate=1
direct=1

[bgwriter]
rw=randwrite
iodepth=32

[queryA]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[queryB]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[bgupdater]
rw=randrw
iodepth=32
size=64m
--
Daniel J Blueman
On Wed, Jul 04, 2012 at 12:53:54AM -0600, Daniel J Blueman wrote:
> On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> > On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
> >>> Hi everyone,
> >>>
> >>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> >>> for-linus branch:
> >>>
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> >>>
> >>> Some of the changes are fixes for the tree logging code, so I ran some
> >>> extra crash runs against them Friday night.
> >>>
> >>> I ended up with a new crash in the tree log directory deletion replay
> >>> code, so I didn't send out the pull request to Linus.
> >>>
> >>> It isn't clear yet if the new crash is because I was testing differently
> >>> or if it is a regression. I'm nailing it down this weekend, but please
> >>> give my for-linus a shot.
> >>
> >> I consistently run into this assertion [1] while running a fio
> >> workload on a fresh RAID10 filesystem with a balance running.
> >>
> >> Let me know if you need steps to reproduce, debug etc.
> >
> > Seems that additional condition does not catch the bug.
> >
> > Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.
>
> The reproducer auto-generated from my test [1] consistently hits the
> spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
> the fio workload file [2] in the same dir.
>

Wow I hit this straight away, I will look into it, thanks!

Josef
On Wed, Jul 04, 2012 at 12:53:54AM -0600, Daniel J Blueman wrote:
> On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> > On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
> >>> Hi everyone,
> >>>
> >>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> >>> for-linus branch:
> >>>
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> >>>
> >>> Some of the changes are fixes for the tree logging code, so I ran some
> >>> extra crash runs against them Friday night.
> >>>
> >>> I ended up with a new crash in the tree log directory deletion replay
> >>> code, so I didn't send out the pull request to Linus.
> >>>
> >>> It isn't clear yet if the new crash is because I was testing differently
> >>> or if it is a regression. I'm nailing it down this weekend, but please
> >>> give my for-linus a shot.
> >>
> >> I consistently run into this assertion [1] while running a fio
> >> workload on a fresh RAID10 filesystem with a balance running.
> >>
> >> Let me know if you need steps to reproduce, debug etc.
> >
> > Seems that additional condition does not catch the bug.
> >
> > Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.
>
> The reproducer auto-generated from my test [1] consistently hits the
> spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
> the fio workload file [2] in the same dir.
>

Well that was a huge pain in the ass. Arne, you are going to have to
tell me how to fix this or fix it yourself. The problem was introduced
here:

00f04b88791ff49dc64ada18819d40a5b0671709

The problem is that we no longer merge delayed refs on the fs trees, and
somehow we end up with this sequence of events:

alloc block
add backref for some random block
remove implicit backref
add implicit backref back <-- I'm not entirely sure why/how this
                              happens, I just assume it's some
                              relocate magic
run refs

Because we do the sequence thing, we go to add the implicit backref and
panic because we find there is one already there, and that's not
supposed to happen with tree blocks. If we had run the remove first we
would have been fine, or if we had just merged the delayed refs they
would have cancelled each other out and we would have been fine.

To test this theory I took the seq comparisons out of comp_entry in
delayed-refs.c, and the test has now been running for about 20 minutes;
before, it would die in less than 30 seconds.

So why is this needed? I assume you need it for something, but I figure
it's easier for you to fix this than for me to go figure out what it's
used for. Thanks,

Josef
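The ordering problem Josef describes can be modelled in a few lines. This is a toy sketch under stated assumptions, not the delayed-refs.c implementation: struct delayed_ref, same_entry(), queue_ref() and run_refs() are hypothetical names, the queue is a flat array rather than an rbtree, and run_refs() assumes that adds are processed before drops, which is what the failure described in this message implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified delayed ref: one pending backref change. */
struct delayed_ref {
	uint64_t bytenr;  /* block the change applies to */
	uint64_t seq;     /* sequence number assigned when queued */
	int ref_mod;      /* +1 add, -1 drop; summed when entries merge */
};

/* Entries match if they hit the same block and, when compare_seq is
 * set, also carry the same sequence number. */
static int same_entry(const struct delayed_ref *a,
		      const struct delayed_ref *b, int compare_seq)
{
	if (a->bytenr != b->bytenr)
		return 0;
	if (compare_seq && a->seq != b->seq)
		return 0;
	return 1;
}

/* Queue a ref, merging into a matching entry if there is one.
 * Returns the new number of entries in q. */
static size_t queue_ref(struct delayed_ref *q, size_t n, size_t cap,
			struct delayed_ref ref, int compare_seq)
{
	for (size_t i = 0; i < n; i++) {
		if (same_entry(&q[i], &ref, compare_seq)) {
			q[i].ref_mod += ref.ref_mod;
			return n;
		}
	}
	assert(n < cap);
	q[n] = ref;
	return n + 1;
}

/* Apply queued entries to a tree block's backref count for one root,
 * adds first and then drops (an assumption of this sketch). A tree
 * block may hold at most one such backref, so return -1 at the point
 * where the kernel would hit its assertion. */
static int run_refs(const struct delayed_ref *q, size_t n, int *backrefs)
{
	for (int pass = 0; pass < 2; pass++) {
		for (size_t i = 0; i < n; i++) {
			int is_add = q[i].ref_mod > 0;

			if (q[i].ref_mod == 0 || is_add != (pass == 0))
				continue;
			int after = *backrefs + q[i].ref_mod;
			if (after < 0 || after > 1)
				return -1;
			*backrefs = after;
		}
	}
	return 0;
}
```

Queueing a drop and then an add for the same block with compare_seq set leaves two entries, and running them trips the duplicate-backref case. With compare_seq cleared the two merge into a single entry with ref_mod 0 and the block's backref count is left untouched, which corresponds to taking the seq comparisons out of comp_entry.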
On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>
>>> Hi everyone,
>>>
>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>> for-linus branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>
>>> Some of the changes are fixes for the tree logging code, so I ran some
>>> extra crash runs against them Friday night.
>>>
>>> I ended up with a new crash in the tree log directory deletion replay
>>> code, so I didn't send out the pull request to Linus.
>>>
>>> It isn't clear yet if the new crash is because I was testing differently
>>> or if it is a regression. I'm nailing it down this weekend, but please
>>> give my for-linus a shot.
>>
>> With this branch (3.4.0), my test has consistently been hitting the
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>> insert_inline_extent_backref [1]. This is followed by a string of
>> other issues [2] and a hard lockup, so I used netconsole to collect
>> this.
>>
>> I'm preparing my btrfs test for xfstests integration, but can slip you
>> it if interested. It hits this case in ~30s.
>>
>
>
> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>
> This should help you, can you give it a try?

Bo, this did address the assertion I was tripping, so looks good from
here; it allowed me to report the second (different) assertion of
course.

If you still think the fix is sound, is it a good idea for 3.5-rc7?

Thanks,
  Daniel
--
Daniel J Blueman
On 07/10/2012 08:18 PM, Daniel J Blueman wrote:
> On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>>
>>>> Hi everyone,
>>>>
>>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>>> for-linus branch:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>>
>>>> Some of the changes are fixes for the tree logging code, so I ran some
>>>> extra crash runs against them Friday night.
>>>>
>>>> I ended up with a new crash in the tree log directory deletion replay
>>>> code, so I didn't send out the pull request to Linus.
>>>>
>>>> It isn't clear yet if the new crash is because I was testing differently
>>>> or if it is a regression. I'm nailing it down this weekend, but please
>>>> give my for-linus a shot.
>>> With this branch (3.4.0), my test has consistently been hitting the
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>>> insert_inline_extent_backref [1]. This is followed by a string of
>>> other issues [2] and a hard lockup, so I used netconsole to collect
>>> this.
>>>
>>> I'm preparing my btrfs test for xfstests integration, but can slip you
>>> it if interested. It hits this case in ~30s.
>>>
>>
>> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>>
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>>
>> This should help you, can you give it a try?
>
> Bo, this did address the assertion I was tripping, so looks good from
> here; it allowed me to report the second (different) assertion of
> course.
>
> If you still think the fix is sound, is it a good idea for 3.5-rc7?
>

Hi Daniel,

I'm sorry, but it is not ready yet, as it does not address the root cause of the bug.

Josef has found that the bug comes from disabling the merging of delayed refs, and he is
working on it with Arne. As the root cause has been found, the bug should be fixed soon IMO.

Btw, while testing with your great test scripts, I also posted patches for two bugs, which
may address your other issues. The links are:

http://www.spinics.net/lists/linux-btrfs/msg17761.html
http://www.spinics.net/lists/linux-btrfs/msg17764.html

thanks,
liubo

> Thanks,
> Daniel
On 11 July 2012 09:37, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/10/2012 08:18 PM, Daniel J Blueman wrote:
>
>> On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>>> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>>>
>>>>> Hi everyone,
>>>>>
>>>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>>>> for-linus branch:
>>>>>
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>>>
>>>>> Some of the changes are fixes for the tree logging code, so I ran some
>>>>> extra crash runs against them Friday night.
>>>>>
>>>>> I ended up with a new crash in the tree log directory deletion replay
>>>>> code, so I didn't send out the pull request to Linus.
>>>>>
>>>>> It isn't clear yet if the new crash is because I was testing differently
>>>>> or if it is a regression. I'm nailing it down this weekend, but please
>>>>> give my for-linus a shot.
>>>> With this branch (3.4.0), my test has consistently been hitting the
>>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>>>> insert_inline_extent_backref [1]. This is followed by a string of
>>>> other issues [2] and a hard lockup, so I used netconsole to collect
>>>> this.
>>>>
>>>> I'm preparing my btrfs test for xfstests integration, but can slip you
>>>> it if interested. It hits this case in ~30s.
>>>>
>>>
>>> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>>>
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>>>
>>> This should help you, can you give it a try?
>>
>> Bo, this did address the assertion I was tripping, so looks good from
>> here; it allowed me to report the second (different) assertion of
>> course.
>>
>> If you still think the fix is sound, is it a good idea for 3.5-rc7?
>
>
> Hi Daniel,
>
> I'm sorry but it is not ready yet, as it does not catch the root cause of the bug.
>
> Josef has found that the bug comes from disabling merging delayed refs and is working on the bug
> with Arne. As the root cause has been found, the bug will be fixed soon IMO.

Now I see the two issues are connected.

> Btw, while testing with your great test scripts, I also post patches for two bugs, which may have address your
> other issues. Their links are
>
> http://www.spinics.net/lists/linux-btrfs/msg17761.html
> http://www.spinics.net/lists/linux-btrfs/msg17764.html

Great work indeed!

Thanks Bo,
  Daniel
--
Daniel J Blueman