thr3ads.net - Btrfs devel - Please hammer my for-linus branch [Jul 2012]

Chris Mason

2012-Jul-01 01:22 UTC

Please hammer my for-linus branch

Hi everyone,

I've got a nice set of fixes from Josef, Jan, Ilya and others in my
for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

Some of the changes are fixes for the tree logging code, so I ran some
extra crash runs against them Friday night.

I ended up with a new crash in the tree log directory deletion replay
code, so I didn't send out the pull request to Linus.

It isn't clear yet if the new crash is because I was testing differently
or if it is a regression.  I'm nailing it down this weekend, but please
give my for-linus a shot.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Daniel J Blueman

2012-Jul-02 03:35 UTC

Re: Please hammer my for-linus branch

> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression.  I'm nailing it down this weekend, but please
> give my for-linus a shot.

With this branch (3.4.0), my test has consistently been hitting the
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
insert_inline_extent_backref [1]. This is followed by a string of
other issues [2] and a hard lockup, so I used netconsole to collect
this.

I'm preparing my btrfs test for xfstests integration, but can send it to
you if you're interested. It hits this case in ~30s.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/extent-tree.c:1769!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: brd netconsole dm_crypt dm_mod kvm_intel kvm
coretemp microcode uvcvideo videobuf2_core iwlwifi videodev
videobuf2_vmalloc videobuf2_memops btrfs i915 cfbcopyarea video
cfbimgblt cfbfillrect

Pid: 3219, comm: btrfs Not tainted 3.4.0-debug+ #1 Dell Inc. Latitude E5420/0H5TG2
RIP: 0010:[<ffffffffa009b867>]  [<ffffffffa009b867>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
RSP: 0018:ffff8801924df8c8  EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8801ea7ae3f0 RCX: ffff8801924df910
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff8801924df948 R08: 0000000000000f4c R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801101e0000
R13: ffff8801e6ed30f0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f1b3bf80740(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000042c430 CR3: 0000000195a05000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 3219, threadinfo ffff8801924de000, task ffff880223e23dc0)
Stack:
 0000000000000000 0000000000000005 0000000000000000 0000000000000000
 ffff880200000001 0000000000000005 ffff8801924df938 ffffffff81110457
 ffff8801101e1800 0000000000000f43 ffff8801101e1800 ffff8801ea7ae3f0
Call Trace:
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b
Code: 89 e6 4c 89 ef 48 8b 4d c8 4c 89 3c 24 48 89 44 24 18 8b 45 28 89 44 24 10 48 8b 45 20 48 89 44 24 08 e8 1d fa ff ff 31 c0 eb a4 <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 c4 80 48 89 5d d8
RIP  [<ffffffffa009b867>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 RSP <ffff8801924df8c8>

--- [2]

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic(): 1, irqs_disabled(): 0, pid: 3219, name: btrfs
INFO: lockdep is turned off.
Pid: 3219, comm: btrfs Tainted: G      D      3.4.0-debug+ #1
Call Trace:
 [<ffffffff81069ae2>] __might_sleep+0x142/0x240
 [<ffffffff815b940f>] down_read+0x1f/0x5c
 [<ffffffff8105143f>] exit_signals+0x1f/0x130
 [<ffffffff81042956>] do_exit+0xb6/0x480
 [<ffffffff81005677>] oops_end+0x77/0xb0
 [<ffffffff810057f3>] die+0x53/0x80
 [<ffffffff81002354>] do_trap+0xc4/0x170
 [<ffffffff81002630>] do_invalid_op+0x90/0xb0
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009466b>] ? btrfs_search_slot+0x67b/0x760 [btrfs]
 [<ffffffffa00923ff>] ? btrfs_leaf_free_space+0x5f/0xb0 [btrfs]
 [<ffffffff8122b85d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815bbf09>] ? restore_args+0x30/0x30
 [<ffffffff815bd695>] invalid_op+0x15/0x20
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009b7de>] ? insert_inline_extent_backref+0x5e/0xf0 [btrfs]
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b

BUG: scheduling while atomic: btrfs/3219/0x10000002
INFO: lockdep is turned off.
Modules linked in: brd netconsole dm_crypt dm_mod kvm_intel kvm
coretemp microcode uvcvideo videobuf2_core iwlwifi videodev
videobuf2_vmalloc videobuf2_memops btrfs i915 cfbcopyarea video
cfbimgblt cfbfillrect
Pid: 3219, comm: btrfs Tainted: G      D      3.4.0-debug+ #1
Call Trace:
 [<ffffffff815a674a>] __schedule_bug+0x5d/0x61
 [<ffffffff815ba0fb>] __schedule+0x8fb/0x9a0
 [<ffffffff810055a7>] ? show_trace_log_lvl+0x57/0x70
 [<ffffffff810055d0>] ? show_trace+0x10/0x20
 [<ffffffff815a469f>] ? dump_stack+0x72/0x7b
 [<ffffffff8106c4e5>] __cond_resched+0x25/0x40
 [<ffffffff815ba21d>] _cond_resched+0x2d/0x40
 [<ffffffff815b9414>] down_read+0x24/0x5c
 [<ffffffff8105143f>] exit_signals+0x1f/0x130
 [<ffffffff81042956>] do_exit+0xb6/0x480
 [<ffffffff81005677>] oops_end+0x77/0xb0
 [<ffffffff810057f3>] die+0x53/0x80
 [<ffffffff81002354>] do_trap+0xc4/0x170
 [<ffffffff81002630>] do_invalid_op+0x90/0xb0
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009466b>] ? btrfs_search_slot+0x67b/0x760 [btrfs]
 [<ffffffffa00923ff>] ? btrfs_leaf_free_space+0x5f/0xb0 [btrfs]
 [<ffffffff8122b85d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815bbf09>] ? restore_args+0x30/0x30
 [<ffffffff815bd695>] invalid_op+0x15/0x20
 [<ffffffffa009b867>] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [<ffffffffa009b7de>] ? insert_inline_extent_backref+0x5e/0xf0 [btrfs]
 [<ffffffff81110457>] ? kmem_cache_alloc+0xe7/0x180
 [<ffffffffa009b90a>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa009d0a7>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00a0f2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00a101d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00a13f9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00b29c7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00b2cc3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa00fecc9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa00fefb4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00dc84a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00d7592>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00dfb14>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00dff03>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00e5c30>] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [<ffffffffa00e96c7>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f2526>] ? do_brk+0x246/0x360
 [<ffffffff81130987>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122b894>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff81130c8a>] sys_ioctl+0x4a/0x80
 [<ffffffff815bc622>] system_call_fastpath+0x16/0x1b
note: btrfs[3219] exited with preempt_count 1
-- 
Daniel J Blueman

Liu Bo

2012-Jul-02 04:20 UTC

Re: Please hammer my for-linus branch

On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>> Hi everyone,
>>
>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>> for-linus branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>
>> Some of the changes are fixes for the tree logging code, so I ran some
>> extra crash runs against them Friday night.
>>
>> I ended up with a new crash in the tree log directory deletion replay
>> code, so I didn't send out the pull request to Linus.
>>
>> It isn't clear yet if the new crash is because I was testing differently
>> or if it is a regression.  I'm nailing it down this weekend, but please
>> give my for-linus a shot.
>
> With this branch (3.4.0), my test has consistently been hitting the
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
> insert_inline_extent_backref [1]. This is followed by a string of
> other issues [2] and a hard lockup, so I used netconsole to collect
> this.
>
> I'm preparing my btrfs test for xfstests integration, but can slip you
> it if interested. It hits this case in ~30s.
> 

IMO the BUG_ON is meant to keep the 'log tree' from being mixed in, so it
should be:

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
       root_objectid == BTRFS_TREE_LOG_OBJECTID);

This should help you; can you give it a try?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4b5a1e1..a006017 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1766,7 +1766,8 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   bytenr, num_bytes, parent,
 					   root_objectid, owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
+		       root_objectid == BTRFS_TREE_LOG_OBJECTID);
 		update_inline_extent_backref(trans, root, path, iref,
 					     refs_to_add, extent_op);
 	} else if (ret == -ENOENT) {


thanks,
liubo
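
A note on the two conditions: the original check fires for any owner below
BTRFS_FIRST_FREE_OBJECTID, while the proposed one additionally requires the
ref to belong to the log tree. The user-space sketch below only compares the
two predicates and is not kernel code. The constant values (256 and -6) are
recalled from fs/btrfs/ctree.h and should be treated as assumptions to verify
against your tree; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values (believed to match fs/btrfs/ctree.h; verify before relying on them). */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL
#define BTRFS_TREE_LOG_OBJECTID   ((uint64_t)-6)

/* Hypothetical helper: condition of the original BUG_ON. */
bool original_check_fires(uint64_t owner)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID;
}

/* Hypothetical helper: condition of the BUG_ON proposed in the patch above. */
bool proposed_check_fires(uint64_t owner, uint64_t root_objectid)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID &&
	       root_objectid == BTRFS_TREE_LOG_OBJECTID;
}
```

With owner = 0 and an illustrative non-log root objectid of 5,
original_check_fires() returns true and proposed_check_fires() returns false,
so under these assumptions the narrowed BUG_ON would no longer trigger for
that ref. Whether narrowing the check is the right fix is what the requested
test run is meant to show.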

Josef Bacik

2012-Jul-02 13:34 UTC

Re: Please hammer my for-linus branch

On Sun, Jul 01, 2012 at 09:35:01PM -0600, Daniel J Blueman wrote:
> > Hi everyone,
> >
> > I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> > for-linus branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> >
> > Some of the changes are fixes for the tree logging code, so I ran some
> > extra crash runs against them Friday night.
> >
> > I ended up with a new crash in the tree log directory deletion replay
> > code, so I didn't send out the pull request to Linus.
> >
> > It isn't clear yet if the new crash is because I was testing differently
> > or if it is a regression.  I'm nailing it down this weekend, but please
> > give my for-linus a shot.
>
> With this branch (3.4.0), my test has consistently been hitting the
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
> insert_inline_extent_backref [1]. This is followed by a string of
> other issues [2] and a hard lockup, so I used netconsole to collect
> this.
>
> I'm preparing my btrfs test for xfstests integration, but can slip you
> it if interested. It hits this case in ~30s.
>
Can you apply this and capture the output?  I have a feeling I know what this is.
Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5775dc4..917ea70 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1766,7 +1766,13 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   bytenr, num_bytes, parent,
 					   root_objectid, owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+			printk(KERN_ERR "bad inline extent, bytenr=%Lu, "
+			       "num_bytes=%Lu, parent=%Lu, root=%Lu, owner=%Lu"
+			       ", offset=%Lu\n", bytenr, num_bytes, parent,
+			       root_objectid, owner, offset);
+			BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		}
 		update_inline_extent_backref(trans, root, path, iref,
 					     refs_to_add, extent_op);
 	} else if (ret == -ENOENT) {
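
For anyone matching this against captured netconsole output, the line the
debugging patch prints can be reproduced in user space. This is only a sketch:
format_bad_extent is a hypothetical helper, and it uses PRIu64 from
<inttypes.h> where the kernel patch uses %Lu, since standard C does not
define %Lu for integers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical user-space analog of the printk in the patch above.
 * Writes the message into buf and returns snprintf's result.
 */
int format_bad_extent(char *buf, size_t len,
		      uint64_t bytenr, uint64_t num_bytes, uint64_t parent,
		      uint64_t root_objectid, uint64_t owner, uint64_t offset)
{
	return snprintf(buf, len,
			"bad inline extent, bytenr=%" PRIu64
			", num_bytes=%" PRIu64 ", parent=%" PRIu64
			", root=%" PRIu64 ", owner=%" PRIu64
			", offset=%" PRIu64 "\n",
			bytenr, num_bytes, parent,
			root_objectid, owner, offset);
}
```

The fields appear in the same order as in the patch, so a captured line can be
read off directly as bytenr, num_bytes, parent, root, owner and offset.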

David Sterba

2012-Jul-02 14:10 UTC

Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

Hi,

I'm seeing a machine lockup in xfstests/224, logs attached. Friday's
xfstests round with 3.5-rc4 was ok, all tests passed.

The 'dd' processes are in D-state with these stack traces:

 5597 pts/0    D+     0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.8 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001cd64>] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs]
[<ffffffffa001d16b>] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
[<ffffffffa004132b>] __btrfs_buffered_write+0x17b/0x380 [btrfs]
[<ffffffffa0041783>] btrfs_file_aio_write+0x253/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

and (not sure if there are more)

 5666 pts/0    D+     0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.6 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001c56a>] btrfs_block_rsv_add+0x3a/0x60 [btrfs]
[<ffffffffa003155e>] start_transaction+0x26e/0x330 [btrfs]
[<ffffffffa0031903>] btrfs_start_transaction+0x13/0x20 [btrfs]
[<ffffffffa003cae0>] btrfs_dirty_inode+0xb0/0xe0 [btrfs]
[<ffffffffa003cdad>] btrfs_update_time+0xcd/0x180 [btrfs]
[<ffffffffa00416f8>] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b

all btrfs kernel threads are idle.

Mount options: -o space_cache
Mkfs: fresh, default options

# btrfs fi df /mnt/a2
System: total=4.00MiB, used=4.00KiB
Data+Metadata: total=1020.00MiB, used=987.32MiB
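
The figures above put the mixed Data+Metadata allocation at roughly 96.8%
used, which would be consistent with writers waiting in
reserve_metadata_bytes, although that link is an inference and not something
the logs prove. A minimal sketch of the arithmetic (used_percent is a
hypothetical helper name):

```c
/* Percentage of allocated space in use; returns 0 when total is not positive. */
double used_percent(double used_mib, double total_mib)
{
	if (total_mib <= 0.0)
		return 0.0;
	return 100.0 * used_mib / total_mib;
}
```

Calling used_percent(987.32, 1020.00) gives a value just under 96.8.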

[meanwhile]

While grabbing lockdep stats the test respawned

224 236s ...    [14:57:42] [15:46:56] 2954s

but there was no disk activity, I wonder if touching /proc/lockdep or
/proc/lock_stat is affecting this.

Finishing this report anyway, and will redo the tests again.

Looking at the logs again, the first process snapshot (D-state processes
only) is much longer than the snapshot containing all processes.
Unfortunately I don't have timestamps recorded, but this suggests that the
test is progressing very slowly, so slowly that I considered it stalled when
looking at the I/O graphs.


david

David Sterba

2012-Jul-02 14:34 UTC

Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

On Mon, Jul 02, 2012 at 04:10:52PM +0200, David Sterba wrote:
> Finishing this report anyway, and will redo the tests again.
>
> Looking again into the logs, the first process snapshot (only D-state
> processes) is much longer than process snapshot of containing all,
> unfortunately I don't have timestamps recorded, but this suggests that it's
> very slowly going on, so slowly that I considered it stalled looking at the
> io graphs.

Fresh build, reboot, and a single xfstests/224 run:

During the first ~20 seconds there is high write activity (i.e. file setup),
then it drops to a few tens to hundreds of KB every 4 seconds. The CPU is
idle; sample output from dstat:

----total-cpu-usage---- --dsk/sda9- ---system--
usr sys idl wai hiq siq| read  writ| int   csw
  1   1  99   0   0   0|   0     0 | 923  1856
  0   1  98   0   1   0|   0  8192B| 904  2796
  0   1  99   0   0   0|   0     0 | 945  1914
  1   1  98   0   0   0|   0     0 | 899  1849
  1   1  98   0   0   1|   0     0 | 906  1848
  0   3  97   0   0   0|   0    20k| 901  3740
  0   0 100   0   0   0|   0     0 | 905  1851
  1   1  98   0   0   1|   0     0 | 946  1917
  0   1  99   0   0   0|   0     0 | 904  1858
  0   1  99   0   0   0|   0  8192B| 907  2805
  1   1  98   0   0   1|   0     0 | 891  1836
  0   1  99   0   0   0|   0     0 | 900  1847
  0   1  99   0   0   0|   0     0 | 940  1905
  1   4  95   0   0   0|   0    32k| 904  5153
  1   2  97   0   0   0|   0    36k| 913  4240
  0   1  99   0   0   0|   0     0 | 907  1849
  0   1  99   0   0   0|   0     0 | 908  1852
  1   1  98   0   0   1|   0     0 | 933  1901
  1   2  98   0   0   0|   0  8192B| 916  2808
  0   1  99   0   0   0|   0     0 | 917  1843
  0   1  99   0   0   1|   0     0 | 908  1844
  1   1  99   0   0   0|   0     0 | 905  1860
  0   5  95   0   0   0|   0    36k| 943  7565
  1   1  99   0   0   0|   0     0 | 911  1861
  0   1  99   0   0   0|   0     0 | 910  1852
  1   1  98   0   0   0|   0     0 | 944  1878
  1   2  97   0   0   1|   0    16k| 898  3753
  0   9  87   4   0   1|   0  1020k|1035    11k
  0  19  74   7   0   1|   0  2092k|3052    24k
  0   1  99   0   0   0|   0     0 | 909  1851
  1   1  98   0   0   1|   0     0 | 915  1856
  1   1  99   0   0   0|   0     0 | 896  1847
  0   2  98   0   0   0|   0  8192B| 931  2847
  0   1  99   0   0   0|   0     0 | 899  1850
  1   1  98   0   0   1|   0     0 | 896  1861
  0   1  99   0   0   0|   0     0 | 911  1855
  1   5  94   0   0   0|   0    28k| 891  6521
  0   9  87   3   0   1|   0  1100k| 963    11k
  0   1  99   0   0   0|   0     0 | 905  1857
  1   1  99   0   0   0|   0     0 | 895  1851
  1   1  98   0   0   0|   0     0 | 911  1852
  0   7  88   4   0   1|   0   700k| 911  8533
  0   1  99   0   0   0|   0     0 | 940  1905
  1   1  99   0   0   0|   0     0 | 912  1851
  1   1  99   0   0   0|   0     0 | 895  1851
  0  10  89   0   0   1|   0   100k| 912    13k

and it repeats more or less the same.

Bisection in progress.


david

David Sterba

2012-Jul-02 16:10 UTC

Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

On Mon, Jul 02, 2012 at 04:34:53PM +0200, David Sterba wrote:
> Bisection in progress.

commit cae76522b19735c576803bec273f49062aa418ab
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Thu Jun 21 14:05:49 2012 -0400

    Btrfs: flush delayed inodes if we're short on space

    Those crazy gentoo guys have been complaining about ENOSPC errors on their
    portage volumes.  This is because doing things like untar tends to create
    lots of new files which will soak up all the reservation space in the
    delayed inodes.  Usually this gets papered over by the fact that we will try
    and commit the transaction, however if this happens in the wrong spot or we
    choose not to commit the transaction you will be screwed.  So add the
    ability to expclitly flush delayed inodes to free up space.  Please test
    this out guys to make sure it works since as usual I cannot reproduce.
    Thanks,

Chris Mason

2012-Jul-02 20:17 UTC

Re: Please hammer my for-linus branch

On Sat, Jun 30, 2012 at 09:22:59PM -0400, Chris Mason wrote:
> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression.  I'm nailing it down this weekend, but please
> give my for-linus a shot.

Ok, I've just rebased for-linus.  I've dropped Josef's enospc patch,
which should fix the regression Dave hit.  I've also added a fix for my
log replay crash, which was definitely an old bug.  The delayed
directory operations were queuing up the changes made during replay, and
it was confusing the replay code.

Looks like there's a fix pending from Liu Bo, but I'll let Daniel test
that before pulling it in as well.

Thanks everyone.

-chris



Daniel J Blueman

2012-Jul-03 03:55 UTC

Re: Please hammer my for-linus branch

On 2 July 2012 21:34, Josef Bacik <jbacik@fusionio.com> wrote:
> On Sun, Jul 01, 2012 at 09:35:01PM -0600, Daniel J Blueman wrote:
>> > Hi everyone,
>> >
>> > I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>> > for-linus branch:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>> >
>> > Some of the changes are fixes for the tree logging code, so I ran some
>> > extra crash runs against them Friday night.
>> >
>> > I ended up with a new crash in the tree log directory deletion replay
>> > code, so I didn't send out the pull request to Linus.
>> >
>> > It isn't clear yet if the new crash is because I was testing differently
>> > or if it is a regression.  I'm nailing it down this weekend, but please
>> > give my for-linus a shot.
>>
>> With this branch (3.4.0), my test has consistently been hitting the
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>> insert_inline_extent_backref [1]. This is followed by a string of
>> other issues [2] and a hard lockup, so I used netconsole to collect
>> this.
>>
>> I'm preparing my btrfs test for xfstests integration, but can slip you
>> it if interested. It hits this case in ~30s.
>>
>
> Can you apply this and capture the output, I have a feeling I know what this is.
> Thanks,
>
> Josef
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5775dc4..917ea70 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1766,7 +1766,13 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
>                                            bytenr, num_bytes, parent,
>                                            root_objectid, owner, offset, 1);
>         if (ret == 0) {
> -               BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> +               if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> +                       printk(KERN_ERR "bad inline extent, bytenr=%Lu, "
> +                              "num_bytes=%Lu, parent=%Lu, root=%Lu, owner=%Lu"
> +                              ", offset=%Lu\n", bytenr, num_bytes, parent,
> +                              root_objectid, owner, offset);
> +                       BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> +               }
>                 update_inline_extent_backref(trans, root, path, iref,
>                                              refs_to_add, extent_op);
>         } else if (ret == -ENOENT) {

Bo's additional condition 'root_objectid == BTRFS_TREE_LOG_OBJECTID'
seemed to hold it off.

Here is the debug you asked for [1]. After we've determined the right
fix for this issue, I'll post the other issues I was seeing.

Thanks!
  Daniel

--- [1]

device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 3 /dev/ram3
device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 2 transid 3 /dev/ram0
device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 4 /dev/ram3
btrfs: allowing degraded mounts
btrfs: force zlib compression
btrfs: disabling disk space caching
btrfs: enabling auto defrag
btrfs: enabling auto recovery
btrfs: no dev_stats entry found for device /dev/ram0 (devid 2) (OK on first mount after mkfs)
btrfs: no dev_stats entry found for device /dev/ram3 (devid 1) (OK on first mount after mkfs)
btrfs: relocating block group 512425984 flags 20
btrfs: found 2 extents
btrfs: relocating block group 190382080 flags 9
btrfs: found 4756 extents
btrfs: found 4756 extents
bad inline extent, bytenr=36909056, num_bytes=4096, parent=0, root=5, owner=0, offset=0
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:1774!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0

Modules linked in:
 brd dm_crypt dm_mod kvm_intel kvm uvcvideo videobuf2_core videodev
videobuf2_vmalloc videobuf2_memops coretemp microcode iwlwifi
netconsole btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video

Pid: 8055, comm: btrfs-endio-wri Not tainted 3.4.0-debug+ #5 Dell Inc.
Latitude E5420/0H5TG2

RIP: 0010:[<ffffffffa009685b>] [<ffffffffa009685b>]
insert_inline_extent_backref+0x11b/0x120 [btrfs]
RSP: 0018:ffff880200415a40  EFLAGS: 00010282
RAX: 000000000000006d RBX: ffff88020e99c1b0 RCX: 0000000000000000
RDX: ffffffff8103cde5 RSI: 0000000000000001 RDI: ffffffff8103d170
RBP: ffff880200415ac0 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020e7f7000
R13: ffff88020ef80e60 R14: 0000000000000000 R15: 0000000000001000
FS:  0000000000000000(0000) GS:ffff88022ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f6916262000 CR3: 0000000221e24000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-endio-wri (pid: 8055, threadinfo ffff880200414000, task
ffff880222029ee0)
Stack:
 0000000000000000 0000000000000005 0000000000000000 0000000000000000
 ffff880200000001 ffffffffa0089be5 ffff880200415aa0 0000000002333000
 ffff8801f6f15000 0000000000000ea1 ffff8801f6f15000 ffff88020e99c1b0

Call Trace:
 [<ffffffffa0089be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [<ffffffffa00968fa>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa0098097>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa009bf1e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa009c00d>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa009c3e9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00adcb7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00adfd0>] btrfs_end_transaction+0x10/0x20 [btrfs]
 [<ffffffffa00b4985>] btrfs_finish_ordered_io+0x185/0x3b0 [btrfs]
 [<ffffffff81095bad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffa00b4bc0>] finish_ordered_fn+0x10/0x20 [btrfs]
 [<ffffffffa00de6d6>] worker_loop+0x86/0x330 [btrfs]
 [<ffffffffa00de650>] ? check_pending_worker_creates.isra.1+0xd0/0xd0
[btrfs]
 [<ffffffff8105e2ee>] kthread+0x8e/0xa0
 [<ffffffff815b9314>] kernel_thread_helper+0x4/0x10
 [<ffffffff81069a77>] ? finish_task_switch+0x77/0x100
 [<ffffffff815b754b>] ? _raw_spin_unlock_irq+0x2b/0x50
 [<ffffffff815b79d9>] ? retint_restore_args+0xe/0xe
 [<ffffffff8105e260>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff815b9310>] ? gs_change+0xb/0xb
Code: c0 eb a4 48 8b 45 20 4c 89 f1 48 c7 c7 58 d4 10 a0 4c 8b 4d 18 4c 89
fa 4c 8b 45 10 48 8b 75 b8 48 89 04 24 31 c0 e8 32 b3 50 e1
<0f> 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 89 5d d8 4c 89 65 e0
RIP [<ffffffffa009685b>] insert_inline_extent_backref+0x11b/0x120 [btrfs]
RSP <ffff880200415a40>
-- 
Daniel J Blueman
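
The "bad inline extent" line in the log above is printed by the debug patch just before the original BUG_ON fires. As a rough userspace illustration of that log-then-assert pattern (not kernel code: `struct backref_args`, `report_bad_owner` and the local `FIRST_FREE_OBJECTID` constant are names made up for this sketch, and 256 is assumed to be the value of BTRFS_FIRST_FREE_OBJECTID in the kernel headers):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value of BTRFS_FIRST_FREE_OBJECTID from the kernel headers. */
#define FIRST_FREE_OBJECTID 256ULL

/* Hypothetical container for the values the debug patch prints. */
struct backref_args {
	uint64_t bytenr, num_bytes, parent, root_objectid, owner, offset;
};

/*
 * Format the diagnostic into buf and return true when owner is below
 * FIRST_FREE_OBJECTID, i.e. when the original BUG_ON would fire.
 * Otherwise return false and leave buf as an empty string.
 */
bool report_bad_owner(const struct backref_args *a, char *buf, size_t len)
{
	if (len > 0)
		buf[0] = '\0';
	if (a->owner >= FIRST_FREE_OBJECTID)
		return false;

	snprintf(buf, len,
		 "bad inline extent, bytenr=%llu, num_bytes=%llu, parent=%llu, "
		 "root=%llu, owner=%llu, offset=%llu",
		 (unsigned long long)a->bytenr,
		 (unsigned long long)a->num_bytes,
		 (unsigned long long)a->parent,
		 (unsigned long long)a->root_objectid,
		 (unsigned long long)a->owner,
		 (unsigned long long)a->offset);
	return true;
}
```

Fed the values from the log (bytenr=36909056, num_bytes=4096, parent=0, root=5, owner=0, offset=0), the function reports the condition and produces the same message text, so the context is captured before the assertion takes the machine down.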

David Sterba

2012-Jul-03 14:39 UTC

Re: Please hammer my for-linus branch

On Mon, Jul 02, 2012 at 04:17:37PM -0400, Chris Mason wrote:
> Ok, I've just rebased for-linus.  I've dropped Josef's enospc patch,
> which should fix the regression Dave hit.

JFYI, fixed. No other problems observed so far.

david

Daniel J Blueman

2012-Jul-04 03:37 UTC

Re: Please hammer my for-linus branch

> Hi everyone,
>
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
>
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
>
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression.  I'm nailing it down this weekend, but please
> give my for-linus a shot.

I consistently run into this assertion [1] while running a fio
workload on a fresh RAID10 filesystem with a balance running.

Let me know if you need steps to reproduce, debug etc.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/extent-tree.c:1728!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 1

Modules linked in: brd dm_crypt dm_mod kvm_intel kvm binfmt_misc
coretemp microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc
videobuf2_memops iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt
cfbfillrect video

Pid: 31436, comm: btrfs Tainted: G        W    3.4.0-debug+ #6 Dell
Inc. Latitude E5420/0H5TG2
RIP: 0010:[<ffffffffa00ad739>]  [<ffffffffa00ad739>]
update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
RSP: 0018:ffff88021dfab858  EFLAGS: 00010213
RAX: 00000000000000b0 RBX: ffff8802061555a0 RCX: ffff88021cf1d000
RDX: 0000000000000000 RSI: 0000000000000f3a RDI: ffff8800c4e5bc20
RBP: ffff88021dfab8b8 R08: 00000000000000b0 R09: ffff88021dfab808
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c4e5bc20
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000f10
FS:  00007fdb25012740(0000) GS:ffff88022ec40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f882b763f70 CR3: 000000021d547000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 31436, threadinfo ffff88021dfaa000, task ffff88021fdc5ca0)
Stack:
 ffff8801ff3ed000 000000098123124f ffff8801ff3ed000 ffff8802083270a0
 0000000000000000 0000000000000f3a ffff880206156000 ffff8802061555a0
 ffff8801ff3ed000 ffff8802083270a0 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa00ad7c8>] insert_inline_extent_backref+0x88/0x100 [btrfs]
 [<ffffffffa00a0be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [<ffffffffa00ad8da>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [<ffffffffa00af077>] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [<ffffffffa00b2efe>] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [<ffffffffa00b2fed>] run_clustered_refs+0xdd/0x370 [btrfs]
 [<ffffffffa00b33c9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [<ffffffffa00c4c97>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [<ffffffffa00c4f93>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [<ffffffffa01114e9>] relocate_block_group+0x439/0x560 [btrfs]
 [<ffffffffa01117d4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [<ffffffffa00eea4a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [<ffffffffa00e9722>] ? free_extent_buffer+0x32/0x90 [btrfs]
 [<ffffffffa00f1db4>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [<ffffffffa00f21a3>] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [<ffffffffa00f7f30>] btrfs_ioctl_balance+0x140/0x440 [btrfs]
 [<ffffffffa00fbd67>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [<ffffffff810f1616>] ? do_brk+0x246/0x360
 [<ffffffff8112f607>] do_vfs_ioctl+0x87/0x340
 [<ffffffff8122a434>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff8112f90a>] sys_ioctl+0x4a/0x80
 [<ffffffff815b8122>] system_call_fastpath+0x16/0x1b
Code: e8 5d f6 02 00 45 31 c9 48 8b 4d a0 89 c2 44 8b 45 a8 eb b5 66 0f
1f 44 00 00 41 bd 0d 00 00 00 00 41 be 0d 00 00 00 00 e9 6b fe ff ff
<0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 8b 45 20 4c 89
RIP  [<ffffffffa00ad739>] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
 RSP <ffff88021dfab858>
-- 
Daniel J Blueman

Liu Bo

2012-Jul-04 05:19 UTC

Re: Please hammer my for-linus branch

On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
>> Hi everyone,
>>
>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>> for-linus branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>
>> Some of the changes are fixes for the tree logging code, so I ran some
>> extra crash runs against them Friday night.
>>
>> I ended up with a new crash in the tree log directory deletion replay
>> code, so I didn't send out the pull request to Linus.
>>
>> It isn't clear yet if the new crash is because I was testing differently
>> or if it is a regression.  I'm nailing it down this weekend, but please
>> give my for-linus a shot.
> 
> I consistently run into this assertion [1] while running a fio
> workload on a fresh RAID10 filesystem with a balance running.
> 
> Let me know if you need steps to reproduce, debug etc.
> 

Seems that additional condition does not catch the bug.

Plz show us the steps to reproduce, I'll try to reproduce it locally
and nail it down.

thanks,
liubo
> Thanks,
>   Daniel
> 
> --- [1]
> 
> kernel BUG at fs/btrfs/extent-tree.c:1728!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 1
> 
> Modules linked in: brd dm_crypt dm_mod kvm_intel kvm binfmt_misc
> coretemp microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc
> videobuf2_memops iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt
> cfbfillrect video
> 
> Pid: 31436, comm: btrfs Tainted: G        W    3.4.0-debug+ #6 Dell
> Inc. Latitude E5420/0H5TG2
> RIP: 0010:[<ffffffffa00ad739>]  [<ffffffffa00ad739>]
> update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
> RSP: 0018:ffff88021dfab858  EFLAGS: 00010213
> RAX: 00000000000000b0 RBX: ffff8802061555a0 RCX: ffff88021cf1d000
> RDX: 0000000000000000 RSI: 0000000000000f3a RDI: ffff8800c4e5bc20
> RBP: ffff88021dfab8b8 R08: 00000000000000b0 R09: ffff88021dfab808
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c4e5bc20
> R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000f10
> FS:  00007fdb25012740(0000) GS:ffff88022ec40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f882b763f70 CR3: 000000021d547000 CR4: 00000000000407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btrfs (pid: 31436, threadinfo ffff88021dfaa000, task ffff88021fdc5ca0)
> Stack:
>  ffff8801ff3ed000 000000098123124f ffff8801ff3ed000 ffff8802083270a0
>  0000000000000000 0000000000000f3a ffff880206156000 ffff8802061555a0
>  ffff8801ff3ed000 ffff8802083270a0 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffffa00ad7c8>] insert_inline_extent_backref+0x88/0x100 [btrfs]
>  [<ffffffffa00a0be5>] ? btrfs_alloc_path+0x15/0x20 [btrfs]
>  [<ffffffffa00ad8da>] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
>  [<ffffffffa00af077>] run_delayed_tree_ref+0x167/0x190 [btrfs]
>  [<ffffffffa00b2efe>] run_one_delayed_ref+0xde/0xf0 [btrfs]
>  [<ffffffffa00b2fed>] run_clustered_refs+0xdd/0x370 [btrfs]
>  [<ffffffffa00b33c9>] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
>  [<ffffffffa00c4c97>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
>  [<ffffffffa00c4f93>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
>  [<ffffffffa01114e9>] relocate_block_group+0x439/0x560 [btrfs]
>  [<ffffffffa01117d4>] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
>  [<ffffffffa00eea4a>] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
>  [<ffffffffa00e9722>] ? free_extent_buffer+0x32/0x90 [btrfs]
>  [<ffffffffa00f1db4>] __btrfs_balance+0x2f4/0x3f0 [btrfs]
>  [<ffffffffa00f21a3>] btrfs_balance+0x2f3/0x4d0 [btrfs]
>  [<ffffffffa00f7f30>] btrfs_ioctl_balance+0x140/0x440 [btrfs]
>  [<ffffffffa00fbd67>] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
>  [<ffffffff810f1616>] ? do_brk+0x246/0x360
>  [<ffffffff8112f607>] do_vfs_ioctl+0x87/0x340
>  [<ffffffff8122a434>] ? lockdep_sys_exit_thunk+0x35/0x67
>  [<ffffffff8112f90a>] sys_ioctl+0x4a/0x80
>  [<ffffffff815b8122>] system_call_fastpath+0x16/0x1b
> Code: e8 5d f6 02 00 45 31 c9 48 8b 4d a0 89 c2 44 8b 45 a8 eb b5 66 0f
> 1f 44 00 00 41 bd 0d 00 00 00 00 41 be 0d 00 00 00 00 e9 6b fe ff ff
> <0f> 0b 0f 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 8b 45 20 4c 89
> RIP  [<ffffffffa00ad739>] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
>  RSP <ffff88021dfab858>


Daniel J Blueman

2012-Jul-04 06:53 UTC

Re: Please hammer my for-linus branch

On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
>>> Hi everyone,
>>>
>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>> for-linus branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>
>>> Some of the changes are fixes for the tree logging code, so I ran some
>>> extra crash runs against them Friday night.
>>>
>>> I ended up with a new crash in the tree log directory deletion replay
>>> code, so I didn't send out the pull request to Linus.
>>>
>>> It isn't clear yet if the new crash is because I was testing differently
>>> or if it is a regression.  I'm nailing it down this weekend, but please
>>> give my for-linus a shot.
>>
>> I consistently run into this assertion [1] while running a fio
>> workload on a fresh RAID10 filesystem with a balance running.
>>
>> Let me know if you need steps to reproduce, debug etc.
>
> Seems that additional condition does not catch the bug.
>
> Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.

The reproducer auto-generated from my test [1] consistently hits the
spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
the fio workload file [2] in the same dir.

Thanks,
  Daniel

--- [1]

#!/bin/bash -ex

modprobe brd rd_size=1572864 rd_nr=4
# or use kernel param: ramdisk_size=1572864
mkdir -p /tmp/btrfsathon
sync

mkfs.btrfs -m raid1 -d raid1 -l 4096 -n 4096 /dev/ram2 /dev/ram3 /dev/ram1
mount /dev/ram1 /tmp/btrfsathon -o nodatacow,autodefrag,ssd,flushoncommit
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.017
fio --timeout=60 ./workload ||: & sleep 0.000
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.012
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.010
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.003
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.003
btrfs filesystem balance /tmp/btrfsathon ||: & sleep 0.003
fio --timeout=60 ./workload ||: & sleep 0.000
wait
umount /tmp/btrfsathon

--- [2] 'workload'

[global]
directory=/tmp/btrfsathon
rw=randread
size=128m
ioengine=libaio
iodepth=32
invalidate=1
direct=1

[bgwriter]
rw=randwrite
iodepth=32

[queryA]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[queryB]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[bgupdater]
rw=randrw
iodepth=32
size=64m
-- 
Daniel J Blueman

Josef Bacik

2012-Jul-05 13:26 UTC

Re: Please hammer my for-linus branch

On Wed, Jul 04, 2012 at 12:53:54AM -0600, Daniel J Blueman wrote:
> On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> > On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
> > >>> Hi everyone,
> > >>>
> > >>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> > >>> for-linus branch:
> > >>>
> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> > >>>
> > >>> Some of the changes are fixes for the tree logging code, so I ran some
> > >>> extra crash runs against them Friday night.
> > >>>
> > >>> I ended up with a new crash in the tree log directory deletion replay
> > >>> code, so I didn't send out the pull request to Linus.
> > >>>
> > >>> It isn't clear yet if the new crash is because I was testing differently
> > >>> or if it is a regression.  I'm nailing it down this weekend, but please
> > >>> give my for-linus a shot.
> > >>
> > >> I consistently run into this assertion [1] while running a fio
> > >> workload on a fresh RAID10 filesystem with a balance running.
> > >>
> > >> Let me know if you need steps to reproduce, debug etc.
> > >
> > > Seems that additional condition does not catch the bug.
> > >
> > > Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.
> 
> The reproducer auto-generated from my test [1] consistently hits the
> spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
> the fio workload file [2] in the same dir.
> 

Wow I hit this straight away, I will look into it, thanks!

Josef

Josef Bacik

2012-Jul-06 20:59 UTC

Re: Please hammer my for-linus branch

On Wed, Jul 04, 2012 at 12:53:54AM -0600, Daniel J Blueman wrote:
> On 4 July 2012 13:19, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> > On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
> > >>> Hi everyone,
> > >>>
> > >>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> > >>> for-linus branch:
> > >>>
> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> > >>>
> > >>> Some of the changes are fixes for the tree logging code, so I ran some
> > >>> extra crash runs against them Friday night.
> > >>>
> > >>> I ended up with a new crash in the tree log directory deletion replay
> > >>> code, so I didn't send out the pull request to Linus.
> > >>>
> > >>> It isn't clear yet if the new crash is because I was testing differently
> > >>> or if it is a regression.  I'm nailing it down this weekend, but please
> > >>> give my for-linus a shot.
> > >>
> > >> I consistently run into this assertion [1] while running a fio
> > >> workload on a fresh RAID10 filesystem with a balance running.
> > >>
> > >> Let me know if you need steps to reproduce, debug etc.
> > >
> > > Seems that additional condition does not catch the bug.
> > >
> > > Plz show us the steps to reproduce, I'll try to reproduce it locally and nail it down.
> 
> The reproducer auto-generated from my test [1] consistently hits the
> spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
> the fio workload file [2] in the same dir.
> 

Well that was a huge pain in the ass, you are going to have to tell me how to
fix this Arne or fix it yourself.  The problem was introduced here

00f04b88791ff49dc64ada18819d40a5b0671709

The problem is we no longer merge delayed refs on the fs trees, and
somehow we end up with this sequence of events

alloc block
add backref for some random block
remove implicit backref
add implicit backref back <-- I'm not entirely sure why/how this happens,
                              I just assume it's some relocate magic
run refs

Because we do the sequence thing, we go to add the implicit backref and panic
because we find there is one already there, and that's not supposed to happen
with tree blocks.  If we had run the remove first we would have been fine, or if
we had just merged the delayed refs they would have cancelled each other out and
we would have been fine.  In order to test this theory I took the seq
comparisons out of comp_entry in delayed-refs.c and the test has now been
running for about 20 minutes; before, it would die in less than 30 seconds.
So why is this needed?  I assume you need it for something, but I figure it's
easier for you to fix this than for me to go figure out what it's used for.
Thanks,

Josef
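
The sequence described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the kernel's delayed-ref code: `struct delayed_ref`, `run_refs_unmerged` and `run_refs_merged` are hypothetical names, the implicit backref is reduced to a single boolean, sequence numbers are left out, and the unmerged path is assumed to apply adds before drops (which matches the remark that running the remove first would have been fine).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one queued change to a tree block's implicit backref. */
enum ref_action { REF_DROP = -1, REF_ADD = 1 };

struct delayed_ref {
	enum ref_action action;
};

/*
 * Unmerged processing, assuming adds are applied before drops.
 * Returns -1 on an inconsistent update (adding a backref that already
 * exists, or dropping one that does not), 0 otherwise.
 */
int run_refs_unmerged(bool *has_backref, const struct delayed_ref *refs,
		      size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (refs[i].action != REF_ADD)
			continue;
		if (*has_backref)
			return -1;	/* backref already there */
		*has_backref = true;
	}
	for (i = 0; i < n; i++) {
		if (refs[i].action != REF_DROP)
			continue;
		if (!*has_backref)
			return -1;	/* nothing to drop */
		*has_backref = false;
	}
	return 0;
}

/* Merged processing: add/drop pairs cancel out before anything is applied. */
int run_refs_merged(bool *has_backref, const struct delayed_ref *refs,
		    size_t n)
{
	int net = 0;
	size_t i;

	for (i = 0; i < n; i++)
		net += (int)refs[i].action;

	if (net > 0) {
		if (*has_backref)
			return -1;
		*has_backref = true;
	} else if (net < 0) {
		if (!*has_backref)
			return -1;
		*has_backref = false;
	}
	return 0;
}
```

Starting with the backref present and queuing a drop followed by an add, `run_refs_unmerged` fails on the add, while `run_refs_merged` sees a net change of zero and leaves the backref untouched.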

Daniel J Blueman

2012-Jul-10 12:18 UTC

Re: Please hammer my for-linus branch

On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>
>>> Hi everyone,
>>>
>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>> for-linus branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>
>>> Some of the changes are fixes for the tree logging code, so I ran some
>>> extra crash runs against them Friday night.
>>>
>>> I ended up with a new crash in the tree log directory deletion replay
>>> code, so I didn't send out the pull request to Linus.
>>>
>>> It isn't clear yet if the new crash is because I was testing differently
>>> or if it is a regression.  I'm nailing it down this weekend, but please
>>> give my for-linus a shot.
>>
>> With this branch (3.4.0), my test has consistently been hitting the
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>> insert_inline_extent_backref [1]. This is followed by a string of
>> other issues [2] and a hard lockup, so I used netconsole to collect
>> this.
>>
>> I'm preparing my btrfs test for xfstests integration, but can slip you
>> it if interested. It hits this case in ~30s.
>>
>
>
> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>
> This should help you, can you give it a try?

Bo, this did address the assertion I was tripping, so looks good from
here; it allowed me to report the second (different) assertion of
course.

If you still think the fix is sound, is it a good idea for 3.5-rc7?

Thanks,
  Daniel
-- 
Daniel J Blueman
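
To make the difference between the two assertions concrete, here is a minimal userspace sketch that evaluates both conditions. The helper names are made up for illustration, and the constant values (5, -6 and 256) are assumed to match the btrfs kernel headers. It only shows which condition fires for which inputs; it says nothing about whether the narrowed check is the right fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the btrfs kernel headers. */
#define BTRFS_FS_TREE_OBJECTID    5ULL
#define BTRFS_TREE_LOG_OBJECTID   ((uint64_t)-6)
#define BTRFS_FIRST_FREE_OBJECTID 256ULL

/* Condition of the original BUG_ON. */
bool original_check_fires(uint64_t owner)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID;
}

/* Condition of the narrowed BUG_ON proposed in the quoted mail. */
bool narrowed_check_fires(uint64_t owner, uint64_t root_objectid)
{
	return owner < BTRFS_FIRST_FREE_OBJECTID &&
	       root_objectid == BTRFS_TREE_LOG_OBJECTID;
}
```

With the values from the earlier debug output (root=5, owner=0), the original condition is true and the narrowed one is false, which is consistent with the first assertion no longer tripping once the extra condition was added.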

Liu Bo

2012-Jul-11 01:37 UTC

Re: Please hammer my for-linus branch

On 07/10/2012 08:18 PM, Daniel J Blueman wrote:
> On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>>
>>>> Hi everyone,
>>>>
>>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>>> for-linus branch:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>>
>>>> Some of the changes are fixes for the tree logging code, so I ran some
>>>> extra crash runs against them Friday night.
>>>>
>>>> I ended up with a new crash in the tree log directory deletion replay
>>>> code, so I didn't send out the pull request to Linus.
>>>>
>>>> It isn't clear yet if the new crash is because I was testing differently
>>>> or if it is a regression.  I'm nailing it down this weekend, but please
>>>> give my for-linus a shot.
>>> With this branch (3.4.0), my test has consistently been hitting the
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>>> insert_inline_extent_backref [1]. This is followed by a string of
>>> other issues [2] and a hard lockup, so I used netconsole to collect
>>> this.
>>>
>>> I'm preparing my btrfs test for xfstests integration, but can slip you
>>> it if interested. It hits this case in ~30s.
>>>
>>
>> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>>
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>>
>> This should help you, can you give it a try?
> 
> Bo, this did address the assertion I was tripping, so looks good from
> here; it allowed me to report the second (different) assertion of
> course.
> 
> If you still think the fix is sound, is it a good idea for 3.5-rc7?
> 

Hi Daniel,

I'm sorry but it is not ready yet, as it does not catch the root cause
of the bug.

Josef has found that the bug comes from disabling the merging of delayed refs
and is working on the bug with Arne.  As the root cause has been found, the
bug will be fixed soon IMO.

Btw, while testing with your great test scripts, I also posted patches for two
bugs, which may have addressed your other issues.  Their links are

http://www.spinics.net/lists/linux-btrfs/msg17761.html
http://www.spinics.net/lists/linux-btrfs/msg17764.html

thanks,
liubo
> Thanks,
>   Daniel


Daniel J Blueman

2012-Jul-11 02:01 UTC

Re: Please hammer my for-linus branch

On 11 July 2012 09:37, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> On 07/10/2012 08:18 PM, Daniel J Blueman wrote:
>
>> On 2 July 2012 12:20, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>>> On 07/02/2012 11:35 AM, Daniel J Blueman wrote:
>>>
>>>>> Hi everyone,
>>>>>
>>>>> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
>>>>> for-linus branch:
>>>>>
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
>>>>>
>>>>> Some of the changes are fixes for the tree logging code, so I ran some
>>>>> extra crash runs against them Friday night.
>>>>>
>>>>> I ended up with a new crash in the tree log directory deletion replay
>>>>> code, so I didn't send out the pull request to Linus.
>>>>>
>>>>> It isn't clear yet if the new crash is because I was testing differently
>>>>> or if it is a regression.  I'm nailing it down this weekend, but please
>>>>> give my for-linus a shot.
>>>> With this branch (3.4.0), my test has consistently been hitting the
>>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
>>>> insert_inline_extent_backref [1]. This is followed by a string of
>>>> other issues [2] and a hard lockup, so I used netconsole to collect
>>>> this.
>>>>
>>>> I'm preparing my btrfs test for xfstests integration, but can slip you
>>>> it if interested. It hits this case in ~30s.
>>>>
>>>
>>> IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:
>>>
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid == BTRFS_TREE_LOG_OBJECTID);
>>>
>>> This should help you, can you give it a try?
>>
>> Bo, this did address the assertion I was tripping, so looks good from
>> here; it allowed me to report the second (different) assertion of
>> course.
>>
>> If you still think the fix is sound, is it a good idea for 3.5-rc7?
>
>
> Hi Daniel,
>
> I'm sorry but it is not ready yet, as it does not catch the root cause of the bug.
>
> Josef has found that the bug comes from disabling merging delayed refs and is working on the bug
> with Arne.  As the root cause has been found, the bug will be fixed soon IMO.

Now I see the two issues are connected.

> Btw, while testing with your great test scripts, I also post patches for two bugs, which may have address your
> other issues.  Their links are
>
> http://www.spinics.net/lists/linux-btrfs/msg17761.html
> http://www.spinics.net/lists/linux-btrfs/msg17764.html

Great work indeed!

Thanks Bo,
  Daniel
-- 
Daniel J Blueman