thr3ads.net - Btrfs devel - kernel BUG at fs/btrfs/extent

Jim Schutt

2012-Apr-10 19:39 UTC

kernel BUG at fs/btrfs/extent_io.c:3982!

Hi,

I hit this BUG today.

I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
i.e. 3.3.1 +
   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"

The btrfs filesystem in question is backing a Ceph OSD under
a heavy write load.

Here's the bug:

[510342.517157] ------------[ cut here ]------------
[510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!
[510342.526894] invalid opcode: 0000 [#1] SMP
[510342.531102] CPU 4
[510342.533028] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 dm_mirror
dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap
macvlan tun kvm uinput sg sd_mod joydev ata_piix libata button microcode mpt2sas
scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_mad ib_core
mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd
uhci_hcd ioatdma dm_mod i7core_edac edac_core nfs nfs_acl auth_rpcgss fscache
lockd sunrpc tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[510342.587836]
[510342.589412] Pid: 16609, comm: kworker/4:2 Not tainted 3.3.1-00162-gd8b2857
#15 Supermicro X8DTH-i/6/iF/6F/X8DTH
[510342.599601] RIP: 0010:[<ffffffffa057924c>]  [<ffffffffa057924c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[510342.610893] RSP: 0018:ffff88015fb6ba10  EFLAGS: 00010202
[510342.616277] RAX: 0000000000000004 RBX: ffff880ab81865a0 RCX:
ffff880174bc0230
[510342.623476] RDX: ffff8801335bf9b1 RSI: 00000000000d0fb8 RDI:
ffff880ab81865a0
[510342.630675] RBP: ffff88015fb6ba40 R08: 0000000000000038 R09:
0000000000000003
[510342.637874] R10: 0000000000000008 R11: ffff8804658c9e40 R12:
ffff88015fb6a000
[510342.645069] R13: ffff880ab81865a0 R14: 000000000000000e R15:
ffff88015fb6bc10
[510342.652268] FS:  0000000000000000(0000) GS:ffff880627c80000(0000)
knlGS:0000000000000000
[510342.660418] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[510342.666234] CR2: ffffffffff600400 CR3: 0000000001a05000 CR4:
00000000000006e0
[510342.673427] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[510342.680627] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[510342.687827] Process kworker/4:2 (pid: 16609, threadinfo ffff88015fb6a000,
task ffff880102ca4410)
[510342.696669] Stack:
[510342.698769]  ffff880100000000 ffff880ab81865a0 ffff88015fb6a000
ffff8806057d2eb0
[510342.706297]  000000000000000e ffff88015fb6bc10 ffff88015fb6ba70
ffffffffa05793f2
[510342.713825]  ffff88015fb6bb80 ffff880ab81865a0 ffff88015fb6bb50
0000000000000008
[510342.721362] Call Trace:
[510342.723912]  [<ffffffffa05793f2>] release_extent_buffer+0xa2/0xe0
[btrfs]
[510342.730790]  [<ffffffffa05795b4>] free_extent_buffer+0x34/0x80 [btrfs]
[510342.737407]  [<ffffffffa057a126>] btree_write_cache_pages+0x246/0x410
[btrfs]
[510342.744637]  [<ffffffffa054e96a>] btree_writepages+0x3a/0x50 [btrfs]
[510342.751060]  [<ffffffff810fc421>] do_writepages+0x21/0x40
[510342.756537]  [<ffffffff810f0b0b>] __filemap_fdatawrite_range+0x5b/0x60
[510342.763136]  [<ffffffff810f0de3>] filemap_fdatawrite_range+0x13/0x20
[510342.769568]  [<ffffffffa0554ecf>] btrfs_write_marked_extents+0x7f/0xe0
[btrfs]
[510342.776867]  [<ffffffffa0554f5e>]
btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[510342.784951]  [<ffffffffa0554fbb>]
btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[510342.792768]  [<ffffffffa055604c>] btrfs_commit_transaction+0x7ac/0xa10
[btrfs]
[510342.800060]  [<ffffffff81079540>] ? set_next_entity+0x90/0xa0
[510342.805875]  [<ffffffff8105f5d0>] ? wake_up_bit+0x40/0x40
[510342.811365]  [<ffffffffa0556590>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[510342.818403]  [<ffffffffa05565af>] do_async_commit+0x1f/0x30 [btrfs]
[510342.824748]  [<ffffffffa0556590>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[510342.831774]  [<ffffffff81058680>] process_one_work+0x140/0x490
[510342.837673]  [<ffffffff8105a417>] worker_thread+0x187/0x3f0
[510342.843319]  [<ffffffff8105a290>] ? manage_workers+0x120/0x120
[510342.849225]  [<ffffffff8105f02e>] kthread+0x9e/0xb0
[510342.854176]  [<ffffffff81486c64>] kernel_thread_helper+0x4/0x10
[510342.860168]  [<ffffffff8147d84a>] ? retint_restore_args+0xe/0xe
[510342.866161]  [<ffffffff8105ef90>] ?
kthread_freezable_should_stop+0x80/0x80
[510342.873198]  [<ffffffff81486c60>] ? gs_change+0xb/0xb
[510342.878322] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66
66 90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04
<0f> 0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[510342.898331] RIP  [<ffffffffa057924c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[510342.907294]  RSP <ffff88015fb6ba10>
[510342.911241] ---[ end trace 62013c6b6e2e5135 ]---


Please let me know if there is anything I can do
to help track this down.

Thanks -- Jim

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
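
Two details of the oops above fit together: the "invalid opcode: 0000" line and the <0f> 0b marked in the Code: dump. On x86, BUG() is emitted as a ud2 instruction, whose encoding is 0f 0b, and the byte in angle brackets is the one at RIP. The following is a small user-space sketch (not kernel code) that picks the marked byte out of a Code: line and checks whether it starts a ud2. The function name trap_is_ud2 is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Locate the byte wrapped in <..> in an oops "Code:" line (the byte at
 * RIP) and report whether it begins a ud2 instruction (0f 0b).
 * Returns 1 for ud2, 0 for any other byte pair, -1 if no marker is found.
 */
int trap_is_ud2(const char *code_line)
{
	const char *lt;
	unsigned long first, second;
	char *end;

	assert(code_line != NULL);

	lt = strchr(code_line, '<');
	if (!lt)
		return -1;

	/* Parse the marked byte; it must be followed directly by '>'. */
	first = strtoul(lt + 1, &end, 16);
	if (end == lt + 1 || *end != '>')
		return -1;

	/* Parse the byte after the marker (strtoul skips the space). */
	second = strtoul(end + 1, &end, 16);

	return first == 0x0f && second == 0x0b;
}
```

Calling trap_is_ud2() on the tail of the dump above ("84 c0 79 04 <0f> 0b eb fe ...") reports a ud2, which is consistent with the trap coming from a BUG()/BUG_ON() rather than from a stray memory access.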

Chris Mason

2012-Apr-10 20:24 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> Hi,
> 
> I hit this BUG today.
> 
> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> i.e. 3.3.1 +
>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> 
> The btrfs filesystem in question is backing a Ceph OSD under
> a heavy write load.
> 
> Here's the bug:
> 
> [510342.517157] ------------[ cut here ]------------
> [510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!

Could you please confirm that line number is this BUG_ON()

        BUG_ON(extent_buffer_under_io(eb));

Josef has a theory on this one, but I want to make sure we're chasing
the right thing.

-chris
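
For readers following along, here is a user-space sketch of what that assertion appears to guard. It assumes (this is not quoted anywhere in the thread) that extent_buffer_under_io() reports true while the buffer still has pages with I/O in flight, or is marked dirty or under writeback. The struct, the helper names and the bit numbers below are illustrative stand-ins, not the kernel's definitions, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit numbers; not copied from the kernel headers. */
enum { EB_DIRTY = 2, EB_WRITEBACK = 7 };

/* Hypothetical, cut-down stand-in for struct extent_buffer. */
struct eb_model {
	atomic_int io_pages;	/* pages with I/O still in flight */
	atomic_ulong bflags;	/* state bits */
};

static bool eb_test_bit(struct eb_model *eb, int bit)
{
	return (atomic_load(&eb->bflags) & (1UL << bit)) != 0;
}

/* Assumed meaning of "under I/O": in flight, under writeback, or dirty. */
bool eb_under_io(struct eb_model *eb)
{
	return atomic_load(&eb->io_pages) != 0 ||
	       eb_test_bit(eb, EB_WRITEBACK) ||
	       eb_test_bit(eb, EB_DIRTY);
}

/* Model of the release path: pages must not be dropped while under I/O. */
void eb_release_pages(struct eb_model *eb)
{
	assert(!eb_under_io(eb));	/* stand-in for BUG_ON() */
	/* ... the real code would detach and release the pages here ... */
}
```

Under this model, the BUG fires whenever the last reference to a buffer is dropped while the buffer is still dirty or being written, which is why the discussion that follows concentrates on reference counting.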

Jim Schutt

2012-Apr-10 20:32 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 04/10/2012 02:24 PM, Chris Mason wrote:
> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> I hit this BUG today.
>>
>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>> i.e. 3.3.1 +
>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>
>> The btrfs filesystem in question is backing a Ceph OSD under
>> a heavy write load.
>>
>> Here's the bug:
>>
>> [510342.517157] ------------[ cut here ]------------
>> [510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!
>
> Could you please confirm that line number is this BUG_ON()
>
>          BUG_ON(extent_buffer_under_io(eb));

Yep, that's definitely it:

git blame fs/btrfs/extent_io.c | grep -w 3982
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3982) BUG_ON(extent_buffer_under_io(eb));

>
> Josef has a theory on this one, but I want to make sure we're chasing
> the right thing.

Great, thanks.  I'll be happy to test any patches, if needed.

-- Jim
>
> -chris
>


Josef Bacik

2012-Apr-11 19:09 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> Hi,
> 
> I hit this BUG today.
> 
> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> i.e. 3.3.1 +
>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> 
> The btrfs filesystem in question is backing a Ceph OSD under
> a heavy write load.
> 
> Here's the bug:
> 

Can you give this a whirl and let me know how it goes?  If I'm right you should
see a warning pop up in your messages.  Thanks,

Josef

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 26fbe1c..0d81fd4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -55,6 +55,7 @@ struct extent_page_data {
 };
 
 static noinline void flush_write_bio(void *data);
+static void check_buffer_tree_ref(struct extent_buffer *eb);
 static inline struct btrfs_fs_info *
 tree_fs_info(struct extent_io_tree *tree)
 {
@@ -3264,6 +3265,12 @@ retry:
 				continue;
 			}
 
+			if (unlikely(!test_bit(EXTENT_BUFFER_TREE_REF,
+					       &eb->bflags))) {
+				WARN_ON(1);
+				check_buffer_tree_ref(eb);
+			}
+
 			prev_eb = eb;
 			ret = lock_extent_buffer_for_io(eb, fs_info, &epd);
 			if (!ret) {
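
The body of check_buffer_tree_ref() is not shown in the thread, so what follows is only a rough user-space model of the debug check the patch adds. It assumes the helper re-takes the tree's reference when the EXTENT_BUFFER_TREE_REF bit is found clear, and that it copes with losing a race to another thread doing the same. All names and the bit number are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit number; not copied from the kernel headers. */
enum { EB_TREE_REF = 5 };

/* Hypothetical, cut-down stand-in for struct extent_buffer. */
struct eb_model {
	atomic_int refs;	/* reference count */
	atomic_ulong bflags;	/* state bits */
};

/* Atomically set a bit and report whether it was already set. */
static bool eb_test_and_set_bit(struct eb_model *eb, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(&eb->bflags, mask) & mask) != 0;
}

/*
 * Model of the check added to the writeback loop.  Returns true when the
 * tree reference was missing (the case where the patch would WARN).
 */
bool eb_writeback_check(struct eb_model *eb)
{
	/* The caller is assumed to hold a reference of its own. */
	assert(atomic_load(&eb->refs) > 0);

	if (atomic_load(&eb->bflags) & (1UL << EB_TREE_REF))
		return false;

	/* Take the reference first, then publish the bit. */
	atomic_fetch_add(&eb->refs, 1);
	if (eb_test_and_set_bit(eb, EB_TREE_REF))
		atomic_fetch_sub(&eb->refs, 1);	/* lost the race: undo */

	return true;
}
```

If the theory behind the patch holds, the warning case would show up in the log before any BUG, because the missing reference is restored before the buffer can be freed while still dirty.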

Jim Schutt

2012-Apr-11 20:24 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 04/11/2012 01:09 PM, Josef Bacik wrote:
> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> I hit this BUG today.
>>
>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>> i.e. 3.3.1 +
>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>
>> The btrfs filesystem in question is backing a Ceph OSD under
>> a heavy write load.
>>
>> Here's the bug:
>>
>
> Can you give this a whirl and let me know how it goes?  If I'm right you should
> see a warning pop up in your messages.  Thanks,

OK, I've got my test running with your patch applied
to my previous kernel.

Do you expect your warning to only fire when my
previous kernel would have BUGged?  I ask because I've
only seen the BUG once, so it may be a low-probability
occurrence.

It seems like I should keep testing until I see either
your new warning or the BUG, right?

Thanks -- Jim
>
> Josef

Josef Bacik

2012-Apr-11 20:28 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>Hi,
> >>
> >>I hit this BUG today.
> >>
> >>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>i.e. 3.3.1 +
> >>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>
> >>The btrfs filesystem in question is backing a Ceph OSD under
> >>a heavy write load.
> >>
> >>Here's the bug:
> >>
> >
> >Can you give this a whirl and let me know how it goes?  If I'm right you should
> >see a warning pop up in your messages.  Thanks,
> 
> OK, I've got my test running with your patch applied
> to my previous kernel.
> 
> Do you expect your warning to only fire when my
> previous kernel would have BUGged?  I ask because I've
> only seen the BUG once, so it may be a low-probability
> occurrence.
> 
> It seems like I should keep testing until I see either
> your new warning or the BUG, right?
> 

So hopefully you will see my WARN with no BUG, but yes keep running until you
see one or the other please ;).  Thanks,

Josef

Jim Schutt

2012-Apr-11 21:39 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 04/11/2012 02:28 PM, Josef Bacik wrote:
> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>> Hi,
>>>>
>>>> I hit this BUG today.
>>>>
>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>> i.e. 3.3.1 +
>>>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>
>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>> a heavy write load.
>>>>
>>>> Here's the bug:
>>>>
>>>
>>> Can you give this a whirl and let me know how it goes?  If I'm right you should
>>> see a warning pop up in your messages.  Thanks,
>>
>> OK, I've got my test running with your patch applied
>> to my previous kernel.
>>
>> Do you expect your warning to only fire when my
>> previous kernel would have BUGged?  I ask because I've
>> only seen the BUG once, so it may be a low-probability
>> occurrence.
>>
>> It seems like I should keep testing until I see either
>> your new warning or the BUG, right?
>>
>
> So hopefully you will see my WARN with no BUG, but yes keep running until you
> see one or the other please ;).  Thanks,

Hmmm, the BUG won:

[ 6202.249041] ------------[ cut here ]------------
[ 6202.253654] kernel BUG at fs/btrfs/extent_io.c:3989!
[ 6202.258607] invalid opcode: 0000 [#1] SMP
[ 6202.262737] CPU 5
[ 6202.264578] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 dm_mirror
dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap
macvlan tun kvm uinput sg joydev sd_mod ata_piix libata microcode button mpt2sas
scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_mad ib_core
mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd
uhci_hcd ioatdma dm_mod i7core_edac edac_core nfs nfs_acl auth_rpcgss fscache
lockd sunrpc tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[ 6202.319360]
[ 6202.320862] Pid: 1676, comm: kworker/5:2 Not tainted 3.3.1-00163-gdf6ae83 #17
Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 6202.330900] RIP: 0010:[<ffffffffa057724c>]  [<ffffffffa057724c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 6202.342121] RSP: 0018:ffff88060c74da00  EFLAGS: 00010202
[ 6202.347417] RAX: 0000000000000004 RBX: ffff88049b4d3b20 RCX: ffff8809135bf9a8
[ 6202.354521] RDX: ffff8802df769cd9 RSI: 00000000001409bc RDI: ffff88049b4d3b20
[ 6202.361626] RBP: ffff88060c74da30 R08: 000000000000003c R09: 0000000000000003
[ 6202.368734] R10: 0000000000000008 R11: ffff8802a9aa6a20 R12: ffff88060c74c000
[ 6202.375848] R13: ffff88049b4d3b20 R14: 000000000000000e R15: ffff88060c74dc10
[ 6202.382963] FS:  0000000000000000(0000) GS:ffff880627ca0000(0000)
knlGS:0000000000000000
[ 6202.391029] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6202.396758] CR2: ffffffffff600400 CR3: 000000061e956000 CR4: 00000000000006e0
[ 6202.403872] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6202.410986] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6202.418104] Process kworker/5:2 (pid: 1676, threadinfo ffff88060c74c000, task
ffff8806166616b0)
[ 6202.426776] Stack:
[ 6202.428792]  ffff880600000000 ffff88049b4d3b20 ffff88060c74c000
ffff8802fc3c3290
[ 6202.436257]  000000000000000e ffff88060c74dc10 ffff88060c74da60
ffffffffa05773f2
[ 6202.443735]  ffff88060c74db80 ffff88049b4d3b20 ffff88060c74db10
0000000000000000
[ 6202.451211] Call Trace:
[ 6202.453690]  [<ffffffffa05773f2>] release_extent_buffer+0xa2/0xe0
[btrfs]
[ 6202.460505]  [<ffffffffa05775b4>] free_extent_buffer+0x34/0x80 [btrfs]
[ 6202.467051]  [<ffffffffa0578152>] btree_write_cache_pages+0x272/0x480
[btrfs]
[ 6202.474169]  [<ffffffff81077588>] ? update_curr+0x128/0x1f0
[ 6202.479761]  [<ffffffffa054c96a>] btree_writepages+0x3a/0x50 [btrfs]
[ 6202.486110]  [<ffffffff810fc421>] do_writepages+0x21/0x40
[ 6202.491500]  [<ffffffff810f0b0b>] __filemap_fdatawrite_range+0x5b/0x60
[ 6202.498019]  [<ffffffff810f0de3>] filemap_fdatawrite_range+0x13/0x20
[ 6202.504407]  [<ffffffffa0552ecf>] btrfs_write_marked_extents+0x7f/0xe0
[btrfs]
[ 6202.511639]  [<ffffffffa0552f5e>]
btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[ 6202.519679]  [<ffffffffa0552fbb>]
btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[ 6202.527464]  [<ffffffffa055404c>] btrfs_commit_transaction+0x7ac/0xa10
[btrfs]
[ 6202.534675]  [<ffffffff81079540>] ? set_next_entity+0x90/0xa0
[ 6202.540418]  [<ffffffff8105f5d0>] ? wake_up_bit+0x40/0x40
[ 6202.545830]  [<ffffffffa0554590>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[ 6202.552825]  [<ffffffffa05545af>] do_async_commit+0x1f/0x30 [btrfs]
[ 6202.559111]  [<ffffffffa0554590>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[ 6202.566062]  [<ffffffff81058680>] process_one_work+0x140/0x490
[ 6202.571886]  [<ffffffff8105a417>] worker_thread+0x187/0x3f0
[ 6202.577453]  [<ffffffff8105a290>] ? manage_workers+0x120/0x120
[ 6202.583281]  [<ffffffff8105f02e>] kthread+0x9e/0xb0
[ 6202.588159]  [<ffffffff81486c64>] kernel_thread_helper+0x4/0x10
[ 6202.594076]  [<ffffffff8147d84a>] ? retint_restore_args+0xe/0xe
[ 6202.599988]  [<ffffffff8105ef90>] ?
kthread_freezable_should_stop+0x80/0x80
[ 6202.606936]  [<ffffffff81486c60>] ? gs_change+0xb/0xb
[ 6202.611975] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66 66
90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04 <0f>
0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[ 6202.631894] RIP  [<ffffffffa057724c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 6202.640773]  RSP <ffff88060c74da00>
[ 6202.644691] ---[ end trace de7af0e9a646be3b ]---

git blame fs/btrfs/extent_io.c | grep -w 3989
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3989) BUG_ON(extent_buffer_under_io(eb));

-- Jim

>
> Josef
>


Chris Mason

2012-Apr-12 00:29 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Wed, Apr 11, 2012 at 03:39:07PM -0600, Jim Schutt wrote:
> On 04/11/2012 02:28 PM, Josef Bacik wrote:
> >On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>Hi,
> >>>>
> >>>>I hit this BUG today.
> >>>>
> >>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>i.e. 3.3.1 +
> >>>>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>
> >>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>a heavy write load.
> >>>>
> >>>>Here's the bug:
> >>>>
> >>>
> >>>Can you give this a whirl and let me know how it goes?  If I'm right you should
> >>>see a warning pop up in your messages.  Thanks,
> >>
> >>OK, I've got my test running with your patch applied
> >>to my previous kernel.
> >>
> >>Do you expect your warning to only fire when my
> >>previous kernel would have BUGged?  I ask because I've
> >>only seen the BUG once, so it may be a low-probability
> >>occurrence.
> >>
> >>It seems like I should keep testing until I see either
> >>your new warning or the BUG, right?
> >>
> >
> >So hopefully you will see my WARN with no BUG, but yes keep running until you
> >see one or the other please ;).  Thanks,
> 
> Hmmm, the BUG won:
> 
> [ 6202.249041] ------------[ cut here ]------------
> [ 6202.253654] kernel BUG at fs/btrfs/extent_io.c:3989!

Since this is exactly the same call trace, we can assume ref count on
the buffer is correct.  I think it means we're racing on removing the
buffer from the radix tree.  I'm adding some diagnostics here to try and
grow the window a bit.

-chris

Josef Bacik

2012-May-01 16:00 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>Hi,
> >>
> >>I hit this BUG today.
> >>
> >>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>i.e. 3.3.1 +
> >>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>
> >>The btrfs filesystem in question is backing a Ceph OSD under
> >>a heavy write load.
> >>
> >>Here's the bug:
> >>
> >
> >Can you give this a whirl and let me know how it goes?  If I'm right you should
> >see a warning pop up in your messages.  Thanks,
> 
> OK, I've got my test running with your patch applied
> to my previous kernel.
> 
> Do you expect your warning to only fire when my
> previous kernel would have BUGged?  I ask because I've
> only seen the BUG once, so it may be a low-probability
> occurrence.
> 
> It seems like I should keep testing until I see either
> your new warning or the BUG, right?

Hey Jim,

I just sent a patch to the list

[PATCH] Btrfs: fix page leak when allocing extent buffers 

Could you try that and see if you can reproduce your problem?  Thanks,

Josef

Jim Schutt

2012-May-01 16:41 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 05/01/2012 10:00 AM, Josef Bacik wrote:
> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>> Hi,
>>>>
>>>> I hit this BUG today.
>>>>
>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>> i.e. 3.3.1 +
>>>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>
>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>> a heavy write load.
>>>>
>>>> Here's the bug:
>>>>
>>>
>>> Can you give this a whirl and let me know how it goes?  If I'm right you should
>>> see a warning pop up in your messages.  Thanks,
>>
>> OK, I've got my test running with your patch applied
>> to my previous kernel.
>>
>> Do you expect your warning to only fire when my
>> previous kernel would have BUGged?  I ask because I've
>> only seen the BUG once, so it may be a low-probability
>> occurrence.
>>
>> It seems like I should keep testing until I see either
>> your new warning or the BUG, right?
>
> Hey Jim,
>
> I just sent a patch to the list
>
> [PATCH] Btrfs: fix page leak when allocing extent buffers
>
> Could you try that and see if you can reproduce your problem?

Taking it for a spin now...

Thanks -- Jim
> Thanks,
>
> Josef
>


Jim Schutt

2012-May-03 14:43 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 05/01/2012 10:41 AM, Jim Schutt wrote:
> On 05/01/2012 10:00 AM, Josef Bacik wrote:
>> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>>> Hi,
>>>>>
>>>>> I hit this BUG today.
>>>>>
>>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>>> i.e. 3.3.1 +
>>>>> commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>> commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>>
>>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>>> a heavy write load.
>>>>>
>>>>> Here's the bug:
>>>>>
>>>>
>>>> Can you give this a whirl and let me know how it goes? If I'm right you should
>>>> see a warning pop up in your messages. Thanks,
>>>
>>> OK, I've got my test running with your patch applied
>>> to my previous kernel.
>>>
>>> Do you expect your warning to only fire when my
>>> previous kernel would have BUGged? I ask because I've
>>> only seen the BUG once, so it may be a low-probability
>>> occurrence.
>>>
>>> It seems like I should keep testing until I see either
>>> your new warning or the BUG, right?
>>
>> Hey Jim,
>>
>> I just sent a patch to the list
>>
>> [PATCH] Btrfs: fix page leak when allocing extent buffers
>>
>> Could you try that and see if you can reproduce your problem?
>
> Taking it for a spin now...
>

Hit it again:

[ 4638.295231] ------------[ cut here ]------------
[ 4638.299840] kernel BUG at fs/btrfs/extent_io.c:3993!
[ 4638.304792] invalid opcode: 0000 [#1] SMP
[ 4638.308912] CPU 3
[ 4638.310745] Modules linked in: btrfs zlib_deflate dm_round_robin ib_ipoib
rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa
iw_cxgb4 sg sd_mod dm_mirror dm_region_hash dm_log dm_multipath scsi_dh
vhost_net macvtap macvlan tun kvm uinput joydev button ata_piix libata mpt2sas
scsi_transport_sas raid_class scsi_mod microcode serio_raw pcspkr mlx4_ib ib_mad
ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support
ehci_hcd uhci_hcd ioatdma i7core_edac edac_core dm_mod nfs nfs_acl auth_rpcgss
fscache lockd sunrpc broadcom tg3 bnx2 igb dca e1000 [last unloaded:
scsi_wait_scan]
[ 4638.366288]
[ 4638.367786] Pid: 32179, comm: kworker/3:5 Not tainted 3.3.4-00186-g56a0ae2
#65 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 4638.377898] RIP: 0010:[<ffffffffa057717c>]  [<ffffffffa057717c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 4638.389112] RSP: 0018:ffff8805ff6cba00  EFLAGS: 00010202
[ 4638.394408] RAX: 0000000000000004 RBX: ffff880152ba8c18 RCX: ffff8800be9e4468
[ 4638.401529] RDX: ffff8802f7d64b19 RSI: 00000000000858ec RDI: ffff880152ba8c18
[ 4638.408644] RBP: ffff8805ff6cba30 R08: 000000000000002c R09: 0000000000000003
[ 4638.415759] R10: 0000000000000008 R11: ffff880618cee0c0 R12: ffff8805ff6ca000
[ 4638.422874] R13: ffff880152ba8c18 R14: 000000000000000e R15: ffff8805ff6cbc10
[ 4638.429987] FS:  0000000000000000(0000) GS:ffff880627c60000(0000)
knlGS:0000000000000000
[ 4638.438052] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4638.443781] CR2: ffffffffff600400 CR3: 0000000a06461000 CR4: 00000000000006e0
[ 4638.450900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4638.458018] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4638.465130] Process kworker/3:5 (pid: 32179, threadinfo ffff8805ff6ca000,
task ffff88060bf14500)
[ 4638.473886] Stack:
[ 4638.475899]  ffff880500000000 ffff880152ba8c18 ffff8805ff6ca000
ffff880a0cf1aeb0
[ 4638.483350]  000000000000000e ffff8805ff6cbc10 ffff8805ff6cba60
ffffffffa0577322
[ 4638.490782]  ffff8805ff6cbb80 ffff880152ba8c18 ffff8805ff6cbb50
0000000000000008
[ 4638.498234] Call Trace:
[ 4638.500705]  [<ffffffffa0577322>] release_extent_buffer+0xa2/0xe0
[btrfs]
[ 4638.507492]  [<ffffffffa05774e4>] free_extent_buffer+0x34/0x80 [btrfs]
[ 4638.514036]  [<ffffffffa05780a2>] btree_write_cache_pages+0x272/0x480
[btrfs]
[ 4638.521155]  [<ffffffff81075b18>] ? enqueue_sleeper+0x248/0x2c0
[ 4638.527072]  [<ffffffffa054c92a>] btree_writepages+0x3a/0x50 [btrfs]
[ 4638.533411]  [<ffffffff810fc9f1>] do_writepages+0x21/0x40
[ 4638.538794]  [<ffffffff810f10db>] __filemap_fdatawrite_range+0x5b/0x60
[ 4638.545300]  [<ffffffff810f13b3>] filemap_fdatawrite_range+0x13/0x20
[ 4638.551654]  [<ffffffffa0552e8f>] btrfs_write_marked_extents+0x7f/0xe0
[btrfs]
[ 4638.558862]  [<ffffffffa0552f1e>]
btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[ 4638.566867]  [<ffffffffa0552f7b>]
btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[ 4638.574615]  [<ffffffffa0554058>] btrfs_commit_transaction+0x7d8/0x9f0
[btrfs]
[ 4638.581818]  [<ffffffff81079910>] ? set_next_entity+0x90/0xa0
[ 4638.587556]  [<ffffffff8105f970>] ? wake_up_bit+0x40/0x40
[ 4638.592957]  [<ffffffffa0554570>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[ 4638.599902]  [<ffffffffa055458f>] do_async_commit+0x1f/0x30 [btrfs]
[ 4638.606161]  [<ffffffffa0554570>] ? btrfs_end_transaction+0x20/0x20
[btrfs]
[ 4638.613106]  [<ffffffff81058a20>] process_one_work+0x140/0x490
[ 4638.618926]  [<ffffffff8105a7b7>] worker_thread+0x187/0x3f0
[ 4638.624484]  [<ffffffff8105a630>] ? manage_workers+0x120/0x120
[ 4638.630303]  [<ffffffff8105f3ce>] kthread+0x9e/0xb0
[ 4638.635171]  [<ffffffff81487a24>] kernel_thread_helper+0x4/0x10
[ 4638.641072]  [<ffffffff8147e60a>] ? retint_restore_args+0xe/0xe
[ 4638.646974]  [<ffffffff8105f330>] ?
kthread_freezable_should_stop+0x80/0x80
[ 4638.653915]  [<ffffffff81487a20>] ? gs_change+0xb/0xb
[ 4638.658952] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66 66
90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04 <0f>
0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[ 4638.678851] RIP  [<ffffffffa057717c>]
btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 4638.687729]  RSP <ffff8805ff6cba00>
[ 4638.691654] ---[ end trace 51121d321f4755d6 ]---


Kernel is 3.3.4 + for-linus branch (commit c666601a93) of
     git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+ for-linus branch (commit dc7fdde39e) of
     git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
+ your debug patch for me: "kernel BUG at fs/btrfs/extent_io.c:3982!"
+ your patch "Btrfs: fix page leak when allocing extent buffers"


git blame fs/btrfs/extent_io.c | grep -w -C 10 3993
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3983)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3984) /*
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3985)  * Helper for releasing extent buffer page.
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3986)  */
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3987) static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3988) 						unsigned long start_idx)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3989) {
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3990) 	unsigned long index;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3991) 	struct page *page;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3992)
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3993) BUG_ON(extent_buffer_under_io(eb));
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3994)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3995) 	index = num_extent_pages(eb->start, eb->len);
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3996) 	if (start_idx >= index)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3997) 		return;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3998)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3999) 	do {
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 4000) 		index--;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 4001) 		page = extent_buffer_page(eb, index);
4f2de97a (Josef Bacik        2012-03-07 16:20:05 -0500 4002) 		if (page) {
4f2de97a (Josef Bacik        2012-03-07 16:20:05 -0500 4003) 		spin_lock(&page->map

FWIW, it takes a while to hit this - load was 128 Ceph clients writing
to a Ceph filesystem with 288 OSDs - the above bug hit several tens of
TB into a 65 TB aggregate write test.

-- Jim
> Thanks -- Jim
>
>> Thanks,
>>
>> Josef
>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2012-May-03 14:53 UTC

Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
> On 05/01/2012 10:41 AM, Jim Schutt wrote:
> >On 05/01/2012 10:00 AM, Josef Bacik wrote:
> >>On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>>Hi,
> >>>>>
> >>>>>I hit this BUG today.
> >>>>>
> >>>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>>i.e. 3.3.1 +
> >>>>>commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>>commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>>
> >>>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>>a heavy write load.
> >>>>>
> >>>>>Here's the bug:
> >>>>>
> >>>>
> >>>>Can you give this a whirl and let me know how it goes? If I'm right you should
> >>>>see a warning pop up in your messages. Thanks,
> >>>
> >>>OK, I've got my test running with your patch applied
> >>>to my previous kernel.
> >>>
> >>>Do you expect your warning to only fire when my
> >>>previous kernel would have BUGged? I ask because I've
> >>>only seen the BUG once, so it may be a low-probability
> >>>occurrence.
> >>>
> >>>It seems like I should keep testing until I see either
> >>>your new warning or the BUG, right?
> >>
> >>Hey Jim,
> >>
> >>I just sent a patch to the list
> >>
> >>[PATCH] Btrfs: fix page leak when allocing extent buffers
> >>
> >>Could you try that and see if you can reproduce your problem?
> >
> >Taking it for a spin now...
> >
>
> Hit it again:
>

Argh ok it's time to stop hopping around the problem and see what exactly the
state is when this happens so I know where to look.  Can you run with this patch
and give me the dmesg?  The important information will be above the --- cut here
 --- line so make sure to grab that part.  Thanks,

Josef


diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7af9343..72249e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3972,7 +3972,13 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
 	unsigned long num_pages;
 	struct page *page;
 
-	BUG_ON(extent_buffer_under_io(eb));
+	if (extent_buffer_under_io(eb)) {
+		printk(KERN_ERR "io_pages=%d, writeback=%d, dirty=%d, stale=%d, tree_ref=%d\n",
+		       atomic_read(&eb->io_pages), test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags),
+		       test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags), test_bit(EXTENT_BUFFER_STALE, &eb->bflags),
+		       test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
+		BUG();
+	}
 
 	num_pages = num_extent_pages(eb->start, eb->len);
 	index = start_idx + num_pages;
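
[Editorial sketch, not part of the original message.] The debug patch above prints the buffer state that would make the check fire. As far as can be told from this era of the code, extent_buffer_under_io() is true when io_pages is non-zero or when either the WRITEBACK or DIRTY bit is set; STALE and TREE_REF are printed only as extra context. The userspace model below is a sketch under that assumption. The struct, the function name and the bit positions are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; NOT the kernel's EXTENT_BUFFER_* values. */
enum {
	EB_DIRTY     = 1u << 0,
	EB_WRITEBACK = 1u << 1,
	EB_STALE     = 1u << 2,
	EB_TREE_REF  = 1u << 3,
};

/* Hypothetical stand-in for the fields the debug printk reads. */
struct eb_state_sketch {
	int io_pages;        /* pages with I/O still in flight */
	unsigned int bflags; /* flag bits, modelled as a plain mask */
};

/*
 * Assumed logic of the under-I/O check: the buffer counts as busy if any
 * page I/O is outstanding, or if it is marked dirty or under writeback.
 * STALE and TREE_REF do not take part in the decision in this model.
 */
static bool under_io_sketch(const struct eb_state_sketch *eb)
{
	return eb->io_pages != 0 ||
	       (eb->bflags & (EB_WRITEBACK | EB_DIRTY)) != 0;
}
```

With this model, a printed line such as "io_pages=0, writeback=0, dirty=1, ..." would point at a buffer being released while still dirty, whereas a non-zero io_pages would point at outstanding page I/O.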

Jim Schutt

2012-May-03 15:46 UTC

Re: [EXTERNAL] Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On 05/03/2012 08:53 AM, Josef Bacik wrote:
> On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
>> On 05/01/2012 10:41 AM, Jim Schutt wrote:
>>> On 05/01/2012 10:00 AM, Josef Bacik wrote:
>>>> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>>>>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>>>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I hit this BUG today.
>>>>>>>
>>>>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>>>>> i.e. 3.3.1 +
>>>>>>> commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>>>> commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>>>>
>>>>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>>>>> a heavy write load.
>>>>>>>
>>>>>>> Here's the bug:
>>>>>>>
>>>>>>
>>>>>> Can you give this a whirl and let me know how it goes? If I'm right you should
>>>>>> see a warning pop up in your messages. Thanks,
>>>>>
>>>>> OK, I've got my test running with your patch applied
>>>>> to my previous kernel.
>>>>>
>>>>> Do you expect your warning to only fire when my
>>>>> previous kernel would have BUGged? I ask because I've
>>>>> only seen the BUG once, so it may be a low-probability
>>>>> occurrence.
>>>>>
>>>>> It seems like I should keep testing until I see either
>>>>> your new warning or the BUG, right?
>>>>
>>>> Hey Jim,
>>>>
>>>> I just sent a patch to the list
>>>>
>>>> [PATCH] Btrfs: fix page leak when allocing extent buffers
>>>>
>>>> Could you try that and see if you can reproduce your problem?
>>>
>>> Taking it for a spin now...
>>>
>>
>> Hit it again:
>>
>
> Argh ok it's time to stop hopping around the problem and see what exactly the
> state is when this happens so I know where to look.  Can you run with this patch
> and give me the dmesg?  The important information will be above the --- cut here
>   --- line so make sure to grab that part.  Thanks,

Working on it...

BTW, when I recompiled, I noticed this warning:

   CC [M]  fs/btrfs/extent_io.o
fs/btrfs/extent_io.c: In function ‘write_one_eb’:
fs/btrfs/extent_io.c:3195: warning: ‘ret’ may be used uninitialized in this function

Is there ever any chance at all that write_one_eb() can be
called by mistake for an eb with zero pages?  If so, could
that be part of the problem?

-- Jim
>
> Josef
>

Josef Bacik

2012-May-03 15:53 UTC

Re: [EXTERNAL] Re: kernel BUG at fs/btrfs/extent_io.c:3982!

On Thu, May 03, 2012 at 09:46:15AM -0600, Jim Schutt wrote:
> On 05/03/2012 08:53 AM, Josef Bacik wrote:
> >On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
> >>On 05/01/2012 10:41 AM, Jim Schutt wrote:
> >>>On 05/01/2012 10:00 AM, Josef Bacik wrote:
> >>>>On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>>>>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>>>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>>>>Hi,
> >>>>>>>
> >>>>>>>I hit this BUG today.
> >>>>>>>
> >>>>>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>>>>i.e. 3.3.1 +
> >>>>>>>commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>>>>commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>>>>
> >>>>>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>>>>a heavy write load.
> >>>>>>>
> >>>>>>>Here's the bug:
> >>>>>>>
> >>>>>>
> >>>>>>Can you give this a whirl and let me know how it goes? If I'm right you should
> >>>>>>see a warning pop up in your messages. Thanks,
> >>>>>
> >>>>>OK, I've got my test running with your patch applied
> >>>>>to my previous kernel.
> >>>>>
> >>>>>Do you expect your warning to only fire when my
> >>>>>previous kernel would have BUGged? I ask because I've
> >>>>>only seen the BUG once, so it may be a low-probability
> >>>>>occurrence.
> >>>>>
> >>>>>It seems like I should keep testing until I see either
> >>>>>your new warning or the BUG, right?
> >>>>
> >>>>Hey Jim,
> >>>>
> >>>>I just sent a patch to the list
> >>>>
> >>>>[PATCH] Btrfs: fix page leak when allocing extent buffers
> >>>>
> >>>>Could you try that and see if you can reproduce your problem?
> >>>
> >>>Taking it for a spin now...
> >>>
> >>
> >>Hit it again:
> >>
> >
> >Argh ok it's time to stop hopping around the problem and see what exactly the
> >state is when this happens so I know where to look.  Can you run with this patch
> >and give me the dmesg?  The important information will be above the --- cut here
> >  --- line so make sure to grab that part.  Thanks,
>
> Working on it...
>
> BTW, when I recompiled, I noticed this warning:
>
>   CC [M]  fs/btrfs/extent_io.o
> fs/btrfs/extent_io.c: In function ‘write_one_eb’:
> fs/btrfs/extent_io.c:3195: warning: ‘ret’ may be used uninitialized in this function
>
> Is there ever any chance at all that write_one_eb() can be
> called by mistake for an eb with zero pages?  If so, could
> that be part of the problem?
>
It shouldn't happen but really neither should this bug sooooo go ahead and set
ret = 0 and put a BUG_ON(!num_pages); in write_one_eb after the

        num_pages = num_extent_pages(eb->start, eb->len);

and let it ride.  Thanks,

Josef
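
[Editorial sketch, not part of the original message.] The compiler warning and the suggested BUG_ON(!num_pages) are about the same corner case: if the page count is zero, a per-page loop body never runs and ret is returned without ever being assigned. The userspace model below illustrates the arithmetic under stated assumptions: 4 KiB pages, and a page-count formula reproduced from memory of the kernel helper of that period. All names ending in _sketch are hypothetical, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages (x86-64); the kernel uses its own page-size macros. */
#define PAGE_SHIFT_SKETCH 12
#define PAGE_SIZE_SKETCH  ((uint64_t)1 << PAGE_SHIFT_SKETCH)

/*
 * Number of pages touched by the byte range [start, start + len).
 * Assumption: this mirrors the arithmetic of num_extent_pages().
 */
static unsigned long num_extent_pages_sketch(uint64_t start, uint64_t len)
{
	return (unsigned long)(((start + len + PAGE_SIZE_SKETCH - 1)
				>> PAGE_SHIFT_SKETCH) -
			       (start >> PAGE_SHIFT_SKETCH));
}

/* Hypothetical stand-in for a per-page write loop like write_one_eb(). */
static int write_one_eb_sketch(uint64_t start, uint64_t len)
{
	unsigned long i, num_pages;
	int ret = 0;	/* the suggested initialization */

	num_pages = num_extent_pages_sketch(start, len);
	assert(num_pages != 0);	/* stands in for BUG_ON(!num_pages) */

	for (i = 0; i < num_pages; i++)
		ret = 0;	/* pretend each page was submitted successfully */

	return ret;
}
```

In this model only a zero-length buffer produces a zero page count, which is why the assertion is expected never to fire in normal operation.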
