thr3ads.net - Btrfs devel - [PATCH] Btrfs: fix a deadlock on chunk mutex [Dec 2012]

Liu Bo

2012-Dec-13 01:52 UTC

[PATCH] Btrfs: fix a deadlock on chunk mutex

A user reported hitting an annoying deadlock while playing with
ceph on top of btrfs.

Currently, updating the device tree requires space from a METADATA chunk,
so we -may- need to do a recursive chunk allocation when adding/updating
a dev extent; that is where the deadlock comes from.

If we use SYSTEM metadata to update the device tree, we can avoid the
recursion.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent-tree.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..561dad5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3346,7 +3346,8 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 
 	if (data)
 		flags = BTRFS_BLOCK_GROUP_DATA;
-	else if (root == root->fs_info->chunk_root)
+	else if (root == root->fs_info->chunk_root ||
+		 root == root->fs_info->dev_root)
 		flags = BTRFS_BLOCK_GROUP_SYSTEM;
 	else
 		flags = BTRFS_BLOCK_GROUP_METADATA;
@@ -3534,7 +3535,8 @@ static u64 get_system_chunk_thresh(struct btrfs_root *root, u64 type)
 	else
 		num_dev = 1;	/* DUP or single */
 
-	/* metadata for updaing devices and chunk tree */
+	/* metadata for adding/updating devices and chunk tree */
+	num_dev = num_dev << 1;
 	return btrfs_calc_trans_metadata_size(root, num_dev + 1);
 }
 
@@ -4351,7 +4353,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 
 	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
-	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
+	fs_info->dev_root->block_rsv = &fs_info->chunk_block_rsv;
 	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->chunk_root->block_rsv = &fs_info->chunk_block_rsv;
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2012-Dec-18 13:52 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
> An user reported that he has hit an annoying deadlock while playing with
> ceph based on btrfs.
> 
> Current updating device tree requires space from METADATA chunk,
> so we -may- need to do a recursive chunk allocation when adding/updating
> dev extent, that is where the deadlock comes from.
> 
> If we use SYSTEM metadata to update device tree, we can avoid the recursive
> stuff.
> 
This is going to cause us to allocate many more system chunks than we used to,
which could land us in trouble.  Instead let's just keep us from re-entering if
we're already allocating a chunk.  We do the chunk allocation when we don't have
enough space for a cluster, but we'll likely have plenty of space to make an
allocation.  Can you give this patch a try, Jim, and see if it fixes your problem?
Thanks,

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e152809..59df5e7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3564,6 +3564,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	int wait_for_alloc = 0;
 	int ret = 0;
 
+	/* Don't re-enter if we're already allocating a chunk */
+	if (trans->allocating_chunk)
+		return -ENOSPC;
+
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
 		ret = update_space_info(extent_root->fs_info, flags,
@@ -3606,6 +3610,8 @@ again:
 		goto again;
 	}
 
+	trans->allocating_chunk = true;
+
 	/*
 	 * If we have mixed data/metadata chunks we want to make sure we keep
 	 * allocating mixed chunks instead of individual chunks.
@@ -3632,6 +3638,7 @@ again:
 	check_system_chunk(trans, extent_root, flags);
 
 	ret = btrfs_alloc_chunk(trans, extent_root, flags);
+	trans->allocating_chunk = false;
 	if (ret < 0 && ret != -ENOSPC)
 		goto out;
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e6509b9..47ad8be 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -388,6 +388,7 @@ again:
 	h->qgroup_reserved = qgroup_reserved;
 	h->delayed_ref_elem.seq = 0;
 	h->type = type;
+	h->allocating_chunk = false;
 	INIT_LIST_HEAD(&h->qgroup_ref_list);
 	INIT_LIST_HEAD(&h->new_bgs);
 
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 0e8aa1e..69700f7 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -68,6 +68,7 @@ struct btrfs_trans_handle {
 	struct btrfs_block_rsv *orig_rsv;
 	short aborted;
 	short adding_csums;
+	bool allocating_chunk;
 	enum btrfs_trans_type type;
 	/*
 	 * this root is only needed to validate that the root passed to

Liu Bo

2012-Dec-18 14:47 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Dec 18, 2012 at 08:52:42AM -0500, Josef Bacik wrote:
> On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
> > An user reported that he has hit an annoying deadlock while playing with
> > ceph based on btrfs.
> > 
> > Current updating device tree requires space from METADATA chunk,
> > so we -may- need to do a recursive chunk allocation when adding/updating
> > dev extent, that is where the deadlock comes from.
> > 
> > If we use SYSTEM metadata to update device tree, we can avoid the recursive
> > stuff.
> > 
> 
> This is going to cause us to allocate much more system chunks than we used to
> which could land us in trouble.  Instead let's just keep us from re-entering if
> we're already allocating a chunk.  We do the chunk allocation when we don't have
> enough space for a cluster, but we'll likely have plenty of space to make an
> allocation.  Can you give this patch a try Jim and see if it fixes your problem?
> Thanks,

From the stack info Jim gave, returning ENOSPC to the caller will end up
aborting to read-only unless someone else saves the situation by
allocating another METADATA chunk, and that is itself a recursive allocation.

thanks,
liubo

Josef Bacik

2012-Dec-18 15:40 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Dec 18, 2012 at 07:47:51AM -0700, Liu Bo wrote:
> On Tue, Dec 18, 2012 at 08:52:42AM -0500, Josef Bacik wrote:
> > On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
> > > An user reported that he has hit an annoying deadlock while playing with
> > > ceph based on btrfs.
> > > 
> > > Current updating device tree requires space from METADATA chunk,
> > > so we -may- need to do a recursive chunk allocation when adding/updating
> > > dev extent, that is where the deadlock comes from.
> > > 
> > > If we use SYSTEM metadata to update device tree, we can avoid the recursive
> > > stuff.
> > > 
> > 
> > This is going to cause us to allocate much more system chunks than we used to
> > which could land us in trouble.  Instead let's just keep us from re-entering if
> > we're already allocating a chunk.  We do the chunk allocation when we don't have
> > enough space for a cluster, but we'll likely have plenty of space to make an
> > allocation.  Can you give this patch a try Jim and see if it fixes your problem?
> > Thanks,
> 
> From the stack info Jim gave, returning ENOSPC to caller will end up with
> aborting to readonly if there is no others save the situation by 
> allocating another METADATA chunk, it is recursive allocation though.
> 
if (ret < 0 && ret != -ENOSPC)

It shouldn't abort; it should just drop empty_size, stop trying to allocate a
cluster, and allocate only the blocks needed. This applies only to the recursive
chunk allocation, so after it succeeds we'll have a new chunk and the original
allocation will be able to carry on.  Thanks,

Josef

Jim Schutt

2013-Jan-03 18:44 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

Hi Josef,

Thanks for the patch - sorry for the long delay in testing...


On 12/18/2012 06:52 AM, Josef Bacik wrote:
> On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
>> An user reported that he has hit an annoying deadlock while playing with
>> ceph based on btrfs.
>>
>> Current updating device tree requires space from METADATA chunk,
>> so we -may- need to do a recursive chunk allocation when adding/updating
>> dev extent, that is where the deadlock comes from.
>>
>> If we use SYSTEM metadata to update device tree, we can avoid the recursive
>> stuff.
>>
> 
> This is going to cause us to allocate much more system chunks than we used to
> which could land us in trouble.  Instead let's just keep us from re-entering if
> we're already allocating a chunk.  We do the chunk allocation when we don't have
> enough space for a cluster, but we'll likely have plenty of space to make an
> allocation.  Can you give this patch a try Jim and see if it fixes your problem?
> Thanks,
> 
> Josef
> 
With your patch applied to 3.7.1, I get the following on one
of my servers running Ceph OSDs.  The end effect is that some
of my ceph client writes hang. 

[ 1440.335752] ------------[ cut here ]------------
[ 1440.340602] WARNING: at fs/btrfs/super.c:246 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 1440.349117] Hardware name: X8DTH-i/6/iF/6F
[ 1440.353252] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix libata
coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod
serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core
nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3
hwmon bnx2 igb dca e1000
[ 1440.419398] Pid: 48686, comm: ceph-osd Not tainted 3.7.1-00006-gc794580 #484
[ 1440.426614] Call Trace:
[ 1440.429083]  [<ffffffff8103fed4>] warn_slowpath_common+0x94/0xc0
[ 1440.435110]  [<ffffffff8103ffb6>] warn_slowpath_fmt+0x46/0x50
[ 1440.440894]  [<ffffffffa05425c0>] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[ 1440.448135]  [<ffffffffa059513d>] __btrfs_alloc_chunk+0x6cd/0x750 [btrfs]
[ 1440.454941]  [<ffffffffa059521e>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 1440.461382]  [<ffffffffa05543a1>] ? check_system_chunk+0x71/0x130 [btrfs]
[ 1440.468188]  [<ffffffffa055474c>] do_chunk_alloc+0x2ec/0x370 [btrfs]
[ 1440.474562]  [<ffffffffa05509e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1440.482050]  [<ffffffffa055839c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
[ 1440.489558]  [<ffffffffa0559f40>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
[ 1440.497013]  [<ffffffffa057e31e>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
[ 1440.504095]  [<ffffffffa057e849>] btrfs_file_aio_write+0x209/0x320 [btrfs]
[ 1440.511000]  [<ffffffffa057e640>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[ 1440.518062]  [<ffffffff81164ef4>] do_sync_readv_writev+0x94/0xe0
[ 1440.524105]  [<ffffffff81165f03>] do_readv_writev+0xe3/0x1e0
[ 1440.529792]  [<ffffffff81182ff2>] ? fget_light+0x122/0x170
[ 1440.535275]  [<ffffffff81166046>] vfs_writev+0x46/0x60
[ 1440.540412]  [<ffffffff8116617f>] sys_writev+0x5f/0xc0
[ 1440.545547]  [<ffffffff81264b3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1440.551987]  [<ffffffff814b7102>] system_call_fastpath+0x16/0x1b
[ 1440.558016] ---[ end trace 764e83a458dabca6 ]---
[ 1440.562662] BTRFS warning (device dm-32): __btrfs_alloc_chunk:3488: Aborting unused transaction(error 28).
[ 1440.595987] BTRFS warning (device dm-32): find_free_extent:5871: Aborting unused transaction(Object already exists).
[ 1440.606542] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1440.614382] IP: [<ffffffffa0584e5e>] map_private_extent_buffer+0xe/0xf0 [btrfs]
[ 1440.621704] PGD 6138e8067 PUD 56749f067 PMD 0 
[ 1440.626190] Oops: 0000 [#1] SMP 
[ 1440.629442] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix libata
coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod
serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core
nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3
hwmon bnx2 igb dca e1000
[ 1440.694855] CPU 16 
[ 1440.696784] Pid: 48687, comm: ceph-osd Tainted: G        W   
3.7.1-00006-gc794580 #484 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1440.707803] RIP: 0010:[<ffffffffa0584e5e>]  [<ffffffffa0584e5e>]
map_private_extent_buffer+0xe/0xf0 [btrfs]
[ 1440.717544] RSP: 0018:ffff880b740db9f8  EFLAGS: 00010292
[ 1440.722841] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880b740dba28
[ 1440.729947] RDX: 0000000000000004 RSI: 0000000000000076 RDI: 0000000000000000
[ 1440.737055] RBP: ffff880b740dba08 R08: ffff880b740dba20 R09: ffff880b740dba18
[ 1440.744167] R10: ffff88092bba8000 R11: ffff880a4138c320 R12: 0000000000000000
[ 1440.751280] R13: 0000000000000065 R14: 0000000000000011 R15: 0000000000000076
[ 1440.758395] FS:  00007fffeb4c3700(0000) GS:ffff880627d40000(0000)
knlGS:0000000000000000
[ 1440.766460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1440.772188] CR2: 0000000000000000 CR3: 00000004bd2a4000 CR4: 00000000000007e0
[ 1440.779303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1440.786416] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1440.793523] Process ceph-osd (pid: 48687, threadinfo ffff880b740da000, task
ffff8808f801bec0)
[ 1440.802018] Stack:
[ 1440.804030]  ffff880b740dbb98 0000000000000000 ffff880b740dba68
ffffffffa0581e3c
[ 1440.811464]  ffff880977dbd030 ffff880c00000002 ffff8808f801c5f0
0000000000000053
[ 1440.818897]  ffff880b740dbae4 ffff880612084c60 0000000000000000
ffff880612084c60
[ 1440.826330] Call Trace:
[ 1440.828800]  [<ffffffffa0581e3c>] btrfs_get_token_32+0x8c/0xf0 [btrfs]
[ 1440.835327]  [<ffffffffa056042d>] btrfs_match_dir_item_name+0x4d/0x140 [btrfs]
[ 1440.842545]  [<ffffffffa0560919>] insert_with_overflow+0x59/0x120 [btrfs]
[ 1440.849315]  [<ffffffffa0560ca6>] btrfs_insert_xattr_item+0xb6/0x1d0 [btrfs]
[ 1440.856343]  [<ffffffffa056d279>] ? join_transaction+0x29/0x370 [btrfs]
[ 1440.862945]  [<ffffffffa056d30f>] ? join_transaction+0xbf/0x370 [btrfs]
[ 1440.869536]  [<ffffffff81159ac3>] ? kmem_cache_alloc+0xd3/0x170
[ 1440.875450]  [<ffffffffa0582b3a>] do_setxattr+0x17a/0x240 [btrfs]
[ 1440.881534]  [<ffffffffa0582c8b>] __btrfs_setxattr+0x8b/0x110 [btrfs]
[ 1440.887965]  [<ffffffffa0582f27>] btrfs_setxattr+0xa7/0xc0 [btrfs]
[ 1440.894130]  [<ffffffff8118a19b>] __vfs_setxattr_noperm+0x7b/0x150
[ 1440.900287]  [<ffffffff8118a2fe>] vfs_setxattr+0x8e/0xc0
[ 1440.905591]  [<ffffffff8118a4e5>] setxattr+0x1b5/0x230
[ 1440.910713]  [<ffffffff81167347>] ? __sb_start_write+0x1b7/0x200
[ 1440.916702]  [<ffffffff81185378>] ? mnt_want_write_file+0x28/0x60
[ 1440.922778]  [<ffffffff81182f40>] ? fget_light+0x70/0x170
[ 1440.928168]  [<ffffffff81185378>] ? mnt_want_write_file+0x28/0x60
[ 1440.934242]  [<ffffffff81182ff2>] ? fget_light+0x122/0x170
[ 1440.939713]  [<ffffffff8118a5ec>] sys_fsetxattr+0x8c/0xe0
[ 1440.945097]  [<ffffffff814b7102>] system_call_fastpath+0x16/0x1b
[ 1440.951083] Code: ef 88 00 00 00 48 89 e5 e8 a0 ff ff ff c9 c3 66 66 66 66 66
2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 <4c>
8b 17 41 81 e2 ff 0f 00 00 4a 8d 04 16 4c 8d 5c 10 ff 48 89
[ 1440.971006] RIP  [<ffffffffa0584e5e>] map_private_extent_buffer+0xe/0xf0 [btrfs]
[ 1440.978415]  RSP <ffff880b740db9f8>
[ 1440.981896] CR2: 0000000000000000
[ 1440.985557] ---[ end trace 764e83a458dabca7 ]---
[ 1440.990075] divide error: 0000 [#2] SMP 
[ 1440.990133] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix libata
coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod
serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core
nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3
hwmon bnx2 igb dca e1000
[ 1440.990139] CPU 20 
[ 1440.990139] Pid: 48693, comm: ceph-osd Tainted: G      D W   
3.7.1-00006-gc794580 #484 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1440.990163] RIP: 0010:[<ffffffffa059429d>]  [<ffffffffa059429d>]
__btrfs_map_block+0xcd/0x670 [btrfs]
[ 1440.990187] RSP: 0018:ffff880b740f5ad8  EFLAGS: 00010246
[ 1440.990194] RAX: 0000000000800000 RBX: 0000000000800000 RCX: 0000000040000000
[ 1440.990195] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1440.990195] RBP: ffff880b740f5b68 R08: 0000000000000000 R09: 0000000000000000
[ 1440.990196] R10: ffff88062311f6e8 R11: 0000000000000000 R12: ffff880b740f5b90
[ 1440.990200] R13: ffff8805054971c0 R14: ffff880c182f4298 R15: ffff880b740f5e68
[ 1440.990201] FS:  00007fffe6cba700(0000) GS:ffff880c3fd00000(0000)
knlGS:0000000000000000
[ 1440.990202] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1440.990203] CR2: ffffffffff600400 CR3: 00000004bd2a4000 CR4: 00000000000007e0
[ 1440.990207] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1440.990207] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1440.990209] Process ceph-osd (pid: 48693, threadinfo ffff880b740f4000, task
ffff8809877d8000)
[ 1440.990209] Stack:
[ 1440.990217]  ffff88092bba8000 ffff880156a22e00 ffff88062311f6e8
ffff880156a23388
[ 1440.990225]  0000000000000000 ffffffff8111365d 0000000000000000
0000000000000000
[ 1440.990230]  00000000740f5b98 0000000000000046 0000000000000000
ffffffff8111365d
[ 1440.990230] Call Trace:
[ 1440.990236]  [<ffffffff8111365d>] ? test_set_page_writeback+0x6d/0x170
[ 1440.990291]  [<ffffffff8111365d>] ? test_set_page_writeback+0x6d/0x170
[ 1440.990307]  [<ffffffffa059484e>] btrfs_map_block+0xe/0x10 [btrfs]
[ 1440.990349]  [<ffffffffa0571307>] btrfs_merge_bio_hook+0x57/0x80 [btrfs]
[ 1440.990458]  [<ffffffffa0585ba3>] submit_extent_page+0xc3/0x1d0 [btrfs]
[ 1440.990487]  [<ffffffff8110a2f0>] ? find_get_pages+0x1c0/0x1c0
[ 1440.990525]  [<ffffffffa058ba7f>] __extent_writepage+0x69f/0x760 [btrfs]
[ 1440.990571]  [<ffffffffa0585ed0>] ? extent_io_tree_init+0x90/0x90 [btrfs]
[ 1440.990680]  [<ffffffffa058bf52>] extent_write_cache_pages.clone.3+0x242/0x3d0 [btrfs]
[ 1440.990733]  [<ffffffffa058c12f>] extent_writepages+0x4f/0x70 [btrfs]
[ 1440.990784]  [<ffffffffa0577630>] ? btrfs_lookup+0x70/0x70 [btrfs]
[ 1440.990848]  [<ffffffff81182ff2>] ? fget_light+0x122/0x170
[ 1440.990870]  [<ffffffffa0571df7>] btrfs_writepages+0x27/0x30 [btrfs]
[ 1440.990886]  [<ffffffff81115423>] do_writepages+0x23/0x40
[ 1440.990889]  [<ffffffff811099ce>] __filemap_fdatawrite_range+0x4e/0x50
[ 1440.990920]  [<ffffffff81109c83>] filemap_fdatawrite_range+0x13/0x20
[ 1440.990982]  [<ffffffff81195589>] sys_sync_file_range+0x109/0x170
[ 1440.991022]  [<ffffffff814b7102>] system_call_fastpath+0x16/0x1b
[ 1440.991149] Code: 66 0f 1f 44 00 00 4d 8b 6a 60 48 29 c3 8b 45 c4 41 39 45 18
b8 00 00 00 00 0f 4d 45 c4 31 d2 89 45 c4 49 63 75 10 48 89 d8 89 f7 <48>
f7 f7 49 89 c6 48 89 45 c8 4c 0f af f6 4c 39 f3 73 10 0f 0b
[ 1440.991174] RIP  [<ffffffffa059429d>] __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1440.991203]  RSP <ffff880b740f5ad8>
[ 1440.991206] ---[ end trace 764e83a458dabca8 ]---
[ 1451.948155] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a9
[ 1451.956010] IP: [<ffffffffa05949d4>] btrfs_map_bio+0x184/0x220 [btrfs]
[ 1451.962580] PGD 0 
[ 1451.964620] Oops: 0000 [#3] SMP 
[ 1451.967887] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix libata
coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod
serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801
i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core
nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3
hwmon bnx2 igb dca e1000
[ 1452.033336] CPU 5 
[ 1452.035177] Pid: 25627, comm: btrfs-worker-1 Tainted: G      D W   
3.7.1-00006-gc794580 #484 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1452.046715] RIP: 0010:[<ffffffffa05949d4>]  [<ffffffffa05949d4>]
btrfs_map_bio+0x184/0x220 [btrfs]
[ 1452.055688] RSP: 0018:ffff88050e967cc8  EFLAGS: 00010202
[ 1452.060987] RAX: 000000000000000c RBX: ffff880959c9ea80 RCX: ffff880959c9ea80
[ 1452.068100] RDX: ffff88060bd03060 RSI: 0000000000000001 RDI: ffff88062311f6e8
[ 1452.075212] RBP: ffff88050e967d28 R08: ffff88060bd03060 R09: 0000000000000009
[ 1452.082327] R10: ffff88062311f6e8 R11: 0000000000000000 R12: 0000000000000001
[ 1452.089442] R13: 0000000000000000 R14: 0000000000000004 R15: ffff88092bba8000
[ 1452.096554] FS:  0000000000000000(0000) GS:ffff880627ca0000(0000)
knlGS:0000000000000000
[ 1452.104621] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1452.110352] CR2: 00000000000000a9 CR3: 0000000001a0b000 CR4: 00000000000007e0
[ 1452.117466] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1452.124577] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1452.131693] Process btrfs-worker-1 (pid: 25627, threadinfo ffff88050e966000,
task ffff880612418000)
[ 1452.140707] Stack:
[ 1452.142720]  0000000000000000 000000000040e010 00000001182f5470
0000000100000000
[ 1452.150160]  ffff88060bd03060 000000003f7fe000 ffff88050e967d38
ffff880959c9e7c8
[ 1452.157601]  ffff880959c9e780 ffff880c182f5470 ffff880c182f5428
ffff880c182f5418
[ 1452.165061] Call Trace:
[ 1452.167540]  [<ffffffffa0570bab>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs]
[ 1452.174501]  [<ffffffffa0566a41>] run_one_async_done+0xc1/0xd0 [btrfs]
[ 1452.181027]  [<ffffffffa0596a93>] run_ordered_completions+0x83/0xd0 [btrfs]
[ 1452.187991]  [<ffffffffa05975c8>] worker_loop+0x1b8/0x410 [btrfs]
[ 1452.194087]  [<ffffffffa0597410>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs]
[ 1452.201639]  [<ffffffff81066df1>] kthread+0xe1/0xf0
[ 1452.206528]  [<ffffffff81066d10>] ? __init_kthread_worker+0x70/0x70
[ 1452.212779]  [<ffffffff814b705c>] ret_from_fork+0x7c/0xb0
[ 1452.218167]  [<ffffffff81066d10>] ? __init_kthread_worker+0x70/0x70
[ 1452.224411] Code: 48 89 51 48 48 8d 14 40 48 8b 45 c0 48 c1 e2 03 48 01 d0 48
8b 40 38 48 c1 e8 09 48 89 01 48 03 55 c0 48 8b 72 30 48 85 f6 74 4c <48>
8b 86 a8 00 00 00 48 85 c0 74 40 41 83 fc 01 75 0a 8b 56 60
[ 1452.244357] RIP  [<ffffffffa05949d4>] btrfs_map_bio+0x184/0x220 [btrfs]
[ 1452.250995]  RSP <ffff88050e967cc8>
[ 1452.254485] CR2: 00000000000000a9
[ 1452.258149] ---[ end trace 764e83a458dabca9 ]---



Josef Bacik

2013-Jan-28 21:23 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> Hi Josef,
> 
> Thanks for the patch - sorry for the long delay in testing...
> 

Jim,

I've been trying to reason out how this happens. Could you do a btrfs fi df on
the filesystem that's giving you trouble so I can see if what I think is
happening is what's actually happening?  Thanks,

Josef

Jim Schutt

2013-Jan-28 21:58 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/28/2013 02:23 PM, Josef Bacik wrote:
> On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>> Hi Josef,
>>
>> Thanks for the patch - sorry for the long delay in testing...
>>
> 
> Jim,
> 
> I've been trying to reason out how this happens, could you do a btrfs fi df on
> the filesystem thats giving you trouble so I can see if what I think is
> happening is what's actually happening.  Thanks,

Sure - it'll take me a bit to set the test up again.

-- Jim
> 
> Josef
> 
> 


Liu Bo

2013-Jan-29 02:30 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
> On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > Hi Josef,
> > 
> > Thanks for the patch - sorry for the long delay in testing...
> > 
> 
> Jim,
> 
> I've been trying to reason out how this happens, could you do a btrfs fi df on
> the filesystem thats giving you trouble so I can see if what I think is
> happening is what's actually happening.  Thanks,

Josef,

A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib

thanks,
liubo

Josef Bacik

2013-Jan-29 13:47 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Mon, Jan 28, 2013 at 07:30:09PM -0700, Liu Bo wrote:
> On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
> > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > > Hi Josef,
> > > 
> > > Thanks for the patch - sorry for the long delay in testing...
> > > 
> > 
> > Jim,
> > 
> > I've been trying to reason out how this happens, could you do a btrfs fi df on
> > the filesystem thats giving you trouble so I can see if what I think is
> > happening is what's actually happening.  Thanks,
> 
> Josef,
> 
> A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib
> 

251      [not run] FSTRIM is not supported

Are you sure it's 251?  Thanks,

Josef

Josef Bacik

2013-Jan-29 13:50 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
> On Mon, Jan 28, 2013 at 07:30:09PM -0700, Liu Bo wrote:
> > On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > > > Hi Josef,
> > > > 
> > > > Thanks for the patch - sorry for the long delay in testing...
> > > > 
> > > 
> > > Jim,
> > > 
> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
> > > the filesystem thats giving you trouble so I can see if what I think is
> > > happening is what's actually happening.  Thanks,
> > 
> > Josef,
> > 
> > A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib
> > 
> 
> 251      [not run] FSTRIM is not supported
> 
> Are you sure its 251?  Thanks,

Sorry it's early, I need a device that does trim.  /me waits for his fusion card
to get back from the shop,

Josef

David Sterba

2013-Jan-29 16:43 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
> > 251      [not run] FSTRIM is not supported
> > 
> > Are you sure its 251?  Thanks,
> 
> Sorry it's early, I need a device that does trim.  /me waits for his fusion card
> to get back from the shop,

You can use scsi_debug device with

parm:           lbpu:enable LBP, support UNMAP command (def=0) (int)

david

David Sterba

2013-Jan-29 16:52 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 05:43:31PM +0100, David Sterba wrote:
> On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
> You can use scsi_debug device with
> 
> parm:           lbpu:enable LBP, support UNMAP command (def=0) (int)

Also, a loop device with a file backed by a filesystem with hole punch
support understands TRIM.
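
To check whether a given mount actually accepts trim requests before running
a test like 251, one option is to issue the FITRIM ioctl directly, which is
roughly what the fstrim utility does. The following is a minimal sketch, not
part of xfstests: the helper name probe_fitrim is made up for illustration,
and it assumes that a filesystem or device without trim support reports
EOPNOTSUPP (or ENOTTY) from the ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the filesystem mounted at `mountpoint` to
 * discard its free space. Returns 0 on success and a negative errno value
 * on failure. If `trimmed` is not NULL it receives the number of bytes the
 * kernel reports as trimmed.
 */
static int probe_fitrim(const char *mountpoint, unsigned long long *trimmed)
{
	struct fstrim_range range;
	int fd;
	int ret = 0;

	assert(mountpoint != NULL);

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -errno;

	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = ULLONG_MAX;   /* cover the whole filesystem */
	range.minlen = 0;

	if (ioctl(fd, FITRIM, &range) < 0)
		ret = -errno;
	else if (trimmed != NULL)
		*trimmed = range.len; /* the kernel writes back the trimmed byte count */

	close(fd);
	return ret;
}
```

Calling probe_fitrim() on the scratch mount point needs root, and a
successful call really does discard free space, so it is best pointed at a
scsi_debug or loop-backed test filesystem.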

david

Jim Schutt

2013-Jan-29 18:41 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/28/2013 02:23 PM, Josef Bacik wrote:
> On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>> Hi Josef,
>>
>> Thanks for the patch - sorry for the long delay in testing...
>>
> 
> Jim,
> 
> I've been trying to reason out how this happens, could you do a btrfs fi df on
> the filesystem thats giving you trouble so I can see if what I think is
> happening is what's actually happening.  Thanks,

Here's an example, using a slightly different kernel than
my previous report.  It's your btrfs-next master branch
(commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).


Here I'm finding the file system in question:

# ls -l /dev/mapper | grep dm-93
lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93

# df -h | grep -A 1 cs53s19p2
/dev/mapper/cs53s19p2
                      896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522


Here's the info you asked for:

# btrfs fi df /ram/mnt/ceph/data.osd.522
Data: total=2.01GB, used=1.00GB
System: total=4.00MB, used=64.00KB
Metadata: total=8.00MB, used=7.56MB


And here's the backtrace that had trouble on dm-93.
It's a little different to my previous report:

[  705.496463] ------------[ cut here ]------------
[  705.501123] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[  705.509751] Hardware name: X8DTH-i/6/iF/6F
[  705.513862] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64
xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en
mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4
i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac
edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc
fscache broadcom tg3 hwmon bnx2 igb dca e1000
[  705.580232] Pid: 33025, comm: ceph-osd Not tainted 3.7.0-00269-gd9acbfd #492
[  705.587488] Call Trace:
[  705.589957]  [<ffffffff8103ff04>] warn_slowpath_common+0x94/0xc0
[  705.596108]  [<ffffffffa055331a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
[  705.602685]  [<ffffffff8103ffe6>] warn_slowpath_fmt+0x46/0x50
[  705.608563]  [<ffffffffa054c730>] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[  705.615994]  [<ffffffffa05a2058>] __btrfs_alloc_chunk+0x678/0x710 [btrfs]
[  705.622945]  [<ffffffffa05a214e>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[  705.629635]  [<ffffffffa055edb1>] ? check_system_chunk+0x71/0x130 [btrfs]
[  705.637079]  [<ffffffffa055f15c>] do_chunk_alloc+0x2ec/0x370 [btrfs]
[  705.643451]  [<ffffffffa055b199>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[  705.650951]  [<ffffffffa0561d1c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
[  705.658446]  [<ffffffffa0564a70>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
[  705.665882]  [<ffffffffa058980e>] __btrfs_buffered_write+0x15e/0x340 [btrfs]
[  705.672952]  [<ffffffffa0589e29>] btrfs_file_aio_write+0x309/0x450 [btrfs]
[  705.679889]  [<ffffffffa0589b20>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[  705.686934]  [<ffffffff811626f4>] do_sync_readv_writev+0x94/0xe0
[  705.692942]  [<ffffffff811637b3>] do_readv_writev+0xe3/0x1e0
[  705.698604]  [<ffffffff81180c42>] ? fget_light+0x122/0x170
[  705.704093]  [<ffffffff811638f6>] vfs_writev+0x46/0x60
[  705.709239]  [<ffffffff81163a2f>] sys_writev+0x5f/0xc0
[  705.714388]  [<ffffffff812637ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  705.720827]  [<ffffffff814b7882>] system_call_fastpath+0x16/0x1b
[  705.726829] ---[ end trace 6e889d6d939ca116 ]---
[  705.731459] BTRFS warning (device dm-93): __btrfs_alloc_chunk:3787: Aborting unused transaction(error 28).
[  705.741187] btrfs: mapping failed logical 1099431936 bio len 524288 len 65536
[  705.741192] BTRFS warning (device dm-93): find_free_extent:5948: Aborting unused transaction(Object already exists).
[  705.759185] ------------[ cut here ]------------
[  705.763929] kernel BUG at fs/btrfs/volumes.c:4891!
[  705.768990] invalid opcode: 0000 [#1] SMP 
[  705.773561] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64
xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en
mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4
i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac
edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc
fscache broadcom tg3 hwmon bnx2 igb dca e1000
[  705.845121] CPU 22 
[  705.847114] Pid: 21317, comm: btrfs-worker-1 Tainted: G        W    3.7.0-00269-gd9acbfd #492 Supermicro X8DTH-i/6/iF/6F/X8DTH
[  705.858886] RIP: 0010:[<ffffffffa05a2f0d>]  [<ffffffffa05a2f0d>] btrfs_map_bio+0x8d/0x300 [btrfs]
[  705.867928] RSP: 0018:ffff880610ce7c58  EFLAGS: 00010296
[  705.873363] RAX: 0000000000000041 RBX: ffff88061c368480 RCX: 0000000000009291
[  705.880692] RDX: 0000000000000091 RSI: 0000000000000001 RDI: ffffffff81a21a40
[  705.888315] RBP: ffff880610ce7d08 R08: 0000000000000001 R09: 0000000000000001
[  705.895805] R10: 00000000000007ca R11: 0000000000000001 R12: 0000000041880000
[  705.903139] R13: 0000000000080000 R14: ffff880c12621468 R15: ffff880c12621458
[  705.910467] FS:  0000000000000000(0000) GS:ffff880c3fd40000(0000) knlGS:0000000000000000
[  705.918978] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  705.925036] CR2: ffffffffff600400 CR3: 0000000001a0b000 CR4: 00000000000007e0
[  705.932406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  705.939818] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  705.947461] Process btrfs-worker-1 (pid: 21317, threadinfo ffff880610ce6000, task ffff880613b1bec0)
[  705.957264] Stack:
[  705.959806]  ffff8805e0f64000 ffff8808e5b12188 ffff880613b1c578 000004aa11555000
[  705.970044]  ffff880c00000000 ffff880c126214b0 0000000100000000 ffff8805eddd2000
[  705.979630]  0000000000000001 0000000100000411 ffff880610ce7d28 0000000000000246
[  705.989568] Call Trace:
[  705.992386]  [<ffffffffa05a3cf0>] ? run_ordered_completions+0x40/0xd0 [btrfs]
[  706.000651]  [<ffffffffa057bd43>] __btrfs_submit_bio_done+0x23/0x40 [btrfs]
[  706.008210]  [<ffffffffa0570ba1>] run_one_async_done+0xc1/0xd0 [btrfs]
[  706.015049]  [<ffffffffa05a3d33>] run_ordered_completions+0x83/0xd0 [btrfs]
[  706.022246]  [<ffffffffa05a4868>] worker_loop+0x1b8/0x410 [btrfs]
[  706.028930]  [<ffffffffa05a46b0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs]
[  706.037561]  [<ffffffff81067561>] kthread+0xe1/0xf0
[  706.042896]  [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[  706.049524]  [<ffffffff814b77dc>] ret_from_fork+0x7c/0xb0
[  706.055314]  [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[  706.062429] Code: 56 02 00 00 48 8b 45 c0 48 8b 4d c8 8b 50 28 49 39 cd 89 55 9c 76 1f 4c 89 ea 4c 89 e6 48 c7 c7 e8 a6 5e a0 31 c0 e8 93 84 f0 e0 <0f> 0b 90 eb fe 66 0f 1f 44 00 00 48 89 58 10 48 8b 53 48 48 8b
[  706.090905] RIP  [<ffffffffa05a2f0d>] btrfs_map_bio+0x8d/0x300 [btrfs]
[  706.098098]  RSP <ffff880610ce7c58>
[  706.102125] ---[ end trace 6e889d6d939ca117 ]---
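
For anyone reading the oops above: the "mapping failed" line printed just
before the BUG appears to give the logical address, the bio length, and the
length the chunk mapping could cover, in that order. That reading is an
assumption based on the message layout, but the first two values do match
the register dump (R12 = 0x41880000 = 1099431936, R13 = 0x80000 = 524288).
A minimal user-space sketch of that kind of check follows; the function name
mapping_covers_bio is made up for illustration and this is not the kernel
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical check: a bio can only be submitted as one unit if the
 * contiguous range the mapper resolved (map_length) covers the whole bio
 * (bio_length). On failure, a message shaped like the log line above is
 * written into `msg`; on success `msg` is set to the empty string.
 */
static bool mapping_covers_bio(uint64_t logical, uint64_t bio_length,
			       uint64_t map_length, char *msg, size_t msg_size)
{
	assert(msg != NULL && msg_size > 0);

	if (map_length >= bio_length) {
		msg[0] = '\0';
		return true;
	}

	snprintf(msg, msg_size,
		 "mapping failed logical %llu bio len %llu len %llu",
		 (unsigned long long)logical,
		 (unsigned long long)bio_length,
		 (unsigned long long)map_length);
	return false;
}
```

With the values from the log (1099431936, 524288, 65536) the check fails,
because only 65536 of the 524288 bytes in the bio are covered.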

-- Jim
> 
> Josef
> 
> 


Josef Bacik

2013-Jan-29 20:04 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> On 01/28/2013 02:23 PM, Josef Bacik wrote:
> Here's the info you asked for:
> 
> # btrfs fi df /ram/mnt/ceph/data.osd.522
> Data: total=2.01GB, used=1.00GB
> System: total=4.00MB, used=64.00KB
> Metadata: total=8.00MB, used=7.56MB
> 

How big is the disk you are using, and what mount options?  I have a patch to
keep the panic from happening and hopefully the abort, could you try this?  I
still want to keep the underlying error from happening because it shouldn't be,
but no reason I can't fix the error case while you can easily reproduce it :).
Thanks,

Josef

From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik@fusionio.com>
Date: Tue, 29 Jan 2013 15:03:37 -0500
Subject: [PATCH] Btrfs: fix chunk allocation error handling

If we error out allocating a dev extent we will have already created the
block group and such which will cause problems since the allocator may have
tried to allocate out of the block group that no longer exists.  This will
cause BUG_ON()'s in the bio submission path.  This also makes a failure to
allocate a dev extent a non-abort error, we will just clean up the dev
extents we did allocate and exit.  Now if we fail to delete the dev extents
we will abort since we can't have half of the dev extents hanging around,
but this will make us much less likely to abort.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
 fs/btrfs/volumes.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f8c281..2ba5b84 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3766,12 +3766,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto error;
 
-	ret = btrfs_make_block_group(trans, extent_root, 0, type,
-				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-				     start, num_bytes);
-	if (ret)
-		goto error;
-
 	for (i = 0; i < map->num_stripes; ++i) {
 		struct btrfs_device *device;
 		u64 dev_offset;
@@ -3783,15 +3777,33 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				info->chunk_root->root_key.objectid,
 				BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 				start, dev_offset, stripe_size);
-		if (ret) {
-			btrfs_abort_transaction(trans, extent_root, ret);
-			goto error;
-		}
+		if (ret)
+			goto error_dev_extent;
+	}
+
+	ret = btrfs_make_block_group(trans, extent_root, 0, type,
+				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+				     start, num_bytes);
+	if (ret) {
+		i = map->num_stripes - 1;
+		goto error_dev_extent;
 	}
 
 	kfree(devices_info);
 	return 0;
 
+error_dev_extent:
+	for (; i >= 0; i--) {
+		struct btrfs_device *device;
+		int err;
+
+		device = map->stripes[i].dev;
+		err = btrfs_free_dev_extent(trans, device, start);
+		if (err) {
+			btrfs_abort_transaction(trans, extent_root, err);
+			break;
+		}
+	}
 error:
 	kfree(map);
 	kfree(devices_info);
-- 
1.7.7.6
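
The reordering in the patch above (reserve the dev extents first, create the
block group last, unwind on failure) can be sketched in user space as
follows. This is only an illustration under stated assumptions: struct
fake_dev, alloc_dev_extent, free_dev_extent, and alloc_chunk are
hypothetical stand-ins rather than the kernel functions, and the sketch
assumes the unwind should touch only the stripes that were actually
reserved, that is, the indices below the one that failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device extent bookkeeping. */
struct fake_dev {
	bool has_extent;  /* true once a dev extent has been recorded */
	bool fail_alloc;  /* test hook: make the reservation fail here */
};

static int alloc_dev_extent(struct fake_dev *dev)
{
	if (dev->fail_alloc)
		return -ENOSPC;
	dev->has_extent = true;
	return 0;
}

static void free_dev_extent(struct fake_dev *dev)
{
	dev->has_extent = false;
}

/*
 * Reserve one extent per stripe first and only make the block group
 * visible once every stripe has succeeded. On failure, undo the stripes
 * already reserved (indices 0..i-1) and return the error without
 * publishing anything the allocator could start using.
 */
static int alloc_chunk(struct fake_dev *devs, size_t num_stripes,
		       bool *block_group_visible)
{
	size_t i;
	int ret = 0;

	assert(devs != NULL && block_group_visible != NULL);

	for (i = 0; i < num_stripes; i++) {
		ret = alloc_dev_extent(&devs[i]);
		if (ret)
			goto rollback;
	}

	*block_group_visible = true;  /* last step, after all stripes exist */
	return 0;

rollback:
	while (i-- > 0)
		free_dev_extent(&devs[i]);
	return ret;
}
```

The point of the ordering is that a failed reservation never leaves a
visible block group behind, so nothing can allocate out of a chunk that is
about to disappear.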


Jim Schutt

2013-Jan-29 20:37 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/29/2013 01:04 PM, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> 
> How big is the disk you are using, and what mount options?

The partition is ~900 GiB, and the mount options according
to /proc/mounts are: rw,noatime,nospace_cache

Also, in case it matters, I build the file systems
with -l 65536 -n 65536.

> I have a patch to
> keep the panic from happening and hopefully the abort, could you try this?  I
> still want to keep the underlying error from happening because it shouldn't be,
> but no reason I can't fix the error case while you can easily reproduce it :).

I'm happy to try it - but I probably won't have results
for you until tomorrow, due to other time pressures.

Thanks for taking a look.

-- Jim
> Thanks,
> 
> Josef
> 

Jim Schutt

2013-Jan-29 23:05 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/29/2013 01:04 PM, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> From: Josef Bacik <jbacik@fusionio.com>
> Date: Tue, 29 Jan 2013 15:03:37 -0500
> Subject: [PATCH] Btrfs: fix chunk allocation error handling
> 
> If we error out allocating a dev extent we will have already created the
> block group and such which will cause problems since the allocator may have
> tried to allocate out of the block group that no longer exists.  This will
> cause BUG_ON()'s in the bio submission path.  This also makes a failure to
> allocate a dev extent a non-abort error, we will just clean up the dev
> extents we did allocate and exit.  Now if we fail to delete the dev extents
> we will abort since we can't have half of the dev extents hanging around,
> but this will make us much less likely to abort.  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---

Interesting - with your patch applied I triggered the following, just
bringing up a fresh Ceph filesystem - I didn't even get a chance to
mount it on my Ceph clients:

[ 6419.450179] BTRFS error (device dm-73) in btrfs_free_dev_extent:1115: error 28 (Slot search failed)
[ 6419.459223] btrfs is forced readonly
[ 6419.462805] ------------[ cut here ]------------
[ 6419.467440] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 6419.475809] Hardware name: X8DTH-i/6/iF/6F
[ 6419.479914] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic coretemp kvm
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64
xts gf128mul microcode button ata_piix libata mpt2sas scsi_transport_sas
raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en
mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core uhci_hcd ehci_hcd i7core_edac
edac_core ioatdma dm_mod nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc
fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 6419.546095] Pid: 107593, comm: ceph-osd Not tainted 3.7.0-00270-g8353482 #494
[ 6419.553227] Call Trace:
[ 6419.555697]  [<ffffffff8103ff04>] warn_slowpath_common+0x94/0xc0
[ 6419.561708]  [<ffffffff8103ffe6>] warn_slowpath_fmt+0x46/0x50
[ 6419.567491]  [<ffffffffa0542730>] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[ 6419.574746]  [<ffffffffa05980c6>] __btrfs_alloc_chunk+0x6e6/0x770 [btrfs]
[ 6419.581553]  [<ffffffffa05981ae>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 6419.588017]  [<ffffffffa0554db1>] ? check_system_chunk+0x71/0x130 [btrfs]
[ 6419.594824]  [<ffffffffa055515c>] do_chunk_alloc+0x2ec/0x370 [btrfs]
[ 6419.601188]  [<ffffffffa055e06c>] find_free_extent+0xaac/0xbe0 [btrfs]
[ 6419.607733]  [<ffffffffa055e222>] btrfs_reserve_extent+0x82/0x190 [btrfs]
[ 6419.614545]  [<ffffffffa055e3b5>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
[ 6419.621530]  [<ffffffffa0586e55>] ? check_buffer_tree_ref+0x25/0x50 [btrfs]
[ 6419.628512]  [<ffffffffa0549bca>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
[ 6419.635155]  [<ffffffffa05a261c>] ? btrfs_try_tree_write_lock+0x3c/0xa0 [btrfs]
[ 6419.642475]  [<ffffffffa05a2c43>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
[ 6419.649970]  [<ffffffffa054a5b1>] btrfs_cow_block+0x161/0x200 [btrfs]
[ 6419.656424]  [<ffffffffa054d679>] btrfs_search_slot+0x399/0x760 [btrfs]
[ 6419.663050]  [<ffffffffa0573f79>] btrfs_truncate_inode_items+0x179/0x710 [btrfs]
[ 6419.670458]  [<ffffffffa0584ad5>] ? btrfs_add_ordered_operation+0x55/0xb0 [btrfs]
[ 6419.677961]  [<ffffffffa0575fcd>] btrfs_truncate+0x16d/0x2c0 [btrfs]
[ 6419.684328]  [<ffffffffa057a441>] btrfs_setsize+0x151/0x190 [btrfs]
[ 6419.690601]  [<ffffffff8117eb4a>] ? notify_change+0xaa/0x2e0
[ 6419.696274]  [<ffffffffa057a4e6>] btrfs_setattr+0x66/0xd0 [btrfs]
[ 6419.702373]  [<ffffffff8117eca2>] notify_change+0x202/0x2e0
[ 6419.707949]  [<ffffffff81161f5f>] do_truncate+0x6f/0x90
[ 6419.713174]  [<ffffffff811620dd>] do_sys_truncate+0x15d/0x170
[ 6419.718919]  [<ffffffff811620fe>] sys_truncate+0xe/0x10
[ 6419.724139]  [<ffffffff814b7882>] system_call_fastpath+0x16/0x1b
[ 6419.730132] ---[ end trace e480283f0ee28284 ]---
[ 6419.734754] BTRFS warning (device dm-73): __btrfs_alloc_chunk:3803: Aborting unused transaction(error 28).

Here's some data on the btrfs filesystem in question:

# ls -l /dev/mapper | grep dm-73
lrwxrwxrwx 1 root root       8 Jan 29 14:27 cs33s16p2 -> ../dm-73

# df -h | grep -A 1 cs33s16p2
/dev/mapper/cs33s16p2
                      896G  7.8M  896G   1% /ram/mnt/ceph/data.osd.39

# btrfs fi df /ram/mnt/ceph/data.osd.39/
Data: total=8.00MB, used=3.61MB
System: total=4.00MB, used=64.00KB
Metadata: total=8.00MB, used=4.12MB

# cat /proc/mounts | grep osd.39
/dev/mapper/cs33s16p2 /ram/mnt/ceph/data.osd.39 btrfs ro,noatime,nospace_cache 0 0


FWIW, in these tests I'm building a fresh Ceph filesystem with 576 OSDs,
hence 576 different btrfs filesystems, but typically I only have an
issue with one of them per test.

-- Jim


Josef Bacik

2013-Jan-30 15:06 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> On 01/29/2013 01:04 PM, Josef Bacik wrote:
> 
> Interesting - with your patch applied I triggered the following, just
> bringing up a fresh Ceph filesystem - I didn't even get a chance to
> mount it on my Ceph clients:
> 

Well that makes me a sad panda, but hey it didn't panic this time.  What
workload are you running on this fs/ceph cluster?  Thanks,

Josef

Josef Bacik

2013-Jan-30 15:16 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> On 01/29/2013 01:04 PM, Josef Bacik wrote:
> 
> Interesting - with your patch applied I triggered the following, just
> bringing up a fresh Ceph filesystem - I didn't even get a chance to
> mount it on my Ceph clients:
> 

Actually nevermind it looks like I figured out how to reproduce.  I'll let you
know when I have something to test.  Thanks,

Josef

Josef Bacik

2013-Jan-30 16:38 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> On 01/29/2013 01:04 PM, Josef Bacik wrote:
> 
> Interesting - with your patch applied I triggered the following, just
> bringing up a fresh Ceph filesystem - I didn't even get a chance to
> mount it on my Ceph clients:
> 

Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..874bcf2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3677,8 +3677,18 @@ static int can_overcommit(struct btrfs_root *root,
 	u64 used;
 
 	used = space_info->bytes_used + space_info->bytes_reserved +
-		space_info->bytes_pinned + space_info->bytes_readonly +
-		space_info->bytes_may_use;
+		space_info->bytes_pinned + space_info->bytes_readonly;
+
+	/*
+	 * We only want to allow over committing if we have lots of actual space
+	 * free, but if we've tied up more than 80% of the space with actual
+	 * space reservation (not including bytes we _might_ use) then don't
+	 * allow overcommitting as it will just make things go badly for us.
+	 */
+	if (used > div_factor(space_info->total_bytes, 8))
+		return 0;
+
+	used += space_info->bytes_may_use;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
 	avail = root->fs_info->free_chunk_space;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
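
For anyone following along, the cap added by the patch above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct space_info_model`, `div_factor_model()` and `over_hard_cap()` are hypothetical names made up for illustration, not kernel interfaces, and the model assumes `div_factor(n, 8)` yields 80% of `n`. It covers only the new early-return test, not the rest of `can_overcommit()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the counters read from
 * struct btrfs_space_info in the patch. */
struct space_info_model {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	uint64_t bytes_may_use;
};

/* Assumed behaviour of div_factor(): num * factor / 10. */
static uint64_t div_factor_model(uint64_t num, int factor)
{
	assert(factor >= 0 && factor <= 10);
	if (factor == 10)
		return num;
	return num * (uint64_t)factor / 10;
}

/* Returns 1 when hard usage (everything except bytes_may_use) is above
 * 80% of the space, which is the case where the patch returns 0 from
 * can_overcommit() and so refuses to overcommit. */
static int over_hard_cap(const struct space_info_model *si)
{
	uint64_t used = si->bytes_used + si->bytes_reserved +
			si->bytes_pinned + si->bytes_readonly;

	return used > div_factor_model(si->total_bytes, 8);
}
```

Plugging in the Metadata line from the `btrfs fi df` output quoted in this thread (total=8.00MB, used=7.56MB), and assuming all of the used figure counts as `bytes_used`, the hard usage is about 94% of the space, so `over_hard_cap()` returns 1 and overcommit would be refused.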

Jim Schutt

2013-Jan-30 21:37 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/30/2013 09:38 AM, Josef Bacik wrote:
> Ok can you give this patch a whirl as well?  It seems to fix the problem for me.

With this patch on top of your previous patch, after several trials of
my test I am also unable to reproduce the issue.  Since I had been
having trouble first time, every time, I think it also seems to fix
the problem for me.

Thanks again!

-- Jim
> Thanks,
> 
> Josef


Josef Bacik

2013-Jan-30 21:55 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
> On 01/30/2013 09:38 AM, Josef Bacik wrote:
> > Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
> 
> With this patch on top of your previous patch, after several trials of
> my test I am also unable to reproduce the issue.  Since I had been
> having trouble first time, every time, I think it also seems to fix
> the problem for me.
> 
> Thanks again!
> 
Awesome, thanks for testing!

Josef

Josef Bacik

2013-Jan-31 15:33 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
> On 01/30/2013 09:38 AM, Josef Bacik wrote:
> > Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
> 
> With this patch on top of your previous patch, after several trials of
> my test I am also unable to reproduce the issue.  Since I had been
> having trouble first time, every time, I think it also seems to fix
> the problem for me.
> 
Hey Jim,

Could you test this patch instead?  I think it's a little less hamfisted and
should give us a nice balance between not crashing and being good for
performance.  Thanks,

Josef

commit 43510c0e5faad8e5e4d8ba13baa1dd5dfb3d39ce
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Wed Jan 30 17:02:51 2013 -0500

    Btrfs: do not allow overcommit to happen if we are over 80% in use
    
    Because of how little we allocate chunks now we can get really tight on
    metadata space before we will allocate a new chunk.  This resulted in being
    unable to add device extents when allocating a new metadata chunk as we did
    not have enough space.  This is because we were allowed to overcommit too
    much metadata without actually making sure we had enough space to make
    allocations.  The idea behind overcommit is that we are allowed to say "sure
    you can have that reservation" when most of the free space is occupied by
    reservations, not actual allocations.  But in this case where a majority of
    the total space is in use by actual allocations we can screw ourselves by
    not being able to make real allocations when it matters.  So put this cap in
    place for now to keep us from overcommitting so much that we run out of
    space.  Thanks,
    
    Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
    Signed-off-by: Josef Bacik <jbacik@fusionio.com>

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..156341e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3672,13 +3672,30 @@ static int can_overcommit(struct btrfs_root *root,
 			  struct btrfs_space_info *space_info, u64 bytes,
 			  enum btrfs_reserve_flush_enum flush)
 {
+	struct btrfs_block_rsv *global_rsv = &root->fs_info->global_block_rsv;
 	u64 profile = btrfs_get_alloc_profile(root, 0);
+	u64 rsv_size = 0;
 	u64 avail;
 	u64 used;
 
 	used = space_info->bytes_used + space_info->bytes_reserved +
-		space_info->bytes_pinned + space_info->bytes_readonly +
-		space_info->bytes_may_use;
+		space_info->bytes_pinned + space_info->bytes_readonly;
+
+	spin_lock(&global_rsv->lock);
+	rsv_size = global_rsv->size;
+	spin_unlock(&global_rsv->lock);
+
+	/*
+	 * We only want to allow over committing if we have lots of actual space
+	 * free, but if we don't have enough space to handle the global reserve
+	 * space then we could end up having a real enospc problem when trying
+	 * to allocate a chunk or some other such important allocation.
+	 */
+	rsv_size <<= 1;
+	if (used + rsv_size >= space_info->total_bytes)
+		return 0;
+
+	used += space_info->bytes_may_use;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
 	avail = root->fs_info->free_chunk_space;
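
As a rough illustration of how this version differs from the earlier 80% cap, the new test can be modelled in userspace as below. This is a sketch only: `struct space_counters` and `reserve_cap_hit()` are hypothetical names, and the caller is assumed to have already sampled the global block reserve size (the patch reads it under `global_rsv->lock`). Nothing here is a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the counters the patch reads
 * from struct btrfs_space_info. */
struct space_counters {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
};

/* Returns 1 when hard usage plus twice the global reserve reaches the
 * total space, which is the case where the patch returns 0 from
 * can_overcommit(). global_rsv_size is the sampled reserve size. */
static int reserve_cap_hit(const struct space_counters *sc,
			   uint64_t global_rsv_size)
{
	uint64_t used = sc->bytes_used + sc->bytes_reserved +
			sc->bytes_pinned + sc->bytes_readonly;
	uint64_t rsv_size;

	/* Guard the shift in this model against overflow. */
	assert(global_rsv_size <= UINT64_MAX / 2);
	rsv_size = global_rsv_size << 1;

	return used + rsv_size >= sc->total_bytes;
}
```

In this model the threshold moves with the size of the global reserve instead of sitting at a fixed fraction of the space: with a small reserve, overcommit stays allowed well past 80% hard usage, and with a large reserve it is cut off sooner.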

Jim Schutt

2013-Jan-31 16:52 UTC

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

On 01/31/2013 08:33 AM, Josef Bacik wrote:
> Hey Jim,
> 
> Could you test this patch instead?  I think it's a little less hamfisted and
> should give us a nice balance between not crashing and being good for
> performance.  Thanks,

Hi Josef,

Running with this patch in place of your previous version, I
was again unable to reproduce the issue.

I might be seeing a couple percent increase in performance, or
it might just be noise, but I'm willing to say that I think
performance is same-or-better than the previous version of
the patch.

Thanks again!

-- Jim

