Like the cluster allocating stuff, we can lockup the box with the normal
allocation path.  This happens when we

1) Start to cache a block group that is severely fragmented, but has a
decent amount of free space.
2) Start to commit a transaction
3) Have the commit try and empty out some of the delalloc inodes with
extents that are relatively large.

The inodes will not be able to make the allocations because they will ask
for allocations larger than a contiguous area in the free space cache.  So
we will wait for more progress to be made on the block group, but since
we're in a commit the caching kthread won't make any more progress and it
already has enough free space that wait_block_group_cache_progress will
just return.  So, if we wait and fail to make the allocation the next time
around, just loop and go to the next block group.  This keeps us from
getting stuck in a softlockup.  Thanks,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
---
 fs/btrfs/extent-tree.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b259db3..e46b0b9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3816,6 +3816,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
 	int loop = 0;
 	bool found_uncached_bg = false;
 	bool failed_cluster_refill = false;
+	bool failed_alloc = false;

 	WARN_ON(num_bytes < root->sectorsize);
 	btrfs_set_key_type(ins, BTRFS_EXTENT_ITEM_KEY);
@@ -4020,14 +4021,23 @@ refill_cluster:

 	offset = btrfs_find_space_for_alloc(block_group, search_start,
 					    num_bytes, empty_size);
-	if (!offset && (cached || (!cached &&
-				   loop == LOOP_CACHING_NOWAIT))) {
-		goto loop;
-	} else if (!offset && (!cached &&
-			       loop > LOOP_CACHING_NOWAIT)) {
+	/*
+	 * If we didn't find a chunk, and we haven't failed on this
+	 * block group before, and this block group is in the middle of
+	 * caching and we are ok with waiting, then go ahead and wait
+	 * for progress to be made, and set failed_alloc to true.
+	 *
+	 * If failed_alloc is true then we've already waited on this
+	 * block group once and should move on to the next block group.
+	 */
+	if (!offset && !failed_alloc && !cached &&
+	    loop > LOOP_CACHING_NOWAIT) {
 		wait_block_group_cache_progress(block_group,
-				num_bytes + empty_size);
+				num_bytes + empty_size);
+		failed_alloc = true;
 		goto have_block_group;
+	} else if (!offset) {
+		goto loop;
 	}
 checks:
 	search_start = stripe_align(root, offset);
@@ -4075,6 +4085,7 @@ checks:
 		break;
 loop:
 		failed_cluster_refill = false;
+		failed_alloc = false;
 		btrfs_put_block_group(block_group);
 	}
 	up_read(&space_info->groups_sem);
--
1.5.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
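To make the control flow concrete, here is a minimal user-space sketch of the wait-at-most-once retry policy the patch introduces. This is not the btrfs code: `struct block_group`, `try_alloc`, and the simplified `find_free_extent` below are stand-ins invented for illustration, and the real `wait_block_group_cache_progress()` call is reduced to a comment. The point is only the `failed_alloc` flag: an uncached group gets exactly one retry before the search moves on, so the loop can never spin forever on one group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a btrfs block group: each group reports
 * whether an allocation can succeed after a given number of tries. */
struct block_group {
	bool cached;          /* has caching finished for this group? */
	int  attempts_needed; /* allocation succeeds on this attempt */
	int  attempts_made;
};

/* Stand-in for btrfs_find_space_for_alloc(): nonzero offset on success. */
static size_t try_alloc(struct block_group *bg)
{
	bg->attempts_made++;
	return (bg->attempts_made >= bg->attempts_needed) ? 4096 : 0;
}

/* Sketch of the patched search loop: on failure against a still-caching
 * group, wait and retry it exactly once (failed_alloc), then move on. */
static size_t find_free_extent(struct block_group *groups, int n)
{
	for (int i = 0; i < n; i++) {
		bool failed_alloc = false;
		size_t offset;
have_block_group:
		offset = try_alloc(&groups[i]);
		if (!offset && !failed_alloc && !groups[i].cached) {
			/* wait_block_group_cache_progress() would block
			 * here; in a commit it may return immediately. */
			failed_alloc = true;
			goto have_block_group; /* one retry, no more */
		} else if (!offset) {
			continue; /* give up on this group, try the next */
		}
		return offset;
	}
	return 0;
}
```

With the pre-patch logic the first group below would be retried indefinitely while caching made no progress; with the flag it is tried twice and then skipped in favor of the second group.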
Chris Mason
2009-Oct-05 23:09 UTC
Re: [PATCH] Btrfs: fix possible softlockup in the allocator
On Mon, Oct 05, 2009 at 03:30:39PM -0400, Josef Bacik wrote:
> Like the cluster allocating stuff, we can lockup the box with the normal
> allocation path.  This happens when we

This is working for me, I'm hammering on it a bit.

-chris
Johannes Hirte
2009-Oct-06 06:14 UTC
Re: [PATCH] Btrfs: fix possible softlockup in the allocator
On Monday 05 October 2009 21:30:39, Josef Bacik wrote:
> Like the cluster allocating stuff, we can lockup the box with the normal
> allocation path.  This happens when we
>
> 1) Start to cache a block group that is severely fragmented, but has a
> decent amount of free space.
> 2) Start to commit a transaction
> 3) Have the commit try and empty out some of the delalloc inodes with
> extents that are relatively large.
>
> The inodes will not be able to make the allocations because they will ask
> for allocations larger than a contiguous area in the free space cache.  So
> we will wait for more progress to be made on the block group, but since
> we're in a commit the caching kthread won't make any more progress and it
> already has enough free space that wait_block_group_cache_progress will
> just return.  So, if we wait and fail to make the allocation the next time
> around, just loop and go to the next block group.  This keeps us from
> getting stuck in a softlockup.  Thanks,
>
> Signed-off-by: Josef Bacik <jbacik@redhat.com>
> ---
>  fs/btrfs/extent-tree.c |   23 +++++++++++++++++------
>  1 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b259db3..e46b0b9 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3816,6 +3816,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
>  	int loop = 0;
>  	bool found_uncached_bg = false;
>  	bool failed_cluster_refill = false;
> +	bool failed_alloc = false;
>
>  	WARN_ON(num_bytes < root->sectorsize);
>  	btrfs_set_key_type(ins, BTRFS_EXTENT_ITEM_KEY);
> @@ -4020,14 +4021,23 @@ refill_cluster:
>
>  	offset = btrfs_find_space_for_alloc(block_group, search_start,
>  					    num_bytes, empty_size);
> -	if (!offset && (cached || (!cached &&
> -				   loop == LOOP_CACHING_NOWAIT))) {
> -		goto loop;
> -	} else if (!offset && (!cached &&
> -			       loop > LOOP_CACHING_NOWAIT)) {
> +	/*
> +	 * If we didn't find a chunk, and we haven't failed on this
> +	 * block group before, and this block group is in the middle of
> +	 * caching and we are ok with waiting, then go ahead and wait
> +	 * for progress to be made, and set failed_alloc to true.
> +	 *
> +	 * If failed_alloc is true then we've already waited on this
> +	 * block group once and should move on to the next block group.
> +	 */
> +	if (!offset && !failed_alloc && !cached &&
> +	    loop > LOOP_CACHING_NOWAIT) {
>  		wait_block_group_cache_progress(block_group,
> -				num_bytes + empty_size);
> +				num_bytes + empty_size);
> +		failed_alloc = true;
>  		goto have_block_group;
> +	} else if (!offset) {
> +		goto loop;
>  	}
>  checks:
>  	search_start = stripe_align(root, offset);
> @@ -4075,6 +4085,7 @@ checks:
>  		break;
>  loop:
>  		failed_cluster_refill = false;
> +		failed_alloc = false;
>  		btrfs_put_block_group(block_group);
>  	}
>  	up_read(&space_info->groups_sem);
>

My box survived 6h of dbench with this patch whereas without it hangs within
the first two minutes.

Johannes
Josef Bacik
2009-Oct-06 13:29 UTC
Re: [PATCH] Btrfs: fix possible softlockup in the allocator
On Tue, Oct 06, 2009 at 08:14:55AM +0200, Johannes Hirte wrote:
> On Monday 05 October 2009 21:30:39, Josef Bacik wrote:
> > Like the cluster allocating stuff, we can lockup the box with the normal
> > allocation path.  This happens when we
> >
> > 1) Start to cache a block group that is severely fragmented, but has a
> > decent amount of free space.
> > 2) Start to commit a transaction
> > 3) Have the commit try and empty out some of the delalloc inodes with
> > extents that are relatively large.
> >
> > The inodes will not be able to make the allocations because they will ask
> > for allocations larger than a contiguous area in the free space cache.  So
> > we will wait for more progress to be made on the block group, but since
> > we're in a commit the caching kthread won't make any more progress and it
> > already has enough free space that wait_block_group_cache_progress will
> > just return.  So, if we wait and fail to make the allocation the next time
> > around, just loop and go to the next block group.  This keeps us from
> > getting stuck in a softlockup.  Thanks,
> >
> > Signed-off-by: Josef Bacik <jbacik@redhat.com>
> > ---
> >  fs/btrfs/extent-tree.c |   23 +++++++++++++++++------
> >  1 files changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index b259db3..e46b0b9 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -3816,6 +3816,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
> >  	int loop = 0;
> >  	bool found_uncached_bg = false;
> >  	bool failed_cluster_refill = false;
> > +	bool failed_alloc = false;
> >
> >  	WARN_ON(num_bytes < root->sectorsize);
> >  	btrfs_set_key_type(ins, BTRFS_EXTENT_ITEM_KEY);
> > @@ -4020,14 +4021,23 @@ refill_cluster:
> >
> >  	offset = btrfs_find_space_for_alloc(block_group, search_start,
> >  					    num_bytes, empty_size);
> > -	if (!offset && (cached || (!cached &&
> > -				   loop == LOOP_CACHING_NOWAIT))) {
> > -		goto loop;
> > -	} else if (!offset && (!cached &&
> > -			       loop > LOOP_CACHING_NOWAIT)) {
> > +	/*
> > +	 * If we didn't find a chunk, and we haven't failed on this
> > +	 * block group before, and this block group is in the middle of
> > +	 * caching and we are ok with waiting, then go ahead and wait
> > +	 * for progress to be made, and set failed_alloc to true.
> > +	 *
> > +	 * If failed_alloc is true then we've already waited on this
> > +	 * block group once and should move on to the next block group.
> > +	 */
> > +	if (!offset && !failed_alloc && !cached &&
> > +	    loop > LOOP_CACHING_NOWAIT) {
> >  		wait_block_group_cache_progress(block_group,
> > -				num_bytes + empty_size);
> > +				num_bytes + empty_size);
> > +		failed_alloc = true;
> >  		goto have_block_group;
> > +	} else if (!offset) {
> > +		goto loop;
> >  	}
> >  checks:
> >  	search_start = stripe_align(root, offset);
> > @@ -4075,6 +4085,7 @@ checks:
> >  		break;
> >  loop:
> >  		failed_cluster_refill = false;
> > +		failed_alloc = false;
> >  		btrfs_put_block_group(block_group);
> >  	}
> >  	up_read(&space_info->groups_sem);
> >
>
> My box survived 6h of dbench with this patch whereas without it hangs within
> the first two minutes.
>

Great, I'm glad it fixed it for you.  Thanks for testing and reporting it.

Josef
Chris Mason
2009-Oct-06 14:07 UTC
Re: [PATCH] Btrfs: fix possible softlockup in the allocator
On Tue, Oct 06, 2009 at 09:29:42AM -0400, Josef Bacik wrote:
> > My box survived 6h of dbench with this patch whereas without it hangs
> > within the first two minutes.
>
> Great, I'm glad it fixed it for you.  Thanks for testing and reporting it.

Pushed out to the master branch.  Thanks!

-chris