thr3ads.net - Btrfs devel - [PATCH] Btrfs: take block group fragmentation into account for allocation [Mar 2009]

If this information is useful, please help other people find it:
Share via:

Josef Bacik

2009-Mar-06 19:30 UTC

[PATCH] Btrfs: take block group fragmentation into account for allocation

This patch improves the allocators packing ability to greatly improve cold-cache
reads.  Instead of handling the empty_size logic within find_free_extent, we
simply pass it to btrfs_find_free_space, where it will check the fragmentation
level of the block group and decide if it wants to look for a empty cluster, or
simply allocate space for what we need.

This is accomplished by taking the total amount of free space fragments and
dividing it by the maximum number of free space fragments there can be and
multiplying it by 100.  This gives us the percentage of free space
fragmentation.  The maximum number of free space fragments is simply the free
space divided by the sectorsize, and then dividing that by 2.  If we are 20% or
more fragmented we will ignore the empty_size and just allocate num_bytes as
close to search_start as we can.  This results in the allocator trying its best
to fill up block groups before moving on to one farther down the disk.

I''ve also changed how the allocation hints are given.  Instead of
relying on the
last allocation for the transaction, we rely on the last offset the particular
root allocated from.  This will help in packing allocations per roots close
together even in the face of fragmentation.  I may take out the alloc hint in
the transaction altogether, but that will be farther down the line.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
---
 fs/btrfs/ctree.h            |    6 +++++-
 fs/btrfs/extent-tree.c      |   25 +++++++++++++++----------
 fs/btrfs/free-space-cache.c |   42 +++++++++++++-----------------------------
 3 files changed, 33 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bdf826c..64b4ee6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -651,6 +651,10 @@ struct btrfs_block_group_cache {
 	struct rb_root free_space_bytes;
 	struct rb_root free_space_offset;
 
+	/* free space fragmentation stuff */
+	u64 fragments;
+	u64 fragment_size;
+
 	/* block group cache stuff */
 	struct rb_node cache_node;
 
@@ -2141,7 +2145,7 @@ void btrfs_remove_free_space_cache(struct
btrfs_block_group_cache
 				   *block_group);
 struct btrfs_free_space *btrfs_find_free_space(struct btrfs_block_group_cache
 					       *block_group, u64 offset,
-					       u64 bytes);
+					       u64 bytes, u64 empty_size);
 void btrfs_dump_free_space(struct btrfs_block_group_cache *block_group,
 			   u64 bytes);
 u64 btrfs_block_group_free_space(struct btrfs_block_group_cache *block_group);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 38820b1..2a6fe71 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3082,8 +3082,8 @@ static noinline int find_free_extent(struct
btrfs_trans_handle *trans,
 {
 	int ret = 0;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
-	u64 total_needed = num_bytes;
 	u64 *last_ptr = NULL;
+	u64 total_used = 0;
 	struct btrfs_block_group_cache *block_group = NULL;
 	int allowed_chunk_alloc = 0;
 	int fill_root_alloc_info = 0;
@@ -3111,13 +3111,12 @@ static noinline int find_free_extent(struct
btrfs_trans_handle *trans,
 	if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD))
 		last_ptr = &root->fs_info->last_data_alloc;
 
-	if (last_ptr && *last_ptr)
+	if (last_ptr && *last_ptr && !hint_byte)
 		hint_byte = *last_ptr;
 
 	search_start = max(search_start, first_logical_byte(root, 0));
 	search_start = max(search_start, hint_byte);
 
-	total_needed += empty_size;
 	block_group = btrfs_lookup_block_group(root->fs_info, search_start);
 	if (!block_group)
 		block_group = btrfs_lookup_first_block_group(root->fs_info,
@@ -3163,10 +3162,12 @@ have_block_group:
 			goto loop;
 
 		free_space = btrfs_find_free_space(block_group, search_start,
-						   total_needed);
+						   num_bytes, empty_size);
 		if (!free_space)
 			goto loop;
 
+		total_used = min_t(u64, num_bytes + empty_size,
+				   free_space->bytes);
 		search_start = stripe_align(root, free_space->offset);
 
 		/* past where we''re allowed to search */
@@ -3222,7 +3223,7 @@ have_block_group:
 			orig_root->alloc_bytes -= num_bytes;
 
 			if (!orig_root->alloc_bytes) {
-				orig_root->alloc_bytes = total_needed;
+				orig_root->alloc_bytes = total_used;
 				orig_root->alloc_offset = search_start;
 				used = 1;
 			}
@@ -3232,23 +3233,23 @@ have_block_group:
 			u64 bytes = orig_root->alloc_bytes;
 
 			orig_root->alloc_offset = search_start + num_bytes;
-			orig_root->alloc_bytes = total_needed - num_bytes;
+			orig_root->alloc_bytes = total_used - num_bytes;
 			used = 1;
 			spin_unlock(&orig_root->alloc_lock);
 
 			btrfs_add_free_space_lock(block_group, offset, bytes);
 		} else {
 			orig_root->alloc_offset = search_start + num_bytes;
-			orig_root->alloc_bytes = total_needed - num_bytes;
+			orig_root->alloc_bytes = total_used - num_bytes;
 			used = 1;
 			spin_unlock(&orig_root->alloc_lock);
 		}
 
 		if (used) {
 			btrfs_remove_free_space_lock(block_group, search_start,
-						     total_needed);
+						     total_used);
 			if (last_ptr)
-				*last_ptr = search_start + total_needed;
+				*last_ptr = search_start + total_used;
 		}
 
 		mutex_unlock(&block_group->alloc_mutex);
@@ -3271,7 +3272,6 @@ loop:
 		int try_again = empty_size;
 
 		block_group = NULL;
-		total_needed -= empty_size;
 		empty_size = 0;
 
 		if (allowed_chunk_alloc) {
@@ -3356,6 +3356,7 @@ again:
 			spin_unlock(&root->alloc_lock);
 			return 0;
 		}
+		hint_byte = root->alloc_offset;
 		spin_unlock(&root->alloc_lock);
 	}
 
@@ -6398,6 +6399,8 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 		set_avail_alloc_bits(root->fs_info, cache->flags);
 		if (btrfs_chunk_readonly(root, cache->key.objectid))
 			set_block_group_readonly(cache);
+		cache->fragments = 0;
+		cache->fragment_size = root->sectorsize;
 	}
 	ret = 0;
 error:
@@ -6430,6 +6433,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle
*trans,
 	mutex_init(&cache->alloc_mutex);
 	mutex_init(&cache->cache_mutex);
 	INIT_LIST_HEAD(&cache->list);
+	cache->fragments = 0;
+	cache->fragment_size = root->sectorsize;
 
 	btrfs_set_block_group_used(&cache->item, bytes_used);
 	btrfs_set_block_group_chunk_objectid(&cache->item, chunk_objectid);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d1e5f0e..23cb12c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -163,6 +163,7 @@ static void unlink_free_space(struct btrfs_block_group_cache
*block_group,
 {
 	rb_erase(&info->offset_index, &block_group->free_space_offset);
 	rb_erase(&info->bytes_index, &block_group->free_space_bytes);
+	block_group->fragments -= 1;
 }
 
 static int link_free_space(struct btrfs_block_group_cache *block_group,
@@ -181,6 +182,8 @@ static int link_free_space(struct btrfs_block_group_cache
*block_group,
 	if (ret)
 		return ret;
 
+	block_group->fragments += 1;
+
 	return ret;
 }
 
@@ -447,44 +450,25 @@ void btrfs_remove_free_space_cache(struct
btrfs_block_group_cache *block_group)
 	mutex_unlock(&block_group->alloc_mutex);
 }
 
-#if 0
-static struct btrfs_free_space *btrfs_find_free_space_offset(struct
-						      btrfs_block_group_cache
-						      *block_group, u64 offset,
-						      u64 bytes)
+static int fragmentation_percent(struct btrfs_block_group_cache *block_group)
 {
-	struct btrfs_free_space *ret;
+	u64 max_fragments;
 
-	mutex_lock(&block_group->alloc_mutex);
-	ret = tree_search_offset(&block_group->free_space_offset, offset,
-				 bytes, 0);
-	mutex_unlock(&block_group->alloc_mutex);
-
-	return ret;
+	max_fragments = (block_group->key.offset -
+		btrfs_block_group_used(&block_group->item)) /
+		block_group->fragment_size;
+	return ((block_group->fragments / max_fragments) * 100);
 }
 
-static struct btrfs_free_space *btrfs_find_free_space_bytes(struct
-						     btrfs_block_group_cache
-						     *block_group, u64 offset,
-						     u64 bytes)
-{
-	struct btrfs_free_space *ret;
-
-	mutex_lock(&block_group->alloc_mutex);
-
-	ret = tree_search_bytes(&block_group->free_space_bytes, offset, bytes);
-	mutex_unlock(&block_group->alloc_mutex);
-
-	return ret;
-}
-#endif
-
 struct btrfs_free_space *btrfs_find_free_space(struct btrfs_block_group_cache
 					       *block_group, u64 offset,
-					       u64 bytes)
+					       u64 bytes, u64 empty_size)
 {
 	struct btrfs_free_space *ret = NULL;
 
+	if (empty_size && fragmentation_percent(block_group) < 20)
+		bytes += empty_size;
+
 	ret = tree_search_offset(&block_group->free_space_offset, offset,
 				 bytes, 0);
 	if (!ret)
-- 
1.5.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Yien Zheng

2009-Mar-08 14:42 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

I tried applying this patch, but the fragmentation_percent function is
giving me:

WARNING: "__udivdi3"
[/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!

when I compile, and the module fails to load, even though the build
completes.  I''ve traced it to the calculation of max_fragments:

max_fragments = (block_group->key.offset -
		btrfs_block_group_used(&block_group->item)) /
		block_group->fragment_size;

It seems that accessing block_group in here is causing a reference to
__udivdi3 somehow.  Any idea what''s going on here?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jens Axboe

2009-Mar-08 16:37 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

On Sun, Mar 08 2009, Yien Zheng wrote:> I tried applying this patch, but the fragmentation_percent function is
> giving me:
> 
> WARNING: "__udivdi3"
> [/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!
> 
> when I compile, and the module fails to load, even though the build
> completes.  I''ve traced it to the calculation of max_fragments:
> 
> max_fragments = (block_group->key.offset -
> 		btrfs_block_group_used(&block_group->item)) /
> 		block_group->fragment_size;
> 
> It seems that accessing block_group in here is causing a reference to
> __udivdi3 somehow.  Any idea what''s going on here?
You can''t to 64-bit divides on 32-bit archs. Make that

max_fragments = block_group->key.offset -
btrfs_block_group_used(&block_group->item);
do_div(max_fragments, block_group->fragment_size);

and it should work.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Yien Zheng

2009-Mar-09 02:03 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Thanks Jens!  My research had indicated something about 64-bit
division and using do_div, but you cleared it up for me.

The fragmentation_percent function ultimately does another divide to
get the percentage, so here''s what I did to get the function to work:

static int fragmentation_percent(struct btrfs_block_group_cache *block_group)
{
        u64 max_fragments;
        u64 fragments_ratio;

        max_fragments = block_group->key.offset -
                btrfs_block_group_used(&block_group->item);
        do_div(max_fragments, block_group->fragment_size);
        fragments_ratio = block_group->fragments;
        do_div(fragments_ratio, max_fragments);
        return (fragments_ratio * 100);
}

On Sun, Mar 8, 2009 at 10:37 AM, Jens Axboe <jens.axboe@oracle.com>
wrote:> On Sun, Mar 08 2009, Yien Zheng wrote:
>> I tried applying this patch, but the fragmentation_percent function is
>> giving me:
>>
>> WARNING: "__udivdi3"
>> [/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!
>>
>> when I compile, and the module fails to load, even though the build
>> completes.  I''ve traced it to the calculation of
max_fragments:
>>
>> max_fragments = (block_group->key.offset -
>>               btrfs_block_group_used(&block_group->item)) /
>>               block_group->fragment_size;
>>
>> It seems that accessing block_group in here is causing a reference to
>> __udivdi3 somehow.  Any idea what''s going on here?
>
> You can''t to 64-bit divides on 32-bit archs. Make that
>
> max_fragments = block_group->key.offset -
btrfs_block_group_used(&block_group->item);
> do_div(max_fragments, block_group->fragment_size);
>
> and it should work.
>
> --
> Jens Axboe
>
>--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Simon Holm Thøgersen

2009-Mar-09 13:56 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

søn, 08 03 2009 kl. 20:03 -0600, skrev Yien Zheng:> Thanks Jens!  My research had indicated something about 64-bit
> division and using do_div, but you cleared it up for me.
> 
> The fragmentation_percent function ultimately does another divide to
> get the percentage, so here''s what I did to get the function to
work:
> 
> static int fragmentation_percent(struct btrfs_block_group_cache
*block_group)
> {
>         u64 max_fragments;
>         u64 fragments_ratio;
> 
>         max_fragments = block_group->key.offset -
>                 btrfs_block_group_used(&block_group->item);
>         do_div(max_fragments, block_group->fragment_size);
>         fragments_ratio = block_group->fragments;
>         do_div(fragments_ratio, max_fragments);
>         return (fragments_ratio * 100);
> }
> So the idea of the function is to return an integer in the range
[0,100]? The problem is that it will only return either 0 or 100, but
nothing in between, right?

Wouldn''t something along the lines of the following be better?

u64 tmp = (block_group->key.offset -
	btrfs_block_group_used(&block_group->item);
u64 tmp2 = 100 * block_group->fragment_size * block_group->fragments;
do_div(tmp2, tmp);
return tmp2;


Simon Holm Thøgersen

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Oliver Mattos

2009-Mar-09 15:21 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

> So the idea of the function is to return an integer in the range
> [0,100]?
Why are we using a range of 0 to 100 anyway?  100 seems like an arbitary 
value for kernel space - why not just keep it as a value in the range 
[0,2^32) ?   That eliminates the arbitary constant of 100, and in some cases 
could reduce the effects of rounding and allow finer control at no 
additional expense. 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2009-Mar-09 15:24 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

On Mon, Mar 09, 2009 at 03:21:06PM -0000, Oliver Mattos
wrote:>
>> So the idea of the function is to return an integer in the range
>> [0,100]?
>
> Why are we using a range of 0 to 100 anyway?  100 seems like an arbitary  
> value for kernel space - why not just keep it as a value in the range  
> [0,2^32) ?   That eliminates the arbitary constant of 100, and in some 
> cases could reduce the effects of rounding and allow finer control at no  
> additional expense. 
>
Its not arbitrary, its a percentage, so 0-100 percent.

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

jim owens

2009-Mar-09 17:29 UTC

head link

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Josef Bacik wrote:> On Mon, Mar 09, 2009 at 03:21:06PM -0000, Oliver Mattos wrote:
>>> So the idea of the function is to return an integer in the range
>>> [0,100]?
>> Why are we using a range of 0 to 100 anyway?  100 seems like an
arbitary
>> value for kernel space - why not just keep it as a value in the range  
>> [0,2^32) ?   That eliminates the arbitary constant of 100, and in some 
>> cases could reduce the effects of rounding and allow finer control at
no
>> additional expense. 
>>
> 
> Its not arbitrary, its a percentage, so 0-100 percent.
True, however the decision to use a percentage scheme is arbitrary.

For triggers like this it is just as good to pick points
like 1/8 (12.5%) or 1/4 (25%) that can be calculated without
doing division.  Forgetting the 32 bit architecture problem,
many architectures are really slow at divide so it is better
to not use them unless it adds real value.  Other than percent
being easy to document, what is the justification for percent.

jim
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - Mar 2009 - [PATCH] Btrfs: take block group fragmentation into account for allocation

[PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation

Re: [PATCH] Btrfs: take block group fragmentation into account for allocation