... maybe even the block group cache is nonfunctional then?

I am using a btrfs file system, mirrored data and metadata on two SSDs, and recently tried for the first time to run fstrim on it. fstrim unexpectedly said "0 bytes were trimmed". strace -T shows that it spends only a few microseconds in the ioctl system call (basically the overhead of strace, it seems), so I engaged in some "printk" debugging and found that after "btrfs_trim_fs" executes its first statement,

    cache = btrfs_lookup_block_group(fs_info, range->start);

cache is 0. As the file system was created with a very recent kernel and always mounted with the default "space_cache" option, I guessed that this might have something to do with the fact that I exchanged both the filesystem's devices earlier (as you can see from the devids in the following output -- this is the only btrfs file system on the machine):

    # btrfs fi show /dev/sda3
    Label: none  uuid: 88af7576-3027-4a3b-a5ae-34bfd167982f
            Total devices 2 FS bytes used 28.13GB
            devid    4 size 74.53GB used 38.06GB path /dev/sdb1
            devid    3 size 75.24GB used 38.06GB path /dev/sda3

("Exchanged" means: I created the filesystem as a mirror of sdb1 and sdb2, put data on it, then added sda3, balanced, deleted sdb2, balanced again, then unmounted, repartitioned sdb1 and sdb2 together as a larger sdb1, mounted degraded, added sdb1, balanced and deleted "missing".)

So I tried to reproduce this and got the following:

    # dd if=/dev/zero of=img0 bs=1 count=0 seek=5G
    # dd if=/dev/zero of=img1 bs=1 count=0 seek=5G
    # losetup -f img0
    # losetup -f img1
    # mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
    # mount /dev/loop0 /mnt
    # fstrim -v /mnt
    /mnt: 4332265472 bytes were trimmed

(The above-mentioned kernel instrumentation shows that "cache" was not 0 here.)

    # dd if=/dev/zero of=img2 bs=1 count=0 seek=5G
    # losetup -f img2
    # btrfs device add /dev/loop2 /mnt
    # btrfs device delete /dev/loop0 /mnt
    # fstrim -v /mnt
    /mnt: 0 bytes were trimmed

("cache" was 0 here.)
Software versions: See my earlier mail about degraded mirrors; reproducing the issue was on v3.3-rc2-172-g23783f8. I built fstrim from git, that is, util-linux pulled yesterday from git.kernel.org.

What's going on here? If the space cache is indeed broken on this filesystem, can I repair it without risk to my data? By mounting with "nospace_cache" once, or somehow using "clear_cache"?

Greetings,

Lutz
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/06/2012 04:37 AM, Lutz Euler wrote:
> ... maybe even the block group cache is nonfunctional then?
>
> I am using a btrfs file system, mirrored data and metadata on two SSDs,
> and recently tried for the first time to run fstrim on it. fstrim
> unexpectedly said "0 bytes were trimmed".
[...]
> What's going on here?
> If the space cache is indeed broken on this filesystem, can I repair
> it without risk to my data?
> By mounting with "nospace_cache" once or somehow using "clear_cache"?

Hi Lutz,

Would you please test the following patch on your box?

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77ea23c..b6e2c92 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7653,9 +7653,16 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
+	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
-	cache = btrfs_lookup_block_group(fs_info, range->start);
+	/*
+	 * try to trim all FS space, our block group may start from non-zero.
+	 */
+	if (range->len == total_bytes)
+		cache = btrfs_lookup_first_block_group(fs_info, range->start);
+	else
+		cache = btrfs_lookup_block_group(fs_info, range->start);
 
 	while (cache) {
 		if (cache->key.objectid >= (range->start + range->len)) {
-- 
1.6.5.2
Lutz Euler
2012-Feb-09 15:50 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi Liu,

thanks for looking into this!

You wrote:

> Would you please test the following patch on your box?
[...]

I'm sorry but that didn't help. I applied the patch on top of v3.3-rc2-172-g23783f8 and still fstrim says: "0 bytes were trimmed".

I left the printk's I mentioned in place. They show that now "cache" is not 0, so the "while (cache)" loop is entered, but nevertheless "btrfs_trim_block_group" is never called. So I put in more printk's and found that the objectid is so large that the loop is immediately terminated (output of the printk's in "//" comments below):

    int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
    {
            struct btrfs_fs_info *fs_info = root->fs_info;
            struct btrfs_block_group_cache *cache = NULL;
            u64 group_trimmed;
            u64 start;
            u64 end;
            u64 trimmed = 0;
            u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
            int ret = 0;

            // range.start = 0, .len = 160815300608, .minlen = 512

            /*
             * try to trim all FS space, our block group may start from non-zero.
             */
            if (range->len == total_bytes)
                    cache = btrfs_lookup_first_block_group(fs_info, range->start);
            else
                    cache = btrfs_lookup_block_group(fs_info, range->start);

            // cache != 0

            while (cache) {

                    // cache->key.objectid = 213702934528

                    if (cache->key.objectid >= (range->start + range->len)) {
                            btrfs_put_block_group(cache);
                            break;
                    }

                    // This line is not reached.

Here's the output of "btrfs fi show /dev/sda3" again:

    failed to read /dev/sr0
    Label: none  uuid: 88af7576-3027-4a3b-a5ae-34bfd167982f
            Total devices 2 FS bytes used 29.05GB
            devid    4 size 74.53GB used 40.06GB path /dev/sdb1
            devid    3 size 75.24GB used 40.06GB path /dev/sda3

Kind regards,

Lutz
On 02/09/2012 11:50 PM, Lutz Euler wrote:
> Hi Liu,
>
> thanks for looking into this!
[...]
> I'm sorry but that didn't help.
> I applied the patch on top of v3.3-rc2-172-g23783f8 and still fstrim
> says: "0 bytes were trimmed".
[...]
> // cache->key.objectid = 213702934528
>
> if (cache->key.objectid >= (range->start + range->len)) {
>         btrfs_put_block_group(cache);
>         break;
> }
>
> // This line is not reached.

Hi Lutz,

Great thanks for your details; now this should be a working one.

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77ea23c..693241a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7653,12 +7653,22 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
+	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+	bool all = false;
 	int ret = 0;
 
-	cache = btrfs_lookup_block_group(fs_info, range->start);
+	/* try to trim all FS space */
+	if (range->len == total_bytes)
+		all = true;
+
+	if (all)
+		/* our block group may start from non-zero */
+		cache = btrfs_lookup_first_block_group(fs_info, range->start);
+	else
+		cache = btrfs_lookup_block_group(fs_info, range->start);
 
 	while (cache) {
-		if (cache->key.objectid >= (range->start + range->len)) {
+		if (!all && cache->key.objectid >= (range->start + range->len)) {
 			btrfs_put_block_group(cache);
 			break;
 		}
-- 
1.6.5.2
Lutz Euler
2012-Feb-12 17:01 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

tl;dr: btrfs_trim_fs, IMHO, needs more thorough surgery.

Thanks for providing the new patch. I think it will work in the case that "fstrim" is called without specifying a range to trim (that is, defaulting to the whole filesystem), but I didn't test that yet, sorry.

Instead, I have been thinking about what happens if a range smaller than the whole filesystem is specified. After all, the observation that in my filesystem the smallest "objectid" is already larger than the filesystem size still holds even in that case, and wanting to trim only part of the filesystem seems to be a valid wish, too.

So I dug into the code myself and came to the conclusion that the way "btrfs_trim_fs" interprets the range passed from the ioctl is fundamentally broken. Instead of papering over that I'd very much prefer a more thorough fix here, which in addition might make it unnecessary to treat the "trim the complete filesystem" case specially. Please read on for the details:

The current implementation of "btrfs_trim_fs" simply compares the objectids it finds in the chunk tree against the range passed from the ioctl, seemingly assuming that both kinds of numbers span the same range. But this is clearly not true: Even without adding and deleting of devices, in a simple mirror the objectids will span only about half the size of the filesystem. With suitable add/delete of devices I can construct a filesystem with a smallest objectid of 0 and a largest objectid much larger than the size of the filesystem (so, obviously, with large holes in the set of used objectids), or, as in my filesystem mentioned above, with a smallest objectid larger than the size of the filesystem.
So, in the function the statement cache = btrfs_lookup_block_group(fs_info, range->start); will lead to nothing being trimmed if the smallest objectid is too large; if the if (cache->key.objectid >= (range->start + range->len)) is left in, too little will be trimmed; if it is left out, the following calculations start = max(range->start, cache->key.objectid); end = min(range->start + range->len, cache->key.objectid + cache->key.offset); if (end - start >= range->minlen) are bound to misbehave, for example start being larger than end seems possible, overflowing the subtraction and, I believe, leading to the whole filesystem being trimmed where the user only requested a part. Here is the start of the chunk tree of my filesystem for reference, showing the device sizes and the first objectid: chunk tree node 241620221952 level 1 items 2 free 119 generation 16831 owner 3 fs uuid 88af7576-3027-4a3b-a5ae-34bfd167982f chunk uuid 9a7d25e3-ebcf-4f31-bdd4-f44531a2fb1c key (DEV_ITEMS DEV_ITEM 3) block 241620226048 (58989313) gen 16831 key (FIRST_CHUNK_TREE CHUNK_ITEM 231956545536) block 241620230144 (58989314) gen 16831 leaf 241620226048 items 19 free space 1420 generation 16831 owner 3 fs uuid 88af7576-3027-4a3b-a5ae-34bfd167982f chunk uuid 9a7d25e3-ebcf-4f31-bdd4-f44531a2fb1c item 0 key (DEV_ITEMS DEV_ITEM 3) itemoff 3897 itemsize 98 dev item devid 3 total_bytes 80789987328 bytes used 44090523648 item 1 key (DEV_ITEMS DEV_ITEM 4) itemoff 3799 itemsize 98 dev item devid 4 total_bytes 80025313280 bytes used 44090523648 item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 213702934528) itemoff 3687 itemsize 112 chunk length 1073741824 owner 2 type 20 num_stripes 2 stripe 0 devid 4 offset 1074790400 stripe 1 devid 3 offset 40870346752 So, to make bulk discard of ranges useful, it seems the incoming range should be interpreted relative to the size of the filesystem and not to the allocated chunks. 
As AFAIK the size of the filesystem is just the sum of the sizes of its devices, it might be possible to map the range onto a virtual concatenation of the devices, these perhaps ordered by devid, and then to find the free space by searching for the resulting devid(s) and device-relative offsets in the device tree?

I understand that currently unallocated space is never trimmed? To nevertheless do that might be useful after a balance or at file system creation time? If there is a way to find the unallocated space for a device, this could be improved at the same time, too.

Kind regards,

Lutz
On 02/13/2012 01:01 AM, Lutz Euler wrote:
> Hi,
>
> tl;dr: btrfs_trim_fs, IMHO, needs more thorough surgery.
[...]
> So, to make bulk discard of ranges useful, it seems the incoming range
> should be interpreted relative to the size of the filesystem and not to
> the allocated chunks. As AFAIK the size of the filesystem is just the
> sum of the sizes of its devices it might be possible to map the range
> onto a virtual concatenation of the devices, these perhaps ordered by
> devid, and then to find the free space by searching for the resulting
> devid(s) and device-relative offsets in the device tree?

Actually I have no idea how to deal with this properly :(

Because btrfs supports multiple devices, we have to set the filesystem's logical range to [0, (u64)-1] to get things to work well, while other filesystems' logical range is [0, device's total_bytes].

What's more, in btrfs, devices can be mirrored to RAID, and the free space is also ordered by logical addr in [0, (u64)-1], so IMO it is impossible to interpret the range.

I'd better pick up "treat the "trim the complete filesystem" case specially", and drop the following commit:

    commit f4c697e6406da5dd445eda8d923c53e1138793dd
    Author: Lukas Czerner <lczerner@redhat.com>
    Date:   Mon Sep 5 16:34:54 2011 +0200

        btrfs: return EINVAL if start > total_bytes in fitrim ioctl

        We should return EINVAL if the start is beyond the end of the file
        system in the btrfs_ioctl_fitrim(). Fix that by adding the
        appropriate check for it.

        Also in the btrfs_trim_fs() it is possible that len+start might
        overflow if big values are passed. Fix it by decrementing the len
        so that start+len is equal to the file system size in the worst
        case.

        Signed-off-by: Lukas Czerner <lczerner@redhat.com>

thanks,
liubo

> I understand that currently unallocated space is never trimmed?
> To nevertheless do that might be useful after a balance or at file
> system creation time? If there is a way to find the unallocated space
> for a device, this could be improved at the same time, too.
Lutz Euler
2012-Feb-14 17:32 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

Liu Bo wrote:

> Actually I have no idea how to deal with this properly :(
>
> Because btrfs supports multi-devices so that we have to set the
> filesystem logical range to [0, (u64)-1] to get things to work well,
> while other filesystems' logical range is [0, device's total_bytes].
>
> What's more, in btrfs, devices can be mirrored to RAID, and the free
> space is also ordered by logical addr [0, (u64)-1], so IMO it is
> impossible to interpret the range.

I do not concur with this reasoning, but please see below that I concur with your conclusion. I still think that the range could be interpreted like I hinted at in my mail:

>> So, to make bulk discard of ranges useful, it seems the incoming range
>> should be interpreted relative to the size of the filesystem and not
>> to the allocated chunks. As AFAIK the size of the filesystem is just
>> the sum of the sizes of its devices it might be possible to map the
>> range onto a virtual concatenation of the devices, these perhaps
>> ordered by devid, and then to find the free space by searching for the
>> resulting devid(s) and device-relative offsets in the device tree?

This would only be somewhat difficult to use if the filesystem consisted of a mixture of non-trim-capable and trim-capable devices, and if the idea behind ranges is that the user can expect an equal amount of trim-capable storage behind different equally sized ranges -- but I don't know if this is indeed an idea behind ranges. If, on the other hand, there is a way outside of the ioctl to find out which devices the filesystem consists of and which of these support discard, the above-mentioned way to interpret ranges would extend to this setting. But maybe this would already be too ugly, essentially working around the shortcomings of an interface that is too restricted.

Moreover, I don't know why ranges smaller than the filesystem are supported by fstrim. I couldn't find any use cases or rationale for it in fstrim's or the ioctl's documentation or elsewhere. This makes it difficult to find out what might be useful in the case of a multi-device filesystem ;-)

So, with ranges being this unclear, I concur with your suggestion:

> I'd better pick up "treat the "trim the complete filesystem" case
> specially", and drop the following commit:
>
> commit f4c697e6406da5dd445eda8d923c53e1138793dd
> Author: Lukas Czerner <lczerner@redhat.com>
> Date:   Mon Sep 5 16:34:54 2011 +0200
>
>     btrfs: return EINVAL if start > total_bytes in fitrim ioctl

More specifically, what I do wish for is:

- Fix the problem I started the thread for: Make fstrim of the complete filesystem work.

And then, if possible:

- Simplify btrfs_trim_fs as much as possible to remove all traces of support for ranges smaller than the filesystem, document that this anyway wouldn't do what the user might expect, and, if possible, return an error in these cases.

- Also trim unallocated space (for use after balancing and at mkfs time).

Thanks for your time and your work,

Lutz
Lutz Euler
2012-Feb-29 00:17 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

I have seen that in the meantime a patch for this issue has found its way into 3.3-rc5. Unfortunately with this kernel my filesystem still says "0 bytes were trimmed", so the bug is not fixed for me.

Now that might just have been caused by the fact that the patch that went into 3.3-rc5 (as commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67) is the one from Liu Bo's mail <4F3386D8.6010108@cn.fujitsu.com>, which I already responded to by saying it didn't help my issue.

But the next patch Liu sent (in the mail <4F34795B.2020004@cn.fujitsu.com>) also does not help. At first I responded to that mail:

> Thanks for providing the new patch. I think it will work in the case
> that "fstrim" is called without specifying a range to trim (that is,
> defaulting to the whole filesystem), but I didn't test that yet, sorry.

I was too optimistic. I have now tested this second patch and still get "0 bytes were trimmed".

Thanks again and regards,

Lutz
Lutz Euler
2012-Apr-10 17:34 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

the issue I raised in this thread is still not fixed. Please, could someone look into it? A fix should be simple, especially as the issue is easily reproducible. (In case the information in this thread so far is not sufficient for that, I will happily provide a recipe; just ask.) Also, I believe so far not even a regression test has been added for this issue.

Here's the state of affairs again, as I mailed previously:

> I have seen that in the meantime a patch for this issue has found
> its way into 3.3-rc5. Unfortunately with this kernel my filesystem still
> says "0 bytes were trimmed", so the bug is not fixed for me.
>
> Now that might just have been caused by the fact that the patch that
> went into 3.3-rc5 (as commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67)
> is the one from Liu Bo's mail <4F3386D8.6010108@cn.fujitsu.com>, which
> I already responded to by saying it didn't help my issue.
>
> But the next patch Liu sent (in the mail
> <4F34795B.2020004@cn.fujitsu.com>) also does not help.
> At first I responded to that mail:
>
> > Thanks for providing the new patch. I think it will work in the case
> > that "fstrim" is called without specifying a range to trim (that is,
> > defaulting to the whole filesystem), but I didn't test that yet, sorry.
>
> I was too optimistic. I have now tested this second patch and still get
> "0 bytes were trimmed".

Thanks for providing btrfs!

Greetings,

Lutz
Lutz Euler
2012-Nov-14 21:10 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

the issue I raised in this thread is still not fixed. So, to get things going, I prepared a patch myself, which I will send in a separate email. The subject will be "really fix trim 0 bytes after a device delete".

I would be grateful if you could consider the patch -- with it applied I finally could trim my disks successfully.

Greetings,

Lutz
Lutz Euler
2012-Nov-14 21:17 UTC
[PATCH] Btrfs: really fix trim 0 bytes after a device delete
Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after a device delete", said:

    A user reported a bug of btrfs's trim, that is we will trim 0 bytes
    after a device delete.

The commit didn't attack the root of the problem, so it did not fix the bug except for a special case.

For block discard, btrfs_trim_fs directly compares the range passed in against the filesystem's objectids. The former is bounded by the sum of the sizes of the devices of the filesystem; the latter is a completely unrelated set of intervals of 64-bit integers. The bug reported occurred as the smallest objectid was larger than the sum of the device sizes. The previous commit only fixes the case where the smallest objectid is nonzero and the largest objectid less than the sum of the device sizes, but it would still trim too little if the largest objectid is larger than that, and nothing in the reported situation.

The current mapping between the given range and the objectids is thus clearly broken, so, to fix the bug and as a first step towards a complete solution, simply ignore the range parameter's start and length fields and always trim the whole filesystem. (While this makes it impossible to only partly trim a filesystem, due to the broken mapping this often didn't work anyway.)

Reported-by: Lutz Euler <lutz.euler@freenet.de>
Signed-off-by: Lutz Euler <lutz.euler@freenet.de>
---
 fs/btrfs/extent-tree.c |   55 +++++++++++++++++++++--------------------------
 1 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..9c2bb59 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8092,44 +8092,39 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
-	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
 	/*
-	 * try to trim all FS space, our block group may start from non-zero.
+	 * The range passed in is a subinterval of the interval from 0
+	 * to the sum of the sizes of the devices of the filesystem.
+	 * The objectid's used in the filesystem can span any set of
+	 * subintervals of the interval from 0 to (u64)-1. As there is
+	 * neither a simple nor an agreed upon mapping between these
+	 * two ranges we ignore the range parameter's start and len
+	 * fields and always trim the whole filesystem (that is, only
+	 * the free space in allocated chunks).
 	 */
-	if (range->len == total_bytes)
-		cache = btrfs_lookup_first_block_group(fs_info, range->start);
-	else
-		cache = btrfs_lookup_block_group(fs_info, range->start);
+	cache = btrfs_lookup_first_block_group(fs_info, 0);
 
 	while (cache) {
-		if (cache->key.objectid >= (range->start + range->len)) {
-			btrfs_put_block_group(cache);
-			break;
-		}
-
-		start = max(range->start, cache->key.objectid);
-		end = min(range->start + range->len,
-			  cache->key.objectid + cache->key.offset);
+		start = cache->key.objectid;
+		end = cache->key.objectid + cache->key.offset;
 
-		if (end - start >= range->minlen) {
-			if (!block_group_cache_done(cache)) {
-				ret = cache_block_group(cache, NULL, root, 0);
-				if (!ret)
-					wait_block_group_cache_done(cache);
-			}
-			ret = btrfs_trim_block_group(cache,
-						     &group_trimmed,
-						     start,
-						     end,
-						     range->minlen);
+		if (!block_group_cache_done(cache)) {
+			ret = cache_block_group(cache, NULL, root, 0);
+			if (!ret)
+				wait_block_group_cache_done(cache);
+		}
+		ret = btrfs_trim_block_group(cache,
+					     &group_trimmed,
+					     start,
+					     end,
+					     range->minlen);
 
-			trimmed += group_trimmed;
-			if (ret) {
-				btrfs_put_block_group(cache);
-				break;
-			}
+		trimmed += group_trimmed;
+		if (ret) {
+			btrfs_put_block_group(cache);
+			break;
 		}
 
 		cache = next_block_group(fs_info->tree_root, cache);
-- 
1.7.4.1