... maybe even the block group cache is nonfunctional then?

I am using a btrfs file system, mirrored data and metadata on two SSDs, and recently tried for the first time to run fstrim on it. fstrim unexpectedly said "0 bytes were trimmed". strace -T shows that it spends only a few microseconds in the ioctl system call (basically the overhead of strace, it seems), so I engaged in some "printk" debugging and found that after "btrfs_trim_fs" executes its first statement,

    cache = btrfs_lookup_block_group(fs_info, range->start);

cache is 0. As the file system was created with a very recent kernel and always mounted with the default "space_cache" option, I guessed that this might have something to do with the fact that I exchanged both the filesystem's devices earlier (as you can see from the devids in the following output -- this is the only btrfs file system on the machine):

    # btrfs fi show /dev/sda3
    Label: none  uuid: 88af7576-3027-4a3b-a5ae-34bfd167982f
            Total devices 2 FS bytes used 28.13GB
            devid    4 size 74.53GB used 38.06GB path /dev/sdb1
            devid    3 size 75.24GB used 38.06GB path /dev/sda3

("Exchanged" means: I created the filesystem as a mirror of sdb1 and sdb2, put data on it, then added sda3, balanced, deleted sdb2, balanced again, then unmounted, repartitioned sdb1 and sdb2 together as a larger sdb1, mounted degraded, added sdb1, balanced and deleted "missing".)

So I tried to reproduce this and got the following:

    # dd if=/dev/zero of=img0 bs=1 count=0 seek=5G
    # dd if=/dev/zero of=img1 bs=1 count=0 seek=5G
    # losetup -f img0
    # losetup -f img1
    # mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
    # mount /dev/loop0 /mnt
    # fstrim -v /mnt
    /mnt: 4332265472 bytes were trimmed

(The above-mentioned kernel instrumentation shows that "cache" was not 0 here.)

    # dd if=/dev/zero of=img2 bs=1 count=0 seek=5G
    # losetup -f img2
    # btrfs device add /dev/loop2 /mnt
    # btrfs device delete /dev/loop0 /mnt
    # fstrim -v /mnt
    /mnt: 0 bytes were trimmed

("cache" was 0 here.)
Software versions: See my earlier mail about degraded mirrors; reproducing the issue was on v3.3-rc2-172-g23783f8. I built fstrim from git, that is, util-linux pulled yesterday from git.kernel.org.

What's going on here? If the space cache is indeed broken on this filesystem, can I repair it without risk to my data? By mounting with "nospace_cache" once, or somehow using "clear_cache"?

Greetings,

Lutz
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/06/2012 04:37 AM, Lutz Euler wrote:
> ... maybe even the block group cache is nonfunctional then?
>
> I am using a btrfs file system, mirrored data and metadata on two SSDs,
> and recently tried for the first time to run fstrim on it. fstrim
> unexpectedly said "0 bytes were trimmed".
[...]
> What's going on here?
> If the space cache is indeed broken on this filesystem, can I repair
> it without risk to my data?
> By mounting with "nospace_cache" once or somehow using "clear_cache"?

Hi Lutz,

Would you please test the following patch on your box?

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77ea23c..b6e2c92 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7653,9 +7653,16 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
+	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
-	cache = btrfs_lookup_block_group(fs_info, range->start);
+	/*
+	 * try to trim all FS space, our block group may start from non-zero.
+	 */
+	if (range->len == total_bytes)
+		cache = btrfs_lookup_first_block_group(fs_info, range->start);
+	else
+		cache = btrfs_lookup_block_group(fs_info, range->start);
 
 	while (cache) {
 		if (cache->key.objectid >= (range->start + range->len)) {
-- 
1.6.5.2
Lutz Euler
2012-Feb-09 15:50 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi Liu,

thanks for looking into this!

You wrote:

> Would you please test the following patch on your box?
[...]

I'm sorry but that didn't help. I applied the patch on top of v3.3-rc2-172-g23783f8 and still fstrim says: "0 bytes were trimmed".

I left the printk's I mentioned in place. They show that now "cache" is not 0, so the "while (cache)" loop is entered, but nevertheless "btrfs_trim_block_group" is never called. So I put in more printk's and found that the objectid is so large that the loop is immediately terminated (output of the printk's in "//" comments below):

    int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
    {
            struct btrfs_fs_info *fs_info = root->fs_info;
            struct btrfs_block_group_cache *cache = NULL;
            u64 group_trimmed;
            u64 start;
            u64 end;
            u64 trimmed = 0;
            u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
            int ret = 0;

            // range.start = 0, .len = 160815300608, .minlen = 512

            /*
             * try to trim all FS space, our block group may start from non-zero.
             */
            if (range->len == total_bytes)
                    cache = btrfs_lookup_first_block_group(fs_info, range->start);
            else
                    cache = btrfs_lookup_block_group(fs_info, range->start);

            // cache != 0

            while (cache) {

                    // cache->key.objectid = 213702934528

                    if (cache->key.objectid >= (range->start + range->len)) {
                            btrfs_put_block_group(cache);
                            break;
                    }

                    // This line is not reached.

Here's the output of "btrfs fi show /dev/sda3" again:

    failed to read /dev/sr0
    Label: none  uuid: 88af7576-3027-4a3b-a5ae-34bfd167982f
            Total devices 2 FS bytes used 29.05GB
            devid    4 size 74.53GB used 40.06GB path /dev/sdb1
            devid    3 size 75.24GB used 40.06GB path /dev/sda3

Kind regards,

Lutz
On 02/09/2012 11:50 PM, Lutz Euler wrote:
> Hi Liu,
>
> thanks for looking into this!
[...]
> I'm sorry but that didn't help.
> I applied the patch on top of v3.3-rc2-172-g23783f8 and still fstrim
> says: "0 bytes were trimmed".
[...]
> // cache->key.objectid = 213702934528
>
> if (cache->key.objectid >= (range->start + range->len)) {
>         btrfs_put_block_group(cache);
>         break;
> }
>
> // This line is not reached.

Hi Lutz,

Great thanks for your details; now this should be a working one.

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77ea23c..693241a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7653,12 +7653,22 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
+	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+	bool all = false;
 	int ret = 0;
 
-	cache = btrfs_lookup_block_group(fs_info, range->start);
+	/* try to trim all FS space */
+	if (range->len == total_bytes)
+		all = true;
+
+	if (all)
+		/* our block group may start from non-zero */
+		cache = btrfs_lookup_first_block_group(fs_info, range->start);
+	else
+		cache = btrfs_lookup_block_group(fs_info, range->start);
 
 	while (cache) {
-		if (cache->key.objectid >= (range->start + range->len)) {
+		if (!all && cache->key.objectid >= (range->start + range->len)) {
 			btrfs_put_block_group(cache);
 			break;
 		}
-- 
1.6.5.2
Lutz Euler
2012-Feb-12 17:01 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

tl;dr: btrfs_trim_fs, IMHO, needs more thorough surgery.

Thanks for providing the new patch. I think it will work in the case that "fstrim" is called without specifying a range to trim (that is, defaulting to the whole filesystem), but I didn't test that yet, sorry.

Instead, I have been thinking about what happens if a range smaller than the whole filesystem is specified. After all, the observation that in my filesystem the smallest "objectid" is already larger than the filesystem size still holds even in that case, and wanting to trim only part of the filesystem seems to be a valid wish, too.

So I dug into the code myself and came to the conclusion that the way "btrfs_trim_fs" interprets the range passed from the ioctl is fundamentally broken. Instead of papering over that I'd very much prefer a more thorough fix here, which in addition might make it unnecessary to treat the "trim the complete filesystem" case specially. Please read on for the details:

The current implementation of "btrfs_trim_fs" simply compares the objectids it finds in the chunk tree against the range passed from the ioctl, seemingly assuming that both kinds of numbers span the same range. But this is clearly not true: Even without adding and deleting of devices, in a simple mirror the objectids will span only about half the size of the filesystem. With suitable add/delete of devices I can construct a filesystem with a smallest objectid of 0 and a largest objectid much larger than the size of the filesystem (so, obviously, with large holes in the set of used objectids), or, as in my filesystem mentioned above, with a smallest objectid larger than the size of the filesystem.
So, in the function the statement cache = btrfs_lookup_block_group(fs_info, range->start); will lead to nothing being trimmed if the smallest objectid is too large; if the if (cache->key.objectid >= (range->start + range->len)) is left in, too little will be trimmed; if it is left out, the following calculations start = max(range->start, cache->key.objectid); end = min(range->start + range->len, cache->key.objectid + cache->key.offset); if (end - start >= range->minlen) are bound to misbehave, for example start being larger than end seems possible, overflowing the subtraction and, I believe, leading to the whole filesystem being trimmed where the user only requested a part. Here is the start of the chunk tree of my filesystem for reference, showing the device sizes and the first objectid: chunk tree node 241620221952 level 1 items 2 free 119 generation 16831 owner 3 fs uuid 88af7576-3027-4a3b-a5ae-34bfd167982f chunk uuid 9a7d25e3-ebcf-4f31-bdd4-f44531a2fb1c key (DEV_ITEMS DEV_ITEM 3) block 241620226048 (58989313) gen 16831 key (FIRST_CHUNK_TREE CHUNK_ITEM 231956545536) block 241620230144 (58989314) gen 16831 leaf 241620226048 items 19 free space 1420 generation 16831 owner 3 fs uuid 88af7576-3027-4a3b-a5ae-34bfd167982f chunk uuid 9a7d25e3-ebcf-4f31-bdd4-f44531a2fb1c item 0 key (DEV_ITEMS DEV_ITEM 3) itemoff 3897 itemsize 98 dev item devid 3 total_bytes 80789987328 bytes used 44090523648 item 1 key (DEV_ITEMS DEV_ITEM 4) itemoff 3799 itemsize 98 dev item devid 4 total_bytes 80025313280 bytes used 44090523648 item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 213702934528) itemoff 3687 itemsize 112 chunk length 1073741824 owner 2 type 20 num_stripes 2 stripe 0 devid 4 offset 1074790400 stripe 1 devid 3 offset 40870346752 So, to make bulk discard of ranges useful, it seems the incoming range should be interpreted relative to the size of the filesystem and not to the allocated chunks. 
As AFAIK the size of the filesystem is just the sum of the sizes of its devices, it might be possible to map the range onto a virtual concatenation of the devices, these perhaps ordered by devid, and then to find the free space by searching for the resulting devid(s) and device-relative offsets in the device tree?

I understand that currently unallocated space is never trimmed? To nevertheless do that might be useful after a balance or at file system creation time? If there is a way to find the unallocated space for a device, this could be improved at the same time, too.

Kind regards,

Lutz
On 02/13/2012 01:01 AM, Lutz Euler wrote:
> Hi,
>
> tl;dr: btrfs_trim_fs, IMHO, needs more thorough surgery.
[...]
> So, to make bulk discard of ranges useful, it seems the incoming range
> should be interpreted relative to the size of the filesystem and not to
> the allocated chunks. As AFAIK the size of the filesystem is just the
> sum of the sizes of its devices it might be possible to map the range
> onto a virtual concatenation of the devices, these perhaps ordered by
> devid, and then to find the free space by searching for the resulting
> devid(s) and device-relative offsets in the device tree?

Actually I have no idea how to deal with this properly :(

Because btrfs supports multiple devices, we have to set the filesystem's logical range to [0, (u64)-1] to get things to work well, while other filesystems' logical range is [0, device's total_bytes].

What's more, in btrfs, devices can be mirrored to RAID, and the free space is also ordered by logical addr in [0, (u64)-1], so IMO it is impossible to interpret the range.

I'd better pick up "treat the "trim the complete filesystem" case specially", and drop the following commit:

    commit f4c697e6406da5dd445eda8d923c53e1138793dd
    Author: Lukas Czerner <lczerner@redhat.com>
    Date:   Mon Sep 5 16:34:54 2011 +0200

        btrfs: return EINVAL if start > total_bytes in fitrim ioctl

        We should return EINVAL if the start is beyond the end of the file
        system in the btrfs_ioctl_fitrim(). Fix that by adding the
        appropriate check for it.

        Also in the btrfs_trim_fs() it is possible that len+start might
        overflow if big values are passed. Fix it by decrementing the len
        so that start+len is equal to the file system size in the worst
        case.

        Signed-off-by: Lukas Czerner <lczerner@redhat.com>

thanks,
liubo

> I understand that currently unallocated space is never trimmed?
> To nevertheless do that might be useful after a balance or at file
> system creation time? If there is a way to find the unallocated space
> for a device, this could be improved at the same time, too.
Lutz Euler
2012-Feb-14 17:32 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

Liu Bo wrote:

> Actually I have no idea how to deal with this properly :(
>
> Because btrfs supports multi-devices so that we have to set the
> filesystem logical range to [0, (u64)-1] to get things to work well,
> while other filesystems' logical range is [0, device's total_bytes].
>
> What's more, in btrfs, devices can be mirrored to RAID, and the free
> space is also ordered by logical addr [0, (u64)-1], so IMO it is
> impossible to interpret the range.

I do not concur with this reasoning, but please see below that I concur with your conclusion. I still think that the range could be interpreted like I hinted at in my mail:

>> So, to make bulk discard of ranges useful, it seems the incoming range
>> should be interpreted relative to the size of the filesystem and not
>> to the allocated chunks. As AFAIK the size of the filesystem is just
>> the sum of the sizes of its devices it might be possible to map the
>> range onto a virtual concatenation of the devices, these perhaps
>> ordered by devid, and then to find the free space by searching for the
>> resulting devid(s) and device-relative offsets in the device tree?

This would only be somewhat difficult to use if the filesystem consisted of a mixture of non-trim-capable and trim-capable devices, and if the idea behind ranges is that the user can expect an equal amount of trim-capable storage behind different equally sized ranges -- but I don't know if this is indeed an idea behind ranges. If, on the other hand, there is a way outside of the ioctl to find out which devices the filesystem consists of and which of these support discard, the above-mentioned way to interpret ranges would extend to this setting. But maybe this would already be too ugly, essentially working around the shortcomings of an interface that is too restricted.

Moreover, I don't know why ranges smaller than the filesystem are supported by fstrim. I couldn't find any use cases or rationale for it in fstrim's or the ioctl's documentation or elsewhere. This makes it difficult to find out what might be useful in the case of a multi-device filesystem ;-)

So, with ranges being this unclear, I concur with your suggestion:

> I'd better pick up "treat the "trim the complete filesystem" case
> specially", and drop the following commit:
>
> commit f4c697e6406da5dd445eda8d923c53e1138793dd
> Author: Lukas Czerner <lczerner@redhat.com>
> Date:   Mon Sep 5 16:34:54 2011 +0200
>
>     btrfs: return EINVAL if start > total_bytes in fitrim ioctl

More specifically, what I do wish for is:

- Fix the problem I started the thread for: Make fstrim of the complete filesystem work.

And then, if possible:

- Simplify btrfs_trim_fs as much as possible to remove all traces of support for ranges smaller than the filesystem, document that this anyway wouldn't do what the user might expect, and, if possible, return an error in these cases.

- Also trim unallocated space (for use after balancing and at mkfs time).

Thanks for your time and your work,

Lutz
Lutz Euler
2012-Feb-29 00:17 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

I have seen that in the meantime a patch for this issue has found its way into 3.3-rc5. Unfortunately with this kernel my filesystem still says "0 bytes were trimmed", so the bug is not fixed for me.

Now that might just have been caused by the fact that the patch that went into 3.3-rc5 (as commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67) is the one from Liu Bo's mail <4F3386D8.6010108@cn.fujitsu.com>, which I already responded to by saying it didn't help my issue.

But the next patch Liu sent (in the mail <4F34795B.2020004@cn.fujitsu.com>) also does not help. At first I responded to that mail:

> Thanks for providing the new patch. I think it will work in the case
> that "fstrim" is called without specifying a range to trim (that is,
> defaulting to the whole filesystem), but I didn't test that yet, sorry.

I was too optimistic. I have now tested this second patch and still get "0 bytes were trimmed".

Thanks again and regards,

Lutz
Lutz Euler
2012-Apr-10 17:34 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

the issue I raised in this thread is still not fixed. Please, could someone look into it? A fix should be simple, especially as the issue is easily reproducible. (In case the information in this thread so far is not sufficient for that, I will happily provide a recipe; just ask.) Also, I believe so far not even a regression test has been added for this issue.

Here's the state of affairs again, as I mailed previously:

> I have seen that in the meantime a patch for this issue has found
> its way into 3.3-rc5. Unfortunately with this kernel my filesystem still
> says "0 bytes were trimmed", so the bug is not fixed for me.
>
> Now that might just have been caused by the fact that the patch that
> went into 3.3-rc5 (as commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67)
> is the one from Liu Bo's mail <4F3386D8.6010108@cn.fujitsu.com>, which
> I already responded to by saying it didn't help my issue.
>
> But the next patch Liu sent (in the mail
> <4F34795B.2020004@cn.fujitsu.com>) also does not help.
> At first I responded to that mail:
>
> > Thanks for providing the new patch. I think it will work in the case
> > that "fstrim" is called without specifying a range to trim (that is,
> > defaulting to the whole filesystem), but I didn't test that yet, sorry.
>
> I was too optimistic. I have now tested this second patch and still get
> "0 bytes were trimmed".

Thanks for providing btrfs!

Greetings,

Lutz
Lutz Euler
2012-Nov-14 21:10 UTC
Re: Bulk discard doesn't work after add/delete of devices
Hi,

the issue I raised in this thread is still not fixed. So, to get things going, I prepared a patch myself, which I will send in a separate email. The subject will be "really fix trim 0 bytes after a device delete".

I would be grateful if you could consider the patch -- with it applied I finally could trim my disks successfully.

Greetings,

Lutz
Lutz Euler
2012-Nov-14 21:17 UTC
[PATCH] Btrfs: really fix trim 0 bytes after a device delete
Commit 2cac13e41bf5b99ffc426bd28dfd2248df1dfa67, "fix trim 0 bytes after a device delete", said:

    A user reported a bug of btrfs's trim, that is we will trim 0 bytes
    after a device delete.

The commit didn't attack the root of the problem, so it did not fix the bug except for a special case.

For block discard, btrfs_trim_fs directly compares the range passed in against the filesystem's objectids. The former is bounded by the sum of the sizes of the devices of the filesystem; the latter is a completely unrelated set of intervals of 64-bit integers. The bug reported occurred as the smallest objectid was larger than the sum of the device sizes. The previous commit only fixes the case where the smallest objectid is nonzero and the largest objectid less than the sum of the device sizes, but it would still trim too little if the largest objectid is larger than that, and nothing in the reported situation.

The current mapping between the given range and the objectids is thus clearly broken, so, to fix the bug and as a first step towards a complete solution, simply ignore the range parameter's start and length fields and always trim the whole filesystem. (While this makes it impossible to only partly trim a filesystem, due to the broken mapping this often didn't work anyway.)

Reported-by: Lutz Euler <lutz.euler@freenet.de>
Signed-off-by: Lutz Euler <lutz.euler@freenet.de>
---
 fs/btrfs/extent-tree.c |   55 +++++++++++++++++++++--------------------------
 1 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..9c2bb59 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8092,44 +8092,39 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
-	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
 	/*
-	 * try to trim all FS space, our block group may start from non-zero.
+	 * The range passed in is a subinterval of the interval from 0
+	 * to the sum of the sizes of the devices of the filesystem.
+	 * The objectid's used in the filesystem can span any set of
+	 * subintervals of the interval from 0 to (u64)-1. As there is
+	 * neither a simple nor an agreed upon mapping between these
+	 * two ranges we ignore the range parameter's start and len
+	 * fields and always trim the whole filesystem (that is, only
+	 * the free space in allocated chunks).
 	 */
-	if (range->len == total_bytes)
-		cache = btrfs_lookup_first_block_group(fs_info, range->start);
-	else
-		cache = btrfs_lookup_block_group(fs_info, range->start);
+	cache = btrfs_lookup_first_block_group(fs_info, 0);
 
 	while (cache) {
-		if (cache->key.objectid >= (range->start + range->len)) {
-			btrfs_put_block_group(cache);
-			break;
-		}
-
-		start = max(range->start, cache->key.objectid);
-		end = min(range->start + range->len,
-			  cache->key.objectid + cache->key.offset);
+		start = cache->key.objectid;
+		end = cache->key.objectid + cache->key.offset;
 
-		if (end - start >= range->minlen) {
-			if (!block_group_cache_done(cache)) {
-				ret = cache_block_group(cache, NULL, root, 0);
-				if (!ret)
-					wait_block_group_cache_done(cache);
-			}
-			ret = btrfs_trim_block_group(cache,
-						     &group_trimmed,
-						     start,
-						     end,
-						     range->minlen);
+		if (!block_group_cache_done(cache)) {
+			ret = cache_block_group(cache, NULL, root, 0);
+			if (!ret)
+				wait_block_group_cache_done(cache);
+		}
+		ret = btrfs_trim_block_group(cache,
+					     &group_trimmed,
+					     start,
+					     end,
+					     range->minlen);
 
-			trimmed += group_trimmed;
-			if (ret) {
-				btrfs_put_block_group(cache);
-				break;
-			}
+		trimmed += group_trimmed;
+		if (ret) {
+			btrfs_put_block_group(cache);
+			break;
 		}
 
 		cache = next_block_group(fs_info->tree_root, cache);
-- 
1.7.4.1