There''s an off-by-one bug: # create a file with lots of 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 64 1 64 3136 3435 1 2 65 3436 3136 64 3 129 3201 3499 1 4 130 3500 3201 64 5 194 3266 3563 1 6 195 3564 3266 64 7 259 3331 3627 1 8 260 3628 3331 40 eof After this patch: ... # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 300 eof /mnt/file: 1 extent found Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/ioctl.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 970977a..31fe6d4 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1081,7 +1081,6 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, defrag_count += ret; balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret); - i += ret; if (newer_than) { if (newer_off == (u64)-1) @@ -1101,7 +1100,10 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, break; } } else { - i++; + if (ret > 0) + i += ret; + else + i++; } } -- 1.7.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Sep-02 07:56 UTC
[PATCH 2/4] Btrfs: use i_size_read() in btrfs_defrag_file()
Don''t use inode->i_size directly, since we''re not holding i_mutex. This also fixes another bug, that i_size can change after it''s checked against 0 and then (i_size - 1) can be negative. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/ioctl.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 31fe6d4..6f2b257 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -972,6 +972,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, struct btrfs_super_block *disk_super; struct file_ra_state *ra = NULL; unsigned long last_index; + u64 isize = i_size_read(inode); u64 features; u64 last_len = 0; u64 skip = 0; @@ -997,7 +998,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, compress_type = range->compress_type; } - if (inode->i_size == 0) + if (isize == 0) return 0; /* @@ -1022,10 +1023,10 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, /* find the last page to defrag */ if (range->start + range->len > range->start) { - last_index = min_t(u64, inode->i_size - 1, + last_index = min_t(u64, isize - 1, range->start + range->len - 1) >> PAGE_CACHE_SHIFT; } else { - last_index = (inode->i_size - 1) >> PAGE_CACHE_SHIFT; + last_index = (isize - 1) >> PAGE_CACHE_SHIFT; } if (newer_than) { -- 1.7.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Sep-02 07:56 UTC
[PATCH 3/4] Btrfs: fix wrong max_to_defrag in btrfs_defrag_file()
It''s off-by-one, and thus we may skip the last page while defragmenting. An example case: # create /mnt/file with 2 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag /mnt/file /mnt/file: 2 extents found So it''s not defragmented. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/ioctl.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6f2b257..57aa5b7 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1046,7 +1046,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, i = range->start >> PAGE_CACHE_SHIFT; } if (!max_to_defrag) - max_to_defrag = last_index - 1; + max_to_defrag = last_index; while (i <= last_index && defrag_count < max_to_defrag) { /* -- 1.7.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Li Zefan
2011-Sep-02 07:57 UTC
[PATCH 4/4] Btrfs: honor extent thresh during defragmentation
We won''t defrag an extent, if it''s bigger than the threshold we specified and there''s no small extent before it, but actually the code doesn''t work this way. There are three bugs: - When should_defrag_range() decides we should keep on defragmenting an extent, last_len is not incremented. (old bug) - The length that passes to should_defrag_range() is not the length we''re going to defrag. (new bug) - We always defrag 256K bytes data, and a big extent can be part of this range. (new bug) For a file with 4 extents: | 4K | 4K | 256K | 256K | The result of defrag with (the default) 256K extent thresh should be: | 264K | 256K | but with those bugs, we''ll get: | 520K | Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> --- fs/btrfs/ioctl.c | 37 ++++++++++++++++++++++++++----------- 1 files changed, 26 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 57aa5b7..90d7904 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -760,7 +760,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u64 len, int ret = 1; /* - * make sure that once we start defragging and extent, we keep on + * make sure that once we start defragging an extent, we keep on * defragging it */ if (start < *defrag_end) @@ -805,7 +805,6 @@ static int should_defrag_range(struct inode *inode, u64 start, u64 len, * extent will force at least part of that big extent to be defragged. */ if (ret) { - *last_len += len; *defrag_end = extent_map_end(em); } else { *last_len = 0; @@ -978,13 +977,14 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, u64 skip = 0; u64 defrag_end = 0; u64 newer_off = range->start; - int newer_left = 0; unsigned long i; + unsigned long ra_index = 0; int ret; int defrag_count = 0; int compress_type = BTRFS_COMPRESS_ZLIB; int extent_thresh = range->extent_thresh; - int newer_cluster = (256 * 1024) >> PAGE_CACHE_SHIFT; + int max_cluster = (256 * 1024) >> PAGE_CACHE_SHIFT; + int cluster = max_cluster; u64 new_align = ~((u64)128 * 1024 - 1); struct page **pages = NULL; @@ -1014,7 +1014,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, ra = &file->f_ra; } - pages = kmalloc(sizeof(struct page *) * newer_cluster, + pages = kmalloc(sizeof(struct page *) * max_cluster, GFP_NOFS); if (!pages) { ret = -ENOMEM; @@ -1039,7 +1039,6 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, * the extents in the file evenly spaced */ i = (newer_off & new_align) >> PAGE_CACHE_SHIFT; - newer_left = newer_cluster; } else goto out_ra; } else { @@ -1071,12 +1070,26 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, i = max(i + 1, next); continue; } + + if (!newer_than) { + cluster = (PAGE_CACHE_ALIGN(defrag_end) >> + PAGE_CACHE_SHIFT) - i; + cluster = min(cluster, max_cluster); + } else { + cluster = max_cluster; + } + if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS) BTRFS_I(inode)->force_compress = compress_type; - btrfs_force_ra(inode->i_mapping, ra, file, i, newer_cluster); + if (i + cluster > ra_index) { + ra_index = max(i, ra_index); + btrfs_force_ra(inode->i_mapping, ra, file, ra_index, + cluster); + ra_index += max_cluster; + } - ret = cluster_pages_for_defrag(inode, pages, i, newer_cluster); + ret = cluster_pages_for_defrag(inode, pages, i, cluster); if (ret < 0) goto out_ra; @@ -1096,15 +1109,17 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, if (!ret) { range->start = newer_off; i = (newer_off & new_align) >> PAGE_CACHE_SHIFT; - newer_left = newer_cluster; } else { break; } } else { - if (ret > 0) + if (ret > 0) { i += ret; - else + last_len += ret << PAGE_CACHE_SHIFT; + } else { i++; + last_len = 0; + } } } -- 1.7.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2011-Sep-02 08:42 UTC
Re: [PATCH 1/4] Btrfs: fix defragmentation regression
On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote:> There''s an off-by-one bug: > > # create a file with lots of 4K file extents > # btrfs fi defrag /mnt/file > # sync > # filefrag -v /mnt/file > Filesystem type is: 9123683e > File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) > ext logical physical expected length flags > 0 0 3372 64 > 1 64 3136 3435 1 > 2 65 3436 3136 64 > 3 129 3201 3499 1 > 4 130 3500 3201 64 > 5 194 3266 3563 1 > 6 195 3564 3266 64 > 7 259 3331 3627 1 > 8 260 3628 3331 40 eof > > After this patch:Can you please create an xfstests testcase for this? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Sep-22 10:42 UTC
Re: [PATCH 3/4] Btrfs: fix wrong max_to_defrag in btrfs_defrag_file()
On Fri, Sep 02, 2011 at 03:56:55PM +0800, Li Zefan wrote:> It''s off-by-one, and thus we may skip the last page while defragmenting.Good catch.> > An example case: > > # create /mnt/file with 2 4K file extents > # btrfs fi defrag /mnt/file > # sync > # filefrag /mnt/file > /mnt/file: 2 extents found > > So it''s not defragmented. > > Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>Reviewed-by: David Sterba <dsterba@suse.cz>> --- > fs/btrfs/ioctl.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 6f2b257..57aa5b7 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1046,7 +1046,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, > i = range->start >> PAGE_CACHE_SHIFT; > } > if (!max_to_defrag) > - max_to_defrag = last_index - 1; > + max_to_defrag = last_index; > > while (i <= last_index && defrag_count < max_to_defrag) { > /* > -- > 1.7.3.1 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Sep-22 10:58 UTC
Re: [PATCH 2/4] Btrfs: use i_size_read() in btrfs_defrag_file()
On Fri, Sep 02, 2011 at 03:56:39PM +0800, Li Zefan wrote:> Don''t use inode->i_size directly, since we''re not holding i_mutex. > > This also fixes another bug, that i_size can change after it''s checked > against 0 and then (i_size - 1) can be negative. > > Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>Reviewed-by: David Sterba <dsterba@suse.cz>> --- > fs/btrfs/ioctl.c | 7 ++++--- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 31fe6d4..6f2b257 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -972,6 +972,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, > struct btrfs_super_block *disk_super; > struct file_ra_state *ra = NULL; > unsigned long last_index; > + u64 isize = i_size_read(inode); > u64 features; > u64 last_len = 0; > u64 skip = 0; > @@ -997,7 +998,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, > compress_type = range->compress_type; > } > > - if (inode->i_size == 0) > + if (isize == 0) > return 0; > > /* > @@ -1022,10 +1023,10 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, > > /* find the last page to defrag */ > if (range->start + range->len > range->start) { > - last_index = min_t(u64, inode->i_size - 1, > + last_index = min_t(u64, isize - 1, > range->start + range->len - 1) >> PAGE_CACHE_SHIFT; > } else { > - last_index = (inode->i_size - 1) >> PAGE_CACHE_SHIFT; > + last_index = (isize - 1) >> PAGE_CACHE_SHIFT; > } > > if (newer_than) { > -- > 1.7.3.1 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 2, 2011 at 4:42 AM, Christoph Hellwig <hch@infradead.org> wrote:> On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote: >> There''s an off-by-one bug: >> >> # create a file with lots of 4K file extents >> # btrfs fi defrag /mnt/file >> # sync >> # filefrag -v /mnt/file >> Filesystem type is: 9123683e >> File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) >> ext logical physical expected length flags >> 0 0 3372 64 >> 1 64 3136 3435 1 >> 2 65 3436 3136 64 >> 3 129 3201 3499 1 >> 4 130 3500 3201 64 >> 5 194 3266 3563 1 >> 6 195 3564 3266 64 >> 7 259 3331 3627 1 >> 8 260 3628 3331 40 eof >> >> After this patch: > > Can you please create an xfstests testcase for this?Did this fix get lost? I don''t see it in git, and defragmenting a file still results in 10x as many fragments as it started with. (3.1-rc9) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
16:48, Dan Merillat wrote:> On Fri, Sep 2, 2011 at 4:42 AM, Christoph Hellwig <hch@infradead.org> wrote: >> On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote: >>> There''s an off-by-one bug: >>> >>> # create a file with lots of 4K file extents >>> # btrfs fi defrag /mnt/file >>> # sync >>> # filefrag -v /mnt/file >>> Filesystem type is: 9123683e >>> File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) >>> ext logical physical expected length flags >>> 0 0 3372 64 >>> 1 64 3136 3435 1 >>> 2 65 3436 3136 64 >>> 3 129 3201 3499 1 >>> 4 130 3500 3201 64 >>> 5 194 3266 3563 1 >>> 6 195 3564 3266 64 >>> 7 259 3331 3627 1 >>> 8 260 3628 3331 40 eof >>> >>> After this patch: >> >> Can you please create an xfstests testcase for this? > > Did this fix get lost? I don''t see it in git, and defragmenting a > file still results in 10x as many fragments as it started with. > (3.1-rc9) >No, it''s queued for 3.2, but I think it''s a good candidate for 3.1.x. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html