Miao Xie
2013-Nov-21 13:43 UTC
[PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
The tasks that wait for the IO_DONE flag just care about the io of the dirty
pages, so it is better to wake them up immediately after all the pages are
written, rather than after the whole io process completes.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/ordered-data.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index eb5bac4..1bd7002 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
 	if (!uptodate)
 		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);

-	if (entry->bytes_left == 0)
+	if (entry->bytes_left == 0) {
 		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-	else
+		if (waitqueue_active(&entry->wait))
+			wake_up(&entry->wait);
+	} else {
 		ret = 1;
+	}
 out:
 	if (!ret && cached && entry) {
 		*cached = entry;
@@ -408,10 +411,13 @@ have_entry:
 	if (!uptodate)
 		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);

-	if (entry->bytes_left == 0)
+	if (entry->bytes_left == 0) {
 		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
-	else
+		if (waitqueue_active(&entry->wait))
+			wake_up(&entry->wait);
+	} else {
 		ret = 1;
+	}
 out:
 	if (!ret && cached && entry) {
 		*cached = entry;
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
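For readers following the logic of this patch, the completion accounting can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: `struct ordered_model`, `test_and_set()` and `dec_test_pending()` are hypothetical names, the model is single-threaded (the kernel serializes this path with a lock that is omitted here), and the `wakeups` counter merely marks the point where the patch calls wake_up().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BTRFS_ORDERED_* flag bits. */
enum { ORDERED_IO_DONE = 1u << 0, ORDERED_IOERR = 1u << 1 };

/* Hypothetical userspace model of an ordered extent; not the kernel struct. */
struct ordered_model {
	uint64_t bytes_left;   /* bytes of page IO still outstanding */
	unsigned int flags;    /* ORDERED_* bits */
	unsigned int wakeups;  /* counts where the kernel would call wake_up() */
};

/* Set a bit and report whether it was already set, like test_and_set_bit(). */
static bool test_and_set(unsigned int *flags, unsigned int bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags |= bit;
	return was_set;
}

/*
 * Account for io_size bytes of finished page IO.
 * Returns true only for the call that moves the extent to IO_DONE.
 */
static bool dec_test_pending(struct ordered_model *e, uint64_t io_size,
			     bool uptodate)
{
	assert(io_size <= e->bytes_left);
	e->bytes_left -= io_size;
	if (!uptodate)
		e->flags |= ORDERED_IOERR;

	if (e->bytes_left != 0)
		return false;

	/* All pages are written: set IO_DONE and wake the waiters right away. */
	bool already_done = test_and_set(&e->flags, ORDERED_IO_DONE);

	e->wakeups++;
	return !already_done;
}
```

In this model, two 4096-byte completions against an 8192-byte extent produce exactly one wakeup, on the second call, which is the moment the waiters on IO_DONE can make progress.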
Miao Xie
2013-Nov-21 13:43 UTC
[PATCH 2/5] Btrfs: just do dirty page flush for the inode with compression before direct IO
As the comment in btrfs_direct_IO() says, only the compressed pages need to be
flushed again to make sure they are on the disk; the ordinary pages do not. So
we add an if statement to check whether the inode has compressed pages, and
skip the flush if it does not.

And in order to prevent the write ranges from intersecting, we need to wait for
the running ordered extents. But the current code waits for them twice: once
before the direct IO starts (in btrfs_wait_ordered_range()), and once before we
get the blocks. The first wait is unnecessary: because we can do the direct IO
without holding i_mutex, intersecting ordered extents may appear during the
direct IO, so the first wait cannot avoid this problem. So we use
filemap_fdatawrite_range() instead of btrfs_wait_ordered_range() to remove the
first wait.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da8d2f6..a407242 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7229,15 +7229,15 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	smp_mb__after_atomic_inc();

 	/*
-	 * The generic stuff only does filemap_write_and_wait_range, which isn't
-	 * enough if we've written compressed pages to this area, so we need to
-	 * call btrfs_wait_ordered_range to make absolutely sure that any
-	 * outstanding dirty pages are on disk.
+	 * The generic stuff only does filemap_write_and_wait_range, which
+	 * isn't enough if we've written compressed pages to this area, so
+	 * we need to flush the dirty pages again to make absolutely sure
+	 * that any outstanding dirty pages are on disk.
 	 */
 	count = iov_length(iov, nr_segs);
-	ret = btrfs_wait_ordered_range(inode, offset, count);
-	if (ret)
-		return ret;
+	if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+		     &BTRFS_I(inode)->runtime_flags))
+		filemap_fdatawrite_range(inode->i_mapping, offset, count);

 	if (rw & WRITE) {
 		/*
--
1.8.3.1
Miao Xie
2013-Nov-21 13:43 UTC
[PATCH 3/5] Btrfs: remove the unnecessary flush when preparing the pages
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/file.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 82d0342..27f65e4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1286,12 +1286,11 @@ again:
 		struct btrfs_ordered_extent *ordered;
 		lock_extent_bits(&BTRFS_I(inode)->io_tree,
 				 start_pos, last_pos - 1, 0, &cached_state);
-		ordered = btrfs_lookup_first_ordered_extent(inode,
-							    last_pos - 1);
+		ordered = btrfs_lookup_ordered_range(inode, start_pos,
+						     last_pos - start_pos);
 		if (ordered &&
 		    ordered->file_offset + ordered->len > start_pos &&
 		    ordered->file_offset < last_pos) {
-			btrfs_put_ordered_extent(ordered);
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 					     start_pos, last_pos - 1,
 					     &cached_state, GFP_NOFS);
@@ -1299,10 +1298,8 @@ again:
 				unlock_page(pages[i]);
 				page_cache_release(pages[i]);
 			}
-			err = btrfs_wait_ordered_range(inode, start_pos,
-						       last_pos - start_pos);
-			if (err)
-				goto fail;
+			btrfs_start_ordered_extent(inode, ordered, 1);
+			btrfs_put_ordered_extent(ordered);
 			goto again;
 		}
 		if (ordered)
--
1.8.3.1
Miao Xie
2013-Nov-21 13:43 UTC
[PATCH 4/5] Btrfs: remove unnecessary lock in may_commit_transaction()
The reason is:
- The per-cpu counter has its own lock to protect itself.
- Here we need to get an exact value.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 45d98d0..12a5b6d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4142,13 +4142,9 @@ static int may_commit_transaction(struct btrfs_root *root,
 		goto commit;

 	/* See if there is enough pinned space to make this reservation */
-	spin_lock(&space_info->lock);
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
-				   bytes) >= 0) {
-		spin_unlock(&space_info->lock);
+				   bytes) >= 0)
 		goto commit;
-	}
-	spin_unlock(&space_info->lock);

 	/*
 	 * See if there is some space in the delayed insertion reservation for
@@ -4157,16 +4153,13 @@ static int may_commit_transaction(struct btrfs_root *root,
 	if (space_info != delayed_rsv->space_info)
 		return -ENOSPC;

-	spin_lock(&space_info->lock);
 	spin_lock(&delayed_rsv->lock);
 	if (percpu_counter_compare(&space_info->total_bytes_pinned,
 				   bytes - delayed_rsv->size) >= 0) {
 		spin_unlock(&delayed_rsv->lock);
-		spin_unlock(&space_info->lock);
 		return -ENOSPC;
 	}
 	spin_unlock(&delayed_rsv->lock);
-	spin_unlock(&space_info->lock);

 commit:
 	trans = btrfs_join_transaction(root);
--
1.8.3.1
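To illustrate why a per-cpu counter comparison can stand on its own, here is a rough single-threaded model of how such a counter is commonly structured: an approximate global value plus per-CPU deltas, with the comparison falling back to an exact sum only when the rough value is too close to call. Everything here is an assumption for illustration: `struct pcpu_model`, `NR_SLOTS`, `BATCH` and the `pcpu_*` helpers are hypothetical names, not the kernel's percpu_counter implementation, and the locking the real counter does internally is only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_SLOTS 4   /* hypothetical number of CPUs in the model */
#define BATCH    32  /* hypothetical per-slot batch size */

/* Hypothetical model of a batched per-CPU counter. */
struct pcpu_model {
	int64_t count;            /* approximate global value */
	int32_t local[NR_SLOTS];  /* per-slot deltas not yet folded in */
};

/* Add to one slot; fold into the global value once the batch is reached. */
static void pcpu_add(struct pcpu_model *c, int slot, int32_t amount)
{
	int64_t v = (int64_t)c->local[slot] + amount;

	if (v >= BATCH || v <= -BATCH) {
		/* A real counter would take its own internal lock here. */
		c->count += v;
		c->local[slot] = 0;
	} else {
		c->local[slot] = (int32_t)v;
	}
}

/* Exact value: global count plus every unfolded per-slot delta. */
static int64_t pcpu_sum(const struct pcpu_model *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < NR_SLOTS; i++)
		sum += c->local[i];
	return sum;
}

/* Returns 1, 0 or -1 for counter > rhs, == rhs, < rhs. */
static int pcpu_compare(const struct pcpu_model *c, int64_t rhs)
{
	int64_t count = c->count;

	/* The rough value is enough when it is far away from rhs. */
	if (llabs(count - rhs) > (int64_t)BATCH * NR_SLOTS)
		return count > rhs ? 1 : -1;

	/* Otherwise compute the exact sum before deciding. */
	count = pcpu_sum(c);
	if (count > rhs)
		return 1;
	if (count < rhs)
		return -1;
	return 0;
}
```

The point of the model is that the comparison is self-contained: the caller does not need to hold an outer lock such as space_info->lock for the counter itself to stay consistent.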
Miao Xie
2013-Nov-21 13:43 UTC
[PATCH 5/5] Btrfs: reclaim the reserved metadata space at background
Before applying this patch, a task had to reclaim the metadata space by itself
if there was not enough metadata space. And when the task started the space
reclamation, all the other tasks that wanted to reserve metadata space were
blocked. In some cases, they would be blocked for a long time, which made the
performance fluctuate wildly.

So we introduce background metadata space reclamation: when the space is about
to be exhausted, we insert a reclaim work into the workqueue, and the worker of
the workqueue helps us reclaim the reserved space in the background. This way,
the tasks needn't reclaim the space by themselves in most cases, and even if
the tasks have to reclaim the space or are blocked for the space reclamation,
they will get enough space more quickly.

Here are my test results (tested with sysbench, 3 runs each):
 Memory:	2GB
 CPU:		2Cores * 1CPU
 Partition:	20GB(SSD)

				w/o patch	w/ patch
 seqwr-512KB-1Thread-2GB	180.08MB/s	178.00MB/s
 seqwr-512KB-8Threads-2GB	179.14MB/s	179.25MB/s
 seqwr-128KB-1Thread-2GB	186.3MB/s	186.57MB/s
 seqwr-128KB-8Threads-2GB	176.05MB/s	183.05MB/s
 rndwr-512KB-1Thread-2GB	114.20MB/s	117.42MB/s
 rndwr-512KB-8Threads-2GB	129.18MB/s	132.51MB/s
 rndwr-128KB-1Thread-2GB	963.94MB/s	1.4058GB/s
 rndwr-128KB-8Threads-2GB	1.1950GB/s	1.3787GB/s

The above numbers are the averages.
 rndwr:			random write test
 seqwr:			sequential write test
 512KB, 128KB:		the size of each request
 1Thread, 8Threads:	the number of the test threads
 2GB:			the total file size

For the last two cases, the random write test executed only 10000 requests each
time we ran it, so most of the data was cached in the page cache; that is why
the random write test was faster than the sequential write test. Besides that,
according to our test results, the random write performance on the kernel
without the patch was not stable; it fluctuated between 840MB/s and 1.25GB/s.
After applying the patch, the random write performance was stable.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  6 +++
 fs/btrfs/disk-io.c     |  3 ++
 fs/btrfs/extent-tree.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/super.c       |  1 +
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9aeb27..e0d2a3b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -33,6 +33,7 @@
 #include <asm/kmap_types.h>
 #include <linux/pagemap.h>
 #include <linux/btrfs.h>
+#include <linux/workqueue.h>
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
@@ -1289,6 +1290,8 @@ struct btrfs_stripe_hash_table {

 #define BTRFS_STRIPE_HASH_TABLE_BITS				11

+void btrfs_init_async_reclaim_work(struct work_struct *work);
+
 /* fs_info */
 struct reloc_control;
 struct btrfs_device;
@@ -1655,6 +1658,9 @@ struct btrfs_fs_info {

 	struct semaphore uuid_tree_rescan_sem;
 	unsigned int update_uuid_tree_gen:1;
+
+	/* Used to reclaim the metadata space in the background. */
+	struct work_struct async_reclaim_work;
 };

 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4c4ed0b..2b49923 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2234,6 +2234,7 @@ int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->balance_cancel_req, 0);
 	fs_info->balance_ctl = NULL;
 	init_waitqueue_head(&fs_info->balance_wait_q);
+	btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);

 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
@@ -3579,6 +3580,8 @@ int close_ctree(struct btrfs_root *root)
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_cleanup_defrag_inodes(fs_info);

+	cancel_work_sync(&fs_info->async_reclaim_work);
+
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
 		ret = btrfs_commit_super(root);
 		if (ret)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 12a5b6d..c122dc7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4230,6 +4230,98 @@ static int flush_space(struct btrfs_root *root,
 	return ret;
 }
+
+static inline u64 +btrfs_calc_reclaim_metadata_size(struct btrfs_root *root, + struct btrfs_space_info *space_info) +{ + u64 used; + u64 expected; + u64 to_reclaim; + + to_reclaim = min_t(u64, num_online_cpus() * 1024 * 1024, + 32 * 1024 * 1024); + spin_lock(&space_info->lock); + if (can_overcommit(root, space_info, to_reclaim, + BTRFS_RESERVE_FLUSH_ALL)) { + to_reclaim = 0; + goto out; + } + + used = space_info->bytes_used + space_info->bytes_reserved + + space_info->bytes_pinned + space_info->bytes_readonly + + space_info->bytes_may_use; + if (can_overcommit(root, space_info, 1024 * 1024, + BTRFS_RESERVE_FLUSH_ALL)) + expected = div_factor_fine(space_info->total_bytes, 95); + else + expected = div_factor_fine(space_info->total_bytes, 90); + to_reclaim = used - expected; +out: + spin_unlock(&space_info->lock); + + return to_reclaim; +} + +static inline int need_do_async_reclaim(struct btrfs_space_info *space_info, + struct btrfs_fs_info *fs_info, u64 used) +{ + return (used >= div_factor_fine(space_info->total_bytes, 95) && + !btrfs_fs_closing(fs_info) && + !test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state)); +} + +static int btrfs_need_do_async_reclaim(struct btrfs_space_info *space_info, + struct btrfs_fs_info *fs_info) +{ + u64 used; + + spin_lock(&space_info->lock); + used = space_info->bytes_used + space_info->bytes_reserved + + space_info->bytes_pinned + space_info->bytes_readonly + + space_info->bytes_may_use; + if (need_do_async_reclaim(space_info, fs_info, used)) { + spin_unlock(&space_info->lock); + return 1; + } + spin_unlock(&space_info->lock); + + return 0; +} + +static void btrfs_async_reclaim_metadata_space(struct work_struct *work) +{ + struct btrfs_fs_info *fs_info; + struct btrfs_space_info *space_info; + u64 to_reclaim; + int flush_state; + + fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work); + space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA); + + to_reclaim = 
btrfs_calc_reclaim_metadata_size(fs_info->fs_root,
+						      space_info);
+	if (!to_reclaim)
+		return;
+
+	flush_state = FLUSH_DELAYED_ITEMS_NR;
+	do {
+		flush_space(fs_info->fs_root, space_info, to_reclaim,
+			    to_reclaim, flush_state);
+		flush_state++;
+		if (!btrfs_need_do_async_reclaim(space_info, fs_info))
+			return;
+	} while (flush_state <= COMMIT_TRANS);
+
+	if (btrfs_need_do_async_reclaim(space_info, fs_info))
+		queue_work(system_unbound_wq, work);
+}
+
+void btrfs_init_async_reclaim_work(struct work_struct *work)
+{
+	INIT_WORK(work, btrfs_async_reclaim_metadata_space);
+}
+
 /**
  * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
  * @root - the root we're allocating for
@@ -4337,8 +4429,13 @@ again:
 	if (ret && flush != BTRFS_RESERVE_NO_FLUSH) {
 		flushing = true;
 		space_info->flush = 1;
+	} else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
+		used += orig_bytes;
+		if (need_do_async_reclaim(space_info, root->fs_info, used) &&
+		    !work_busy(&root->fs_info->async_reclaim_work))
+			queue_work(system_unbound_wq,
+				   &root->fs_info->async_reclaim_work);
 	}
-
 	spin_unlock(&space_info->lock);

 	if (!ret || flush == BTRFS_RESERVE_NO_FLUSH)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2d8ac1b..fb62c45 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1329,6 +1329,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		 * this also happens on 'umount -rf' or on shutdown, when
 		 * the filesystem is busy.
 		 */
+		cancel_work_sync(&fs_info->async_reclaim_work);

 		/* wait for the uuid_scan task to finish */
 		down(&fs_info->uuid_tree_rescan_sem);
--
1.8.3.1
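The percentage thresholds used by this patch can be tried out in userspace with the small sketch below. It is not the kernel code: `need_async_reclaim()` and `reclaim_target()` are hypothetical names, the `small_overcommit_ok` parameter stands in for the result of the can_overcommit() check with 1MB, the filesystem-state checks are left out, and `div_factor_fine()` is re-implemented here on the assumption that it computes num * factor / 100. The clamp to zero is an addition made for the sketch; the posted patch computes used - expected directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel helper of the same name: num * factor / 100. */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor == 100)
		return num;
	return num * (uint64_t)factor / 100;
}

/* Background reclaim is triggered once usage reaches 95% of the total. */
static bool need_async_reclaim(uint64_t total_bytes, uint64_t used)
{
	return used >= div_factor_fine(total_bytes, 95);
}

/*
 * How much to reclaim: bring usage down to 95% of the total when a small
 * overcommit would still succeed, otherwise down to 90%.
 */
static uint64_t reclaim_target(uint64_t total_bytes, uint64_t used,
			       bool small_overcommit_ok)
{
	uint64_t expected =
		div_factor_fine(total_bytes, small_overcommit_ok ? 95 : 90);

	/* Clamp added for this sketch only, to avoid unsigned wraparound. */
	return used > expected ? used - expected : 0;
}
```

For example, with a total of 1000 units and 960 in use, the model asks for 10 units when a small overcommit is still possible and 60 units when it is not.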
Liu Bo
2013-Nov-22 04:30 UTC
Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
> The tasks that wait for the IO_DONE flag just care about the io of the dirty
> pages, so it is better to wake them up immediately after all the pages are
> written, rather than after the whole io process completes.

This doesn't seem to make sense, the waiters still go to wait and schedule since
IO_DONE is not set there yet.

-liubo
Miao Xie
2013-Nov-22 07:28 UTC
Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
On Fri, 22 Nov 2013 12:30:40 +0800, Liu Bo wrote:
> On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
>> The tasks that wait for the IO_DONE flag just care about the io of the dirty
>> pages, so it is better to wake them up immediately after all the pages are
>> written, rather than after the whole io process completes.
>
> This doesn't seem to make sense, the waiters still go to wait and schedule since
> IO_DONE is not set there yet.

I cannot understand what you said. We wake up the waiters after IO_DONE is set,
so the waiters who wait for the IO_DONE flag will not go to wait.

Miao
Liu Bo
2013-Nov-22 08:47 UTC
Re: [PATCH 1/5] Btrfs: wake up the tasks that wait for the io earlier
On Fri, Nov 22, 2013 at 03:28:32PM +0800, Miao Xie wrote:
> On Fri, 22 Nov 2013 12:30:40 +0800, Liu Bo wrote:
> > On Thu, Nov 21, 2013 at 09:43:14PM +0800, Miao Xie wrote:
> >> The tasks that wait for the IO_DONE flag just care about the io of the dirty
> >> pages, so it is better to wake them up immediately after all the pages are
> >> written, rather than after the whole io process completes.
> >
> > This doesn't seem to make sense, the waiters still go to wait and schedule since
> > IO_DONE is not set there yet.
>
> I cannot understand what you said. We wake up the waiters after IO_DONE is set,
> so the waiters who wait for the IO_DONE flag will not go to wait.
>
> Miao
>
> >
> > -liubo
> >
> >>
> >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> >> ---
> >>  fs/btrfs/ordered-data.c | 14 ++++++++++----
> >>  1 file changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> >> index eb5bac4..1bd7002 100644
> >> --- a/fs/btrfs/ordered-data.c
> >> +++ b/fs/btrfs/ordered-data.c
> >> @@ -348,10 +348,13 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
> >>  	if (!uptodate)
> >>  		set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
> >>
> >> -	if (entry->bytes_left == 0)
> >> +	if (entry->bytes_left == 0) {
> >>  		ret = test_and_set_bit(BTRFS_ORDERED_IO_DONE, &entry->flags);
> >> -	else

My bad, something got in my eye, I was thinking the 'else' was kept.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

thanks,
-liubo
Josef Bacik
2013-Nov-22 17:48 UTC
Re: [PATCH 5/5] Btrfs: reclaim the reserved metadata space at background
On Thu, Nov 21, 2013 at 09:43:18PM +0800, Miao Xie wrote:
> Before applying this patch, a task had to reclaim the metadata space by itself
> if there was not enough metadata space. And when the task started the space
> reclamation, all the other tasks that wanted to reserve metadata space were
> blocked. In some cases, they would be blocked for a long time, which made the
> performance fluctuate wildly.
>

So the reason the flushing is done this way is because of this level of hell
called "early enospc." Basically we'd get people flushing randomly and other
users would come in and use the reclaimed space, so whoever was flushing would
often ENOSPC because they thought they did everything they could to flush and
still couldn't make allocations.

This approach is a nice balance keeping the old "one at a time" flushers and
adding a background flusher, but I still worry about people competing with the
background flushing. Consider the case where the background flusher has started
and taken all of the ordered extents on the system to flush (and let's assume
that we only have reservations tied up in ordered extents, which is very
possible). Then a task comes in to make a reservation but it can't because it
doesn't have space, so it tries to flush. But the inline flushing stuff doesn't
find any ordered extents to flush because they've been spliced off the list by
the background flusher. So we bail out and do -ENOSPC even though there is
plenty of space.

What I would like to see is some way for a flusher who has to flush inline to be
able to see that there is a background flusher and wait for it to finish its
work before doing its own flushing. If I have to start tracking down early
ENOSPC problems again I may very well quit doing file system work forever.

Thanks,

Josef