Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 0/7] Btrfs: improve the latency of the space reservation
This patchset improve the latency of the space reservation when doing the delalloc inode flush. As we know, the current code would wait for the completion all the ordered extents in the filesystem. It was unnecessary, we can finish the wait and try to do reservation once again earlier if we get enough free space. It can reduce the wait time and make the perfermance up. We did the buffered write test by the sysbench on my box, the following is the result. Memory: 2GB CPU: 2Cores * 1CPU Partition: 20GB(SSD) w/ w/o rndwr-512KB-1Thread-2GB 120.38MB/s 110.08MB/s rndwr-512KB-8Threads-2GB 119.43MB/s 110.96MB/s seqwr-512KB-1Thread-2GB 99.676MB/s 98.53MB/s seqwr-512KB-8Threads-2GB 111.79MB/s 99.176MB/s rndwr-128KB-1Thread-2GB 139.23MB/s 95.864MB/s rndwr-128KB-8Threads-2GB 126.16MB/s 96.686MB/s seqwr-128KB-1Thread-2GB 100.24MB/s 100.95.MB/s seqwr-128KB-8Threads-2GB 126.51MB/s 100.26MB/s rndwr: random write test seqwr: sequential write test 512KB,128KB: the size of each request 1Thread, 8Threads: the number of the test threads 2GB: The total file size Miao Xie (7): Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc() Btrfs: wait for the ordered extent only when we want Btrfs: pick up the code for the item number calculation in flush_space() Btrfs: fix the confusion between delalloc bytes and metadata bytes Btrfs: don''t wait for all the async delalloc when shrinking delalloc Btrfs: don''t wait for the completion of all the ordered extents Btrfs: rename btrfs_start_all_delalloc_inodes fs/btrfs/ctree.h | 3 +-- fs/btrfs/dev-replace.c | 6 ++--- fs/btrfs/extent-tree.c | 62 ++++++++++++++++++++++++++++++++++--------------- fs/btrfs/inode.c | 3 +-- fs/btrfs/ioctl.c | 2 +- fs/btrfs/ordered-data.c | 22 ++++++++++++++---- fs/btrfs/ordered-data.h | 4 ++-- fs/btrfs/relocation.c | 4 ++-- fs/btrfs/super.c | 2 +- fs/btrfs/transaction.c | 4 ++-- 10 files changed, 73 insertions(+), 39 deletions(-) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 1/7] Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1d238ed..abe65ed 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4016,15 +4016,14 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, u64 delalloc_bytes; u64 max_reclaim; long time_left; - unsigned long nr_pages = (2 * 1024 * 1024) >> PAGE_CACHE_SHIFT; - int loops = 0; + unsigned long nr_pages; + int loops; enum btrfs_reserve_flush_enum flush; trans = (struct btrfs_trans_handle *)current->journal_info; block_rsv = &root->fs_info->delalloc_block_rsv; space_info = block_rsv->space_info; - smp_mb(); delalloc_bytes = percpu_counter_sum_positive( &root->fs_info->delalloc_bytes); if (delalloc_bytes == 0) { @@ -4034,6 +4033,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, return; } + loops = 0; while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; @@ -4064,7 +4064,6 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, if (time_left) break; } - smp_mb(); delalloc_bytes = percpu_counter_sum_positive( &root->fs_info->delalloc_bytes); } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 2/7] Btrfs: wait for the ordered extent only when we want
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index abe65ed..ed6eceb 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4029,7 +4029,8 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, if (delalloc_bytes == 0) { if (trans) return; - btrfs_wait_all_ordered_extents(root->fs_info); + if (wait_ordered) + btrfs_wait_all_ordered_extents(root->fs_info); return; } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 3/7] Btrfs: pick up the code for the item number calculation in flush_space()
This patch picked up the code that was used to calculate the number of the items for which we need reserve space, and we will use it in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ed6eceb..32dcf80 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4004,6 +4004,18 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root, } } +static inline int calc_reclaim_items_nr(struct btrfs_root *root, u64 to_reclaim) +{ + u64 bytes; + int nr; + + bytes = btrfs_calc_trans_metadata_size(root, 1); + nr = (int)div64_u64(to_reclaim, bytes); + if (!nr) + nr = 1; + return nr; +} + /* * shrink metadata reservation for delalloc */ @@ -4149,16 +4161,11 @@ static int flush_space(struct btrfs_root *root, switch (state) { case FLUSH_DELAYED_ITEMS_NR: case FLUSH_DELAYED_ITEMS: - if (state == FLUSH_DELAYED_ITEMS_NR) { - u64 bytes = btrfs_calc_trans_metadata_size(root, 1); - - nr = (int)div64_u64(num_bytes, bytes); - if (!nr) - nr = 1; - nr *= 2; - } else { + if (state == FLUSH_DELAYED_ITEMS_NR) + nr = calc_reclaim_items_nr(root, num_bytes) * 2; + else nr = -1; - } + trans = btrfs_join_transaction(root); if (IS_ERR(trans)) { ret = PTR_ERR(trans); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 4/7] Btrfs: fix the confusion between delalloc bytes and metadata bytes
In shrink_delalloc(), what we need reclaim is the metadata space, so flushing pages by to_reclaim is not reasonable, it is very likely that the pages we flush are not enough. And then we had to invoke the flush function for several times, at the worst, we need call flush_space for several times. It wasted time. We improve this problem by converting the metadata space size we need reserve to the delalloc bytes, By this way, we can flush the pages by a reasonable number. (Now we use a fixed number to do conversion, it is not flexible, maybe we can find a good way to improve it in the future.) Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 32dcf80..c48fde1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4016,6 +4016,8 @@ static inline int calc_reclaim_items_nr(struct btrfs_root *root, u64 to_reclaim) return nr; } +#define EXTENT_SIZE_PER_ITEM (256 * 1024) + /* * shrink metadata reservation for delalloc */ @@ -4032,6 +4034,10 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, int loops; enum btrfs_reserve_flush_enum flush; + /* Calc the number of the pages we need flush for space reservation */ + to_reclaim = calc_reclaim_items_nr(root, to_reclaim); + to_reclaim *= EXTENT_SIZE_PER_ITEM; + trans = (struct btrfs_trans_handle *)current->journal_info; block_rsv = &root->fs_info->delalloc_block_rsv; space_info = block_rsv->space_info; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 5/7] Btrfs: don''t wait for all the async delalloc when shrinking delalloc
It was very likely that there were lots of async delalloc pages in the filesystem, if we waited until all the pages were flushed, we would be blocked for a long time, and the performance would also drop down. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c48fde1..d691e15 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4061,9 +4061,19 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, * We need to wait for the async pages to actually start before * we do anything. */ - wait_event(root->fs_info->async_submit_wait, - !atomic_read(&root->fs_info->async_delalloc_pages)); + max_reclaim = atomic_read(&root->fs_info->async_delalloc_pages); + if (!max_reclaim) + goto skip_async; + + if (max_reclaim <= nr_pages) + max_reclaim = 0; + else + max_reclaim -= nr_pages; + wait_event(root->fs_info->async_submit_wait, + atomic_read(&root->fs_info->async_delalloc_pages) <+ (int)max_reclaim); +skip_async: if (!trans) flush = BTRFS_RESERVE_FLUSH_ALL; else -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-04 15:13 UTC
[PATCH 6/7] Btrfs: don''t wait for the completion of all the ordered extents
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/dev-replace.c | 4 ++-- fs/btrfs/extent-tree.c | 11 ++++++----- fs/btrfs/ioctl.c | 2 +- fs/btrfs/ordered-data.c | 22 +++++++++++++++++----- fs/btrfs/ordered-data.h | 4 ++-- fs/btrfs/relocation.c | 2 +- fs/btrfs/super.c | 2 +- fs/btrfs/transaction.c | 2 +- 8 files changed, 31 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 2a9bd5b..bc55b36 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -400,7 +400,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root, args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR; btrfs_dev_replace_unlock(dev_replace); - btrfs_wait_all_ordered_extents(root->fs_info); + btrfs_wait_ordered_roots(root->fs_info, -1); /* force writing the updated state information to disk */ trans = btrfs_start_transaction(root, 0); @@ -475,7 +475,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); return ret; } - btrfs_wait_all_ordered_extents(root->fs_info); + btrfs_wait_ordered_roots(root->fs_info, -1); trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d691e15..410d972 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4000,7 +4000,7 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root, */ btrfs_start_all_delalloc_inodes(root->fs_info, 0); if (!current->journal_info) - btrfs_wait_all_ordered_extents(root->fs_info); + btrfs_wait_ordered_roots(root->fs_info, -1); } } @@ -4032,11 +4032,12 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, long time_left; unsigned long nr_pages; int loops; + int items; enum btrfs_reserve_flush_enum flush; /* Calc the number of the pages we need flush for space reservation */ - to_reclaim = calc_reclaim_items_nr(root, to_reclaim); - to_reclaim *= EXTENT_SIZE_PER_ITEM; + items = calc_reclaim_items_nr(root, to_reclaim); + to_reclaim = items * EXTENT_SIZE_PER_ITEM; trans = (struct btrfs_trans_handle *)current->journal_info; block_rsv = &root->fs_info->delalloc_block_rsv; @@ -4048,7 +4049,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, if (trans) return; if (wait_ordered) - btrfs_wait_all_ordered_extents(root->fs_info); + btrfs_wait_ordered_roots(root->fs_info, items); return; } @@ -4087,7 +4088,7 @@ skip_async: loops++; if (wait_ordered && !trans) { - btrfs_wait_all_ordered_extents(root->fs_info); + btrfs_wait_ordered_roots(root->fs_info, items); } else { time_left = schedule_timeout_killable(1); if (time_left) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9d46f60..524692d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -574,7 +574,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, if (ret) return ret; - btrfs_wait_ordered_extents(root); + btrfs_wait_ordered_extents(root, -1); pending_snapshot = kzalloc(sizeof(*pending_snapshot), GFP_NOFS); if (!pending_snapshot) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index c702cb6..97efc12 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -563,10 +563,11 @@ static void btrfs_run_ordered_extent_work(struct btrfs_work *work) * wait for all the ordered extents in a root. This is done when balancing * space between drives. */ -void btrfs_wait_ordered_extents(struct btrfs_root *root) +int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr) { struct list_head splice, works; struct btrfs_ordered_extent *ordered, *next; + int count = 0; INIT_LIST_HEAD(&splice); INIT_LIST_HEAD(&works); @@ -574,7 +575,7 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root) mutex_lock(&root->fs_info->ordered_operations_mutex); spin_lock(&root->ordered_extent_lock); list_splice_init(&root->ordered_extents, &splice); - while (!list_empty(&splice)) { + while (!list_empty(&splice) && nr) { ordered = list_first_entry(&splice, struct btrfs_ordered_extent, root_extent_list); list_move_tail(&ordered->root_extent_list, @@ -589,7 +590,11 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root) cond_resched(); spin_lock(&root->ordered_extent_lock); + if (nr != -1) + nr--; + count++; } + list_splice_tail(&splice, &root->ordered_extents); spin_unlock(&root->ordered_extent_lock); list_for_each_entry_safe(ordered, next, &works, work_list) { @@ -599,18 +604,21 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root) cond_resched(); } mutex_unlock(&root->fs_info->ordered_operations_mutex); + + return count; } -void btrfs_wait_all_ordered_extents(struct btrfs_fs_info *fs_info) +void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, int nr) { struct btrfs_root *root; struct list_head splice; + int done; INIT_LIST_HEAD(&splice); spin_lock(&fs_info->ordered_root_lock); list_splice_init(&fs_info->ordered_roots, &splice); - while (!list_empty(&splice)) { + while (!list_empty(&splice) && nr) { root = list_first_entry(&splice, struct btrfs_root, ordered_root); root = btrfs_grab_fs_root(root); @@ -619,10 +627,14 @@ void btrfs_wait_all_ordered_extents(struct btrfs_fs_info *fs_info) &fs_info->ordered_roots); spin_unlock(&fs_info->ordered_root_lock); - btrfs_wait_ordered_extents(root); + done = btrfs_wait_ordered_extents(root, nr); btrfs_put_fs_root(root); spin_lock(&fs_info->ordered_root_lock); + if (nr != -1) { + nr -= done; + WARN_ON(nr < 0); + } } spin_unlock(&fs_info->ordered_root_lock); } diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 0c0b356..296f15d 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -195,8 +195,8 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans, void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *inode); -void btrfs_wait_ordered_extents(struct btrfs_root *root); -void btrfs_wait_all_ordered_extents(struct btrfs_fs_info *fs_info); +int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr); +void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, int nr); void btrfs_get_logged_extents(struct btrfs_root *log, struct inode *inode); void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid); void btrfs_free_logged_extents(struct btrfs_root *log, u64 transid); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 4a35572..0cec204 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4246,7 +4246,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start) err = ret; goto out; } - btrfs_wait_all_ordered_extents(fs_info); + btrfs_wait_ordered_roots(fs_info, -1); while (1) { mutex_lock(&fs_info->cleaner_mutex); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index e913328..217ccb9 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -921,7 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait) return 0; } - btrfs_wait_all_ordered_extents(fs_info); + btrfs_wait_ordered_roots(fs_info, -1); trans = btrfs_attach_transaction_barrier(root); if (IS_ERR(trans)) { diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 8c81bdc..0fada5b 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1603,7 +1603,7 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info) static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info) { if (btrfs_test_opt(fs_info->tree_root, FLUSHONCOMMIT)) - btrfs_wait_all_ordered_extents(fs_info); + btrfs_wait_ordered_roots(fs_info, -1); } int btrfs_commit_transaction(struct btrfs_trans_handle *trans, -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/ctree.h | 3 +-- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/inode.c | 3 +-- fs/btrfs/relocation.c | 2 +- fs/btrfs/transaction.c | 2 +- 6 files changed, 6 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 0506f40..1fbfc21 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3675,8 +3675,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, u32 min_type); int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput); -int btrfs_start_all_delalloc_inodes(struct btrfs_fs_info *fs_info, - int delay_iput); +int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput); int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, struct extent_state **cached_state); int btrfs_create_subvol_root(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index bc55b36..7436e08 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -470,7 +470,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, * flush all outstanding I/O and inode extent mappings before the * copy operation is declared as being finished */ - ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0); + ret = btrfs_start_delalloc_roots(root->fs_info, 0); if (ret) { mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); return ret; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 410d972..0aad9ed 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3998,7 +3998,7 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root, * the filesystem is readonly(all dirty pages are written to * the disk). */ - btrfs_start_all_delalloc_inodes(root->fs_info, 0); + btrfs_start_delalloc_roots(root->fs_info, 0); if (!current->journal_info) btrfs_wait_ordered_roots(root->fs_info, -1); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f4a6851..da618c4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8276,8 +8276,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput) return ret; } -int btrfs_start_all_delalloc_inodes(struct btrfs_fs_info *fs_info, - int delay_iput) +int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput) { struct btrfs_root *root; struct list_head splice; diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 0cec204..af41487 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4241,7 +4241,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start) printk(KERN_INFO "btrfs: relocating block group %llu flags %llu\n", rc->block_group->key.objectid, rc->block_group->flags); - ret = btrfs_start_all_delalloc_inodes(fs_info, 0); + ret = btrfs_start_delalloc_roots(fs_info, 0); if (ret < 0) { err = ret; goto out; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 0fada5b..840e672 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1596,7 +1596,7 @@ static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans, static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info) { if (btrfs_test_opt(fs_info->tree_root, FLUSHONCOMMIT)) - return btrfs_start_all_delalloc_inodes(fs_info, 1); + return btrfs_start_delalloc_roots(fs_info, 1); return 0; } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik
2013-Nov-04 17:05 UTC
Re: [PATCH 0/7] Btrfs: improve the latency of the space reservation
On Mon, Nov 04, 2013 at 11:13:19PM +0800, Miao Xie wrote:> This patchset improve the latency of the space reservation when doing the > delalloc inode flush. As we know, the current code would wait for > the completion all the ordered extents in the filesystem. It was unnecessary, > we can finish the wait and try to do reservation once again earlier if we > get enough free space. It can reduce the wait time and make the perfermance > up. > > We did the buffered write test by the sysbench on my box, the following > is the result. > Memory: 2GB > CPU: 2Cores * 1CPU > Partition: 20GB(SSD) > w/ w/o > rndwr-512KB-1Thread-2GB 120.38MB/s 110.08MB/s > rndwr-512KB-8Threads-2GB 119.43MB/s 110.96MB/s > seqwr-512KB-1Thread-2GB 99.676MB/s 98.53MB/s > seqwr-512KB-8Threads-2GB 111.79MB/s 99.176MB/s > > rndwr-128KB-1Thread-2GB 139.23MB/s 95.864MB/s > rndwr-128KB-8Threads-2GB 126.16MB/s 96.686MB/s > seqwr-128KB-1Thread-2GB 100.24MB/s 100.95.MB/s > seqwr-128KB-8Threads-2GB 126.51MB/s 100.26MB/s > > rndwr: random write test > seqwr: sequential write test > 512KB,128KB: the size of each request > 1Thread, 8Threads: the number of the test threads > 2GB: The total file size > > Miao Xie (7): > Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc() > Btrfs: wait for the ordered extent only when we want > Btrfs: pick up the code for the item number calculation in flush_space() > Btrfs: fix the confusion between delalloc bytes and metadata bytes > Btrfs: don''t wait for all the async delalloc when shrinking delalloc > Btrfs: don''t wait for the completion of all the ordered extents > Btrfs: rename btrfs_start_all_delalloc_inodes >Thank you for this, I''ve been wanting to do this for months but I keep getting sucked into bug hell, remind me to kiss you at LSF. Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik
2013-Nov-04 17:09 UTC
Re: [PATCH 7/7] Btrfs: rename btrfs_start_all_delalloc_inodes
On Mon, Nov 04, 2013 at 11:13:26PM +0800, Miao Xie wrote:> rename the function -- btrfs_start_all_delalloc_inodes(), and make its > name be compatible to btrfs_wait_ordered_roots(), since they are always > used at the same place. > > Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > --- > fs/btrfs/ctree.h | 3 +-- > fs/btrfs/dev-replace.c | 2 +- > fs/btrfs/extent-tree.c | 2 +- > fs/btrfs/inode.c | 3 +-- > fs/btrfs/relocation.c | 2 +- > fs/btrfs/transaction.c | 2 +- > 6 files changed, 6 insertions(+), 8 deletions(-) >Theres another use in ioctl.c in my tree, I''ve just rebased that change in, please check my tree when I push it to make sure I did it right. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-05 03:07 UTC
Re: [PATCH 7/7] Btrfs: rename btrfs_start_all_delalloc_inodes
On mon, 4 Nov 2013 12:09:18 -0500, Josef Bacik wrote:> On Mon, Nov 04, 2013 at 11:13:26PM +0800, Miao Xie wrote: >> rename the function -- btrfs_start_all_delalloc_inodes(), and make its >> name be compatible to btrfs_wait_ordered_roots(), since they are always >> used at the same place. >> >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> >> --- >> fs/btrfs/ctree.h | 3 +-- >> fs/btrfs/dev-replace.c | 2 +- >> fs/btrfs/extent-tree.c | 2 +- >> fs/btrfs/inode.c | 3 +-- >> fs/btrfs/relocation.c | 2 +- >> fs/btrfs/transaction.c | 2 +- >> 6 files changed, 6 insertions(+), 8 deletions(-) >> > > Theres another use in ioctl.c in my tree, I''ve just rebased that change in, > please check my tree when I push it to make sure I did it right. Thanks,I have checked it, everything is OK. Thanks Miao> > Josef > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2013-Nov-05 11:45 UTC
Re: [PATCH 1/7] Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
On Mon, Nov 04, 2013 at 11:13:20PM +0800, Miao Xie wrote:> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>Reviewed-by: David Sterba <dsterba@suse.cz> Would be nice to say why it''s safe to remove the barrier. The percpu counters do not need it, they use a spinlock. The barrier was there due to access to fs_info->delalloc_bytes that was removed in commit 963d678b0f764930 that introduced the percpu counters. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-06 01:39 UTC
Re: [PATCH 1/7] Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
On Tue, 5 Nov 2013 12:45:16 +0100, David Sterba wrote:> On Mon, Nov 04, 2013 at 11:13:20PM +0800, Miao Xie wrote: >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > > Reviewed-by: David Sterba <dsterba@suse.cz> > > Would be nice to say why it''s safe to remove the barrier. The percpu > counters do not need it, they use a spinlock. The barrier was there due > to access to fs_info->delalloc_bytes that was removed in commit > 963d678b0f764930 that introduced the percpu counters. >Right, I''ll send a new one. Thanks for you review. Miao -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Nov-06 05:29 UTC
[PATCH V2 1/7] Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
- The variant nr_pages is assigned before we use it, so the initialization at the beginning is unnecessary. - If we enter the branch of the first if statement, the initialization of the variant loops is also unnecessary. - The memory barriers here are misused, the barrier is used to prevent the problems that we may get unexpected value that is caused by the random order of the independent memory operations. But here, though the memory operations before and after the barrier are independent, we are sure that all the variants we access before the memory barrier can not be changed, so even though the operations are reordered, it is safe and the memory barriers are unnecessary. So we remove those statements. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- Changelog v1 -> v2: - Add the explanation why we remove those statements in the changelog. --- fs/btrfs/extent-tree.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1d238ed..abe65ed 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4016,15 +4016,14 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, u64 delalloc_bytes; u64 max_reclaim; long time_left; - unsigned long nr_pages = (2 * 1024 * 1024) >> PAGE_CACHE_SHIFT; - int loops = 0; + unsigned long nr_pages; + int loops; enum btrfs_reserve_flush_enum flush; trans = (struct btrfs_trans_handle *)current->journal_info; block_rsv = &root->fs_info->delalloc_block_rsv; space_info = block_rsv->space_info; - smp_mb(); delalloc_bytes = percpu_counter_sum_positive( &root->fs_info->delalloc_bytes); if (delalloc_bytes == 0) { @@ -4034,6 +4033,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, return; } + loops = 0; while (delalloc_bytes && loops < 3) { max_reclaim = min(delalloc_bytes, to_reclaim); nr_pages = max_reclaim >> PAGE_CACHE_SHIFT; @@ -4064,7 +4064,6 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, if (time_left) break; } - smp_mb(); delalloc_bytes = percpu_counter_sum_positive( &root->fs_info->delalloc_bytes); } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html