Miao Xie
2012-Sep-26 11:22 UTC
[RFC][PATCH V2 0/4] Btrfs: introduce extent buffer cache to btrfs
This patchset introduces an extent buffer cache to btrfs. The basic idea
is to reduce the search time and the contention on the extent buffer
locks by re-using the last search result.

I ran stress.sh, xfstests and some other tools to test it, and all of
them worked well.

Since this is a performance-improvement patchset, we also ran performance
tests. Because the patchset speeds up the b+ tree search - in other words,
the metadata operations - we used a file creation test to measure it:
10 tasks each created 100000 files in their own directories at the same
time. With this patchset, btrfs was ~20% faster (98s -> 77s).

You can pull this patchset from:

  git://github.com/miaoxie/linux-btrfs.git extent-buffer-cache

Thanks
Miao
---
Miao Xie (4):
  Btrfs: restructure btrfs_search_slot()
  Btrfs: introduce extent buffer cache for each i-node
  Btrfs: introduce extent buffer cache for delayed inode
  Btrfs: introduce extent buffer cache for delayed reference

 fs/btrfs/acl.c           |   1 -
 fs/btrfs/btrfs_inode.h   |   6 +-
 fs/btrfs/compression.c   |   1 -
 fs/btrfs/ctree.c         | 598 ++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/ctree.h         |  32 +++-
 fs/btrfs/delayed-inode.c |  23 ++
 fs/btrfs/dir-item.c      |   2 +
 fs/btrfs/disk-io.c       |   2 +-
 fs/btrfs/export.c        |   1 -
 fs/btrfs/extent-tree.c   |   6 +
 fs/btrfs/extent_io.c     |   9 +-
 fs/btrfs/extent_io.h     |  38 +++
 fs/btrfs/file-item.c     |   4 +-
 fs/btrfs/file.c          |   5 +-
 fs/btrfs/inode-item.c    |  10 +-
 fs/btrfs/inode.c         |  36 +++-
 fs/btrfs/ioctl.c         |   1 -
 fs/btrfs/ordered-data.c  |   1 -
 fs/btrfs/relocation.c    |   1 -
 fs/btrfs/send.c          |   1 -
 fs/btrfs/super.c         |   1 -
 fs/btrfs/transaction.c   |  11 +
 fs/btrfs/transaction.h   |   4 +-
 fs/btrfs/tree-log.c      |   6 +-
 fs/btrfs/xattr.c         |   2 +-
 25 files changed, 651 insertions(+), 151 deletions(-)
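Roughly, the idea can be sketched as a fast path placed in front of the
normal top-down search. The snippet below is only a conceptual sketch,
not the patch code: cache_looks_fresh(), search_from_cached_branch() and
search_from_root() are hypothetical names, and the real fast path lives
inside btrfs_search_slot() in patch 2.

	/* conceptual sketch, hypothetical helpers */
	if (cache->cached_eb && cache_looks_fresh(cache, trans)) {
		/* start from the node/leaf we found last time */
		ret = search_from_cached_branch(trans, root, cache, key, path);
		if (ret != -ERESTART)
			return ret;	/* hit: no walk down from the root */
		/* -ERESTART: the cached branch cannot serve this key */
	}
	/* miss or stale cache: do the usual search from the tree root */
	ret = search_from_root(trans, root, key, path, ins_len, cow);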
we splited btrfs_search_slot(), and pick up some code that is used to search the tree to make a new function, which can search the tree from any node that we specified. We will use it in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/ctree.c | 317 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 197 insertions(+), 120 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 6d183f6..3da5280 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2409,104 +2409,53 @@ done: return ret; } +static int check_nodes_need_balance(struct btrfs_root *root, + struct btrfs_path *p, + struct extent_buffer *b, + int level, int ins_len) +{ + if ((p->search_for_split || ins_len > 0) && btrfs_header_nritems(b) >+ BTRFS_NODEPTRS_PER_BLOCK(root) - 3) + return 1; + + if (ins_len < 0 && + btrfs_header_nritems(b) < BTRFS_NODEPTRS_PER_BLOCK(root) / 2) + return 1; + + return 0; +} + /* - * look for key in the tree. path is filled in with nodes along the way - * if key is found, we return zero and you can find the item in the leaf - * level of the path (level 0) + * look for key from the specified node/leaf in the tree. path is filled in + * with nodes along the way if key is found, we return zero and you can find + * the item in the leaf level of the path (level 0). * * If the key isn''t found, the path points to the slot where it should - * be inserted, and 1 is returned. If there are other errors during the - * search a negative error number is returned. + * be inserted, and 1 is returned. * - * if ins_len > 0, nodes and leaves will be split as we walk down the - * tree. if ins_len < 0, nodes will be merged as we walk down the tree (if - * possible) + * Note: If we don''t search the tree from its root, though we also return 1 + * when the key isn''t found, this returned value - 1 - doesn''t mean + * the expected item is not in the tree, just tell us the expected item is + * not in the current branch. So we must deal with this case in the caller + * carefully. + * + * If we need re-search the specified branch, -EAGAIN will returned. And + * if we don''t search the tree from the root, we need restart the search + * from the root at some cases. At those cases, we will return -ERESTART. + * If there are other errors during the search the other negative error number + * is returned. 
*/ -int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root - *root, struct btrfs_key *key, struct btrfs_path *p, int - ins_len, int cow) +static int __search_slot(struct btrfs_trans_handle *trans, + struct btrfs_root *root, struct extent_buffer *b, + struct btrfs_key *key, struct btrfs_path *p, + int ins_len, int cow, int lowest_unlock, + int *write_lock_level, int min_write_lock_level, + u8 lowest_level, bool search_from_root) { - struct extent_buffer *b; int slot; + int level; int ret; int err; - int level; - int lowest_unlock = 1; - int root_lock; - /* everything at write_lock_level or lower must be write locked */ - int write_lock_level = 0; - u8 lowest_level = 0; - int min_write_lock_level; - - lowest_level = p->lowest_level; - WARN_ON(lowest_level && ins_len > 0); - WARN_ON(p->nodes[0] != NULL); - - if (ins_len < 0) { - lowest_unlock = 2; - - /* when we are removing items, we might have to go up to level - * two as we update tree pointers Make sure we keep write - * for those levels as well - */ - write_lock_level = 2; - } else if (ins_len > 0) { - /* - * for inserting items, make sure we have a write lock on - * level 1 so we can update keys - */ - write_lock_level = 1; - } - - if (!cow) - write_lock_level = -1; - - if (cow && (p->keep_locks || p->lowest_level)) - write_lock_level = BTRFS_MAX_LEVEL; - - min_write_lock_level = write_lock_level; - -again: - /* - * we try very hard to do read locks on the root - */ - root_lock = BTRFS_READ_LOCK; - level = 0; - if (p->search_commit_root) { - /* - * the commit roots are read only - * so we always do read locks - */ - b = root->commit_root; - extent_buffer_get(b); - level = btrfs_header_level(b); - if (!p->skip_locking) - btrfs_tree_read_lock(b); - } else { - if (p->skip_locking) { - b = btrfs_root_node(root); - level = btrfs_header_level(b); - } else { - /* we don''t know the level of the root node - * until we actually have it read locked - */ - b = btrfs_read_lock_root_node(root); - level = btrfs_header_level(b); - if (level <= write_lock_level) { - /* whoops, must trade for write lock */ - btrfs_tree_read_unlock(b); - free_extent_buffer(b); - b = btrfs_lock_root_node(root); - root_lock = BTRFS_WRITE_LOCK; - - /* the level might have changed, check again */ - level = btrfs_header_level(b); - } - } - } - p->nodes[level] = b; - if (!p->skip_locking) - p->locks[level] = root_lock; while (b) { level = btrfs_header_level(b); @@ -2524,25 +2473,30 @@ again: if (!should_cow_block(trans, root, b)) goto cow_done; + if (!search_from_root && + (level + 1 >= BTRFS_MAX_LEVEL || + !p->nodes[level + 1])) { + ret = -ERESTART; + goto done; + } + btrfs_set_path_blocking(p); /* * must have write locks on this node and the * parent */ - if (level + 1 > write_lock_level) { - write_lock_level = level + 1; - btrfs_release_path(p); - goto again; + if (level + 1 > *write_lock_level) { + *write_lock_level = level + 1; + ret = -EAGAIN; + goto done; } - err = btrfs_cow_block(trans, root, b, + ret = btrfs_cow_block(trans, root, b, p->nodes[level + 1], p->slots[level + 1], &b); - if (err) { - ret = err; + if (ret) goto done; - } } cow_done: BUG_ON(!cow && ins_len); @@ -2573,13 +2527,23 @@ cow_done: slot -= 1; } p->slots[level] = slot; - err = setup_nodes_for_search(trans, root, p, b, level, - ins_len, &write_lock_level); - if (err == -EAGAIN) - goto again; - if (err) { - ret = err; - goto done; + if (!search_from_root && + (level + 1 >= BTRFS_MAX_LEVEL || + !p->nodes[level + 1])) { + err = check_nodes_need_balance(root, p, b, + level, ins_len); + 
if (err) { + ret = -ERESTART; + goto done; + } + } else { + err = setup_nodes_for_search(trans, root, p, b, + level, ins_len, + write_lock_level); + if (err) { + ret = err; + goto done; + } } b = p->nodes[level]; slot = p->slots[level]; @@ -2591,14 +2555,14 @@ cow_done: * on the parent */ if (slot == 0 && cow && - write_lock_level < level + 1) { - write_lock_level = level + 1; - btrfs_release_path(p); - goto again; + *write_lock_level < level + 1) { + *write_lock_level = level + 1; + ret = -EAGAIN; + goto done; } unlock_up(p, level, lowest_unlock, - min_write_lock_level, &write_lock_level); + min_write_lock_level, write_lock_level); if (level == lowest_level) { if (dec) @@ -2608,8 +2572,6 @@ cow_done: err = read_block_for_search(trans, root, p, &b, level, slot, key, 0); - if (err == -EAGAIN) - goto again; if (err) { ret = err; goto done; @@ -2617,13 +2579,13 @@ cow_done: if (!p->skip_locking) { level = btrfs_header_level(b); - if (level <= write_lock_level) { + if (level <= *write_lock_level) { err = btrfs_try_tree_write_lock(b); if (!err) { btrfs_set_path_blocking(p); btrfs_tree_lock(b); btrfs_clear_path_blocking(p, b, - BTRFS_WRITE_LOCK); + BTRFS_WRITE_LOCK); } p->locks[level] = BTRFS_WRITE_LOCK; } else { @@ -2632,7 +2594,7 @@ cow_done: btrfs_set_path_blocking(p); btrfs_tree_read_lock(b); btrfs_clear_path_blocking(p, b, - BTRFS_READ_LOCK); + BTRFS_READ_LOCK); } p->locks[level] = BTRFS_READ_LOCK; } @@ -2642,10 +2604,15 @@ cow_done: p->slots[level] = slot; if (ins_len > 0 && btrfs_leaf_free_space(root, b) < ins_len) { - if (write_lock_level < 1) { - write_lock_level = 1; - btrfs_release_path(p); - goto again; + if (*write_lock_level < 1) { + *write_lock_level = 1; + ret = -EAGAIN; + goto done; + } + + if (!search_from_root && !p->nodes[1]) { + ret = -ERESTART; + goto done; } btrfs_set_path_blocking(p); @@ -2653,7 +2620,6 @@ cow_done: p, ins_len, ret == 0); btrfs_clear_path_blocking(p, NULL, 0); - BUG_ON(err > 0); if (err) { ret = err; goto done; @@ -2661,7 +2627,8 @@ cow_done: } if (!p->search_for_split) unlock_up(p, level, lowest_unlock, - min_write_lock_level, &write_lock_level); + min_write_lock_level, + write_lock_level); goto done; } } @@ -2675,6 +2642,116 @@ done: btrfs_set_path_blocking(p); if (ret < 0) btrfs_release_path(p); + + return ret; +} + + +/* + * look for key in the tree. path is filled in with nodes along the way + * if key is found, we return zero and you can find the item in the leaf + * level of the path (level 0) + * + * If the key isn''t found, the path points to the slot where it should + * be inserted, and 1 is returned. If there are other errors during the + * search a negative error number is returned. + * + * if ins_len > 0, nodes and leaves will be split as we walk down the + * tree. 
if ins_len < 0, nodes will be merged as we walk down the tree (if + * possible) + */ +int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root + *root, struct btrfs_key *key, struct btrfs_path *p, int + ins_len, int cow) +{ + struct extent_buffer *b; + int ret; + int level; + int lowest_unlock = 1; + int root_lock; + /* everything at write_lock_level or lower must be write locked */ + int write_lock_level = 0; + u8 lowest_level = 0; + int min_write_lock_level; + + lowest_level = p->lowest_level; + WARN_ON(lowest_level && ins_len > 0); + WARN_ON(p->nodes[0] != NULL); + + if (ins_len < 0) { + lowest_unlock = 2; + + /* when we are removing items, we might have to go up to level + * two as we update tree pointers Make sure we keep write + * for those levels as well + */ + write_lock_level = 2; + } else if (ins_len > 0) { + /* + * for inserting items, make sure we have a write lock on + * level 1 so we can update keys + */ + write_lock_level = 1; + } + + if (!cow) + write_lock_level = -1; + + if (cow && (p->keep_locks || p->lowest_level)) + write_lock_level = BTRFS_MAX_LEVEL; + + min_write_lock_level = write_lock_level; + +again: + /* + * we try very hard to do read locks on the root + */ + root_lock = BTRFS_READ_LOCK; + level = 0; + if (p->search_commit_root) { + /* + * the commit roots are read only + * so we always do read locks + */ + b = root->commit_root; + extent_buffer_get(b); + level = btrfs_header_level(b); + if (!p->skip_locking) + btrfs_tree_read_lock(b); + } else { + if (p->skip_locking) { + b = btrfs_root_node(root); + level = btrfs_header_level(b); + } else { + /* we don''t know the level of the root node + * until we actually have it read locked + */ + b = btrfs_read_lock_root_node(root); + level = btrfs_header_level(b); + if (level <= write_lock_level) { + /* whoops, must trade for write lock */ + btrfs_tree_read_unlock(b); + free_extent_buffer(b); + b = btrfs_lock_root_node(root); + root_lock = BTRFS_WRITE_LOCK; + + /* the level might have changed, check again */ + level = btrfs_header_level(b); + } + } + } + p->nodes[level] = b; + if (!p->skip_locking) + p->locks[level] = root_lock; + + ret = __search_slot(trans, root, b, key, p, ins_len, cow, + lowest_unlock, &write_lock_level, + min_write_lock_level, lowest_level, true); + if (ret == -EAGAIN) + goto again; + + BUG_ON(ret == -ERESTART); + return ret; } -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
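To make the new return-code contract easier to follow, here is a rough
sketch of how a caller is expected to react to __search_slot(). The helper
is static, so this mirrors the in-file callers (btrfs_search_slot() in this
patch and the cached-branch search added in the next patch) rather than a
public API; retry_from_root() is a hypothetical fallback name.

	/*
	 * ret == 1        key not found in the branch that was searched;
	 *                 only when searching from the root does this mean
	 *                 the key is not in the tree at all
	 * ret == -EAGAIN  the required write-lock level changed and the
	 *                 path was released; set up the start node again
	 *                 and search the same branch once more
	 * ret == -ERESTART (only when search_from_root == false) the branch
	 *                 cannot serve this search; fall back to a full
	 *                 search from the tree root
	 */
again:
	/* ... re-acquire and lock the start node b, fill p->nodes/p->locks ... */
	ret = __search_slot(trans, root, b, key, path, ins_len, cow,
			    lowest_unlock, &write_lock_level,
			    min_write_lock_level, lowest_level,
			    search_from_root);
	if (ret == -EAGAIN)
		goto again;
	if (ret == -ERESTART)
		ret = retry_from_root(trans, root, key, path, ins_len, cow);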
Miao Xie
2012-Sep-26 11:37 UTC
[RFC][PATCH V2 2/4] Btrfs: introduce extent buffer cache for each i-node
This patch introduce extent buffer cache for every i-node. By this way, we needn''t search the item from the root of b+ tree, and can save the search time. Besides that we can also reduce the lock contention of the root. Implementation: - add two field - chg_seq and free_seq - into the extent buffer structure. chg_seq is used to record the time that we change the extent buffer for the b+ tree balance. free_seq is used to record the time that the extent buffer is removed from the tree. - define a structure named extent_buffer_cache which is used to keep the cached extent buffer, besides that, it also keep the transaction id, chg_seq and free_seq of the cached extent buffer. These variants can tell us if the cached extent buffer is stale or not. - add two extent buffer caches into btrfs_inode struct, one for nodes/leaves of fs/file tree, the other for nodes/leaves of the log tree. We will pass these caches to btrfs_search_slot() by the "path" parameter. - When we want to search fs/file trees or the relative log trees, we must try to search the cached extent buffer at first, if we can not find the item, we will do the common search. At some condition, we will jump to the common search directly: I. a new transaction starts. II. the cached extent buffer is stale. III. the cached extent buffer''s level is below the level we must lock. and so on. And beside that, if we can not find the item, and the slot points to the first the item or the last item in the leaf, we must jump out the cached extent buffer search, and do the common search. - After the common search (use btrfs_search_slot), we will cache the leaf or the level-1 node into the btrfs i-node object. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/acl.c | 1 - fs/btrfs/btrfs_inode.h | 6 +- fs/btrfs/compression.c | 1 - fs/btrfs/ctree.c | 313 ++++++++++++++++++++++++++++++++++++++++++++--- fs/btrfs/ctree.h | 32 ++++- fs/btrfs/dir-item.c | 2 + fs/btrfs/disk-io.c | 2 +- fs/btrfs/export.c | 1 - fs/btrfs/extent-tree.c | 1 + fs/btrfs/extent_io.c | 9 +- fs/btrfs/extent_io.h | 38 ++++++ fs/btrfs/file-item.c | 4 +- fs/btrfs/file.c | 5 +- fs/btrfs/inode-item.c | 10 +- fs/btrfs/inode.c | 36 ++++-- fs/btrfs/ioctl.c | 1 - fs/btrfs/ordered-data.c | 1 - fs/btrfs/relocation.c | 1 - fs/btrfs/send.c | 1 - fs/btrfs/super.c | 1 - fs/btrfs/transaction.c | 1 + fs/btrfs/transaction.h | 1 - fs/btrfs/tree-log.c | 6 +- fs/btrfs/xattr.c | 2 +- 24 files changed, 429 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c index 761e2cd..c7302e5 100644 --- a/fs/btrfs/acl.c +++ b/fs/btrfs/acl.c @@ -25,7 +25,6 @@ #include <linux/slab.h> #include "ctree.h" -#include "btrfs_inode.h" #include "xattr.h" struct posix_acl *btrfs_get_acl(struct inode *inode, int type) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 5b2ad6b..115c249 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -22,7 +22,6 @@ #include "extent_map.h" #include "extent_io.h" #include "ordered-data.h" -#include "delayed-inode.h" /* * ordered_data_close is set by truncate when a file that used @@ -39,6 +38,7 @@ #define BTRFS_INODE_HAS_ORPHAN_ITEM 5 #define BTRFS_INODE_HAS_ASYNC_EXTENT 6 +struct btrfs_delayed_node; /* in memory btrfs inode */ struct btrfs_inode { /* which subvolume this inode belongs to */ @@ -159,6 +159,9 @@ struct btrfs_inode { struct btrfs_delayed_node *delayed_node; + struct extent_buffer_cache fs_eb_cache; + struct extent_buffer_cache log_eb_cache; + struct inode vfs_inode; }; @@ -212,5 +215,4 @@ static inline int 
btrfs_inode_in_log(struct inode *inode, u64 generation) mutex_unlock(&root->log_mutex); return ret; } - #endif diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 43d1c5a..7cb135d 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -36,7 +36,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include "btrfs_inode.h" #include "volumes.h" #include "ordered-data.h" #include "compression.h" diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 3da5280..8803f8e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -149,7 +149,10 @@ noinline void btrfs_release_path(struct btrfs_path *p) } free_extent_buffer(p->nodes[i]); p->nodes[i] = NULL; + p->free_seq[i] = 0; + p->chg_seq[i] = 0; } + p->max_level = 0; } /* @@ -1053,6 +1056,9 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, btrfs_free_tree_block(trans, root, buf, parent_start, last_ref); } + + write_seqcount_barrier(&buf->free_seq); + if (unlock_orig) btrfs_tree_unlock(buf); free_extent_buffer_stale(buf); @@ -1734,6 +1740,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, path->locks[level] = 0; path->nodes[level] = NULL; clean_tree_block(trans, root, mid); + + write_seqcount_barrier(&mid->free_seq); + btrfs_tree_unlock(mid); /* once for the path */ free_extent_buffer(mid); @@ -1788,6 +1797,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, ret = wret; if (btrfs_header_nritems(right) == 0) { clean_tree_block(trans, root, right); + write_seqcount_barrier(&right->free_seq); btrfs_tree_unlock(right); del_ptr(trans, root, path, level + 1, pslot + 1, 1); root_sub_used(root, right->len); @@ -1832,6 +1842,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, } if (btrfs_header_nritems(mid) == 0) { clean_tree_block(trans, root, mid); + write_seqcount_barrier(&mid->free_seq); btrfs_tree_unlock(mid); del_ptr(trans, root, path, level + 1, pslot, 1); root_sub_used(root, mid->len); @@ -2504,6 +2515,9 @@ cow_done: p->nodes[level] = b; btrfs_clear_path_blocking(p, NULL, 0); + p->chg_seq[level] = raw_seqcount_begin(&b->chg_seq); + p->free_seq[level] = raw_seqcount_begin(&b->free_seq); + /* * we have a lock on b and as long as we aren''t changing * the tree, there is no way to for the items in b to change. @@ -2647,22 +2661,10 @@ done: } -/* - * look for key in the tree. path is filled in with nodes along the way - * if key is found, we return zero and you can find the item in the leaf - * level of the path (level 0) - * - * If the key isn''t found, the path points to the slot where it should - * be inserted, and 1 is returned. If there are other errors during the - * search a negative error number is returned. - * - * if ins_len > 0, nodes and leaves will be split as we walk down the - * tree. 
if ins_len < 0, nodes will be merged as we walk down the tree (if - * possible) - */ -int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root - *root, struct btrfs_key *key, struct btrfs_path *p, int - ins_len, int cow) +static int btrfs_search_slot_root(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, + int ins_len, int cow) { struct extent_buffer *b; int ret; @@ -2744,6 +2746,8 @@ again: if (!p->skip_locking) p->locks[level] = root_lock; + p->max_level = level; + ret = __search_slot(trans, root, b, key, p, ins_len, cow, lowest_unlock, &write_lock_level, min_write_lock_level, lowest_level, true); @@ -2755,6 +2759,268 @@ again: return ret; } +static int check_search_result_valid(struct btrfs_path *p, int ins_len, + int start_level, bool is_found) +{ + int i; + + if (ins_len < 0 && p->slots[start_level] == 0) { + for (i = start_level - 1; i >= 0; i--) { + if (p->slots[i] != 0) + return 1; + } + return 0; + } + + if (is_found) + return 1; + + if (p->slots[start_level] == 0) { + for (i = start_level - 1; i >= 0; i--) { + if (p->slots[i] != 0) + return 1; + } + return 0; + } + + while (start_level >= 0 && !p->locks[start_level]) + start_level--; + + if (start_level < 0) + return 0; + + for (i = start_level; i >= 0; i--) { + if (i > 0 && + p->slots[i] < btrfs_header_nritems(p->nodes[i]) - 1) + return 1; + if (i == 0 && + p->slots[i] != btrfs_header_nritems(p->nodes[i])) + return 1; + } + return 0; +} + +static int btrfs_search_slot_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct extent_buffer *cached_eb, + unsigned free_seq, + struct btrfs_key *key, + struct btrfs_path *p, + int ins_len, int cow) +{ + struct btrfs_key tmp_key; + int ret; + int level; + int lowest_unlock = 1; + int write_lock_level = 0; + int min_write_lock_level = 0; + int root_lock; + u8 lowest_level = 0; + + if (cow && (p->keep_locks || p->lowest_level)) + return -ERESTART; + + BUG_ON(p->search_commit_root); + BUG_ON(!cached_eb); + + lowest_level = p->lowest_level; + WARN_ON(lowest_level && ins_len > 0); + WARN_ON(p->nodes[0] != NULL); + + if (ins_len < 0) { + lowest_unlock = 2; + write_lock_level = 1; + } + + if (!cow) + write_lock_level = -1; + + min_write_lock_level = write_lock_level; +again: + level = btrfs_header_level(cached_eb); + if (level < write_lock_level || level < lowest_level) + return -ERESTART; + + if (test_bit(EXTENT_BUFFER_STALE, &cached_eb->bflags)) + return -ERESTART; + + if (cow && btrfs_header_generation(cached_eb) != trans->transid) + return -ERESTART; + + extent_buffer_get(cached_eb); + + root_lock = 0; + if (!p->skip_locking) { + if (level <= write_lock_level) { + btrfs_tree_lock(cached_eb); + root_lock = BTRFS_WRITE_LOCK; + } else { + btrfs_tree_read_lock(cached_eb); + root_lock = BTRFS_READ_LOCK; + } + } + + level = btrfs_header_level(cached_eb); + p->nodes[level] = cached_eb; + p->locks[level] = root_lock; + + if (read_seqcount_retry(&cached_eb->free_seq, free_seq)) { + btrfs_release_path(p); + return -ERESTART; + } else if (cow && + btrfs_header_generation(cached_eb) != trans->transid) { + btrfs_release_path(p); + return -ERESTART; + } else if (level < write_lock_level || level < lowest_level) { + btrfs_release_path(p); + return -ERESTART; + } else if (test_bit(EXTENT_BUFFER_STALE, &cached_eb->bflags)) { + btrfs_release_path(p); + return -ERESTART; + } + + if (level > 0) + btrfs_node_key_to_cpu(cached_eb, &tmp_key, 0); + else + btrfs_item_key_to_cpu(cached_eb, &tmp_key, 0); + + if 
(btrfs_comp_cpu_keys(key, &tmp_key) < 0) { + btrfs_release_path(p); + return -ERESTART; + } + + if (level > 0) + btrfs_node_key_to_cpu(cached_eb, &tmp_key, + btrfs_header_nritems(cached_eb) - 1); + else + btrfs_item_key_to_cpu(cached_eb, &tmp_key, + btrfs_header_nritems(cached_eb) - 1); + + if (btrfs_comp_cpu_keys(key, &tmp_key) > 0) { + btrfs_release_path(p); + return -ERESTART; + } + + ret = __search_slot(trans, root, cached_eb, key, p, ins_len, cow, + lowest_unlock, &write_lock_level, + min_write_lock_level, lowest_level, false); + if (ret == -EAGAIN) + goto again; + + if (ret >= 0) { + if (!check_search_result_valid(p, ins_len, level, !ret)) { + ret = -ERESTART; + btrfs_release_path(p); + } + } + + return ret; +} + +/* + * look for key in the tree. path is filled in with nodes along the way + * if key is found, we return zero and you can find the item in the leaf + * level of the path (level 0) + * + * If the key isn''t found, the path points to the slot where it should + * be inserted, and 1 is returned. If there are other errors during the + * search a negative error number is returned. + * + * if ins_len > 0, nodes and leaves will be split as we walk down the + * tree. if ins_len < 0, nodes will be merged as we walk down the tree (if + * possible) + */ +int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, int ins_len, + int cow) +{ + struct extent_buffer_cache *eb_cache = NULL; + struct extent_buffer *cached_eb; + unsigned seq; + int cache_level; + int ret; + + if (unlikely(p->search_commit_root || p->keep_locks || + (cow && p->lowest_level))) + goto common_search; + + if (p->eb_cache) { + eb_cache = p->eb_cache; + + btrfs_lock_extent_buffer_cache(eb_cache); + cached_eb = eb_cache->cached_eb; + if (!cached_eb) { + btrfs_unlock_extent_buffer_cache(eb_cache); + goto common_search; + } + + if (trans && trans->transid != eb_cache->transid) { + btrfs_unlock_extent_buffer_cache(eb_cache); + goto common_search; + } + + seq = eb_cache->free_seq; + if (read_seqcount_retry(&cached_eb->free_seq, seq)) { + btrfs_unlock_extent_buffer_cache(eb_cache); + goto common_search; + } + + seq = eb_cache->chg_seq; + smp_rmb(); + if (raw_seqcount_begin(&cached_eb->chg_seq) - seq > 5) { + btrfs_unlock_extent_buffer_cache(eb_cache); + goto common_search; + } + seq = eb_cache->free_seq; + extent_buffer_get(cached_eb); + btrfs_unlock_extent_buffer_cache(eb_cache); + + ret = btrfs_search_slot_node(trans, root, cached_eb, seq, + key, p, ins_len, cow); + free_extent_buffer(cached_eb); + if (ret >= 0) + return ret; + + if (ret != -ERESTART) + return ret; + } +common_search: + ret = btrfs_search_slot_root(trans, root, key, p, ins_len, cow); + if (ret >= 0 && eb_cache) { + if (p->max_level <= 1) + goto out; + + cache_level = 1; + if (ins_len >= 0 && + (read_seqcount_retry(&p->nodes[cache_level]->free_seq, + p->free_seq[cache_level]) || + read_seqcount_retry(&p->nodes[cache_level]->chg_seq, + p->chg_seq[cache_level]))) + cache_level = 0; + + btrfs_lock_extent_buffer_cache(eb_cache); + eb_cache->chg_seq + read_seqcount_begin(&p->nodes[cache_level]->chg_seq); + eb_cache->free_seq + read_seqcount_begin(&p->nodes[cache_level]->free_seq); + + if (eb_cache->cached_eb == p->nodes[cache_level]) { + btrfs_unlock_extent_buffer_cache(eb_cache); + goto out; + } + + cached_eb = eb_cache->cached_eb; + eb_cache->cached_eb = p->nodes[cache_level]; + if (trans) + eb_cache->transid = trans->transid; + btrfs_unlock_extent_buffer_cache(eb_cache); + 
free_extent_buffer(cached_eb); + extent_buffer_get(p->nodes[cache_level]); + } +out: + return ret; +} + /* * Like btrfs_search_slot, this looks for a key in the given tree. It uses the * current state of the tree together with the operations recorded in the tree @@ -3056,6 +3322,8 @@ static int push_node_left(struct btrfs_trans_handle *trans, } btrfs_set_header_nritems(src, src_nritems - push_items); btrfs_set_header_nritems(dst, dst_nritems + push_items); + write_seqcount_barrier(&src->chg_seq); + write_seqcount_barrier(&dst->chg_seq); btrfs_mark_buffer_dirty(src); btrfs_mark_buffer_dirty(dst); @@ -3118,6 +3386,8 @@ static int balance_node_right(struct btrfs_trans_handle *trans, btrfs_set_header_nritems(src, src_nritems - push_items); btrfs_set_header_nritems(dst, dst_nritems + push_items); + write_seqcount_barrier(&src->chg_seq); + write_seqcount_barrier(&dst->chg_seq); btrfs_mark_buffer_dirty(src); btrfs_mark_buffer_dirty(dst); @@ -3313,6 +3583,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans, btrfs_set_header_nritems(c, mid); ret = 0; + write_seqcount_barrier(&c->chg_seq); + write_seqcount_barrier(&split->chg_seq); btrfs_mark_buffer_dirty(c); btrfs_mark_buffer_dirty(split); @@ -3484,11 +3756,13 @@ static noinline int __push_leaf_right(struct btrfs_trans_handle *trans, left_nritems -= push_items; btrfs_set_header_nritems(left, left_nritems); + write_seqcount_barrier(&left->chg_seq); if (left_nritems) btrfs_mark_buffer_dirty(left); else clean_tree_block(trans, root, left); + write_seqcount_barrier(&right->chg_seq); btrfs_mark_buffer_dirty(right); btrfs_item_key(right, &disk_key, 0); @@ -3709,6 +3983,8 @@ static noinline int __push_leaf_left(struct btrfs_trans_handle *trans, btrfs_set_token_item_offset(right, item, push_space, &token); } + write_seqcount_barrier(&left->chg_seq); + write_seqcount_barrier(&right->chg_seq); btrfs_mark_buffer_dirty(left); if (right_nritems) btrfs_mark_buffer_dirty(right); @@ -3856,6 +4132,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans, insert_ptr(trans, root, path, &disk_key, right->start, path->slots[1] + 1, 1); + write_seqcount_barrier(&right->chg_seq); + write_seqcount_barrier(&l->chg_seq); btrfs_mark_buffer_dirty(right); btrfs_mark_buffer_dirty(l); BUG_ON(path->slots[0] != slot); @@ -4401,6 +4679,7 @@ void btrfs_truncate_item(struct btrfs_trans_handle *trans, item = btrfs_item_nr(leaf, slot); btrfs_set_item_size(leaf, item, new_size); + write_seqcount_barrier(&leaf->chg_seq); btrfs_mark_buffer_dirty(leaf); if (btrfs_leaf_free_space(root, leaf) < 0) { @@ -4838,6 +5117,8 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans, WARN_ON(btrfs_header_generation(leaf) != trans->transid); del_ptr(trans, root, path, 1, path->slots[1], 1); + write_seqcount_barrier(&leaf->free_seq); + /* * btrfs_free_extent is expensive, we want to make sure we * aren''t holding any locks when we call it diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c38734a..ddda52e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -548,9 +548,14 @@ struct btrfs_path { int slots[BTRFS_MAX_LEVEL]; /* if there is real range locking, this locks field will change */ int locks[BTRFS_MAX_LEVEL]; + unsigned free_seq[BTRFS_MAX_LEVEL]; + unsigned chg_seq[BTRFS_MAX_LEVEL]; int reada; /* keep some upper locks as we walk down */ int lowest_level; + int max_level; + + struct extent_buffer_cache *eb_cache; /* * set by btrfs_split_item, tells search_slot to keep all locks @@ -2730,6 +2735,27 @@ static inline u32 btrfs_level_size(struct 
btrfs_root *root, int level) return root->nodesize; } +#include "btrfs_inode.h" + +static inline void btrfs_path_set_eb_cache(struct btrfs_root *root, + struct inode *inode, + struct btrfs_path *path) +{ + WARN_ON(path->eb_cache); + + if (!path->search_commit_root) { + if (root->objectid == BTRFS_TREE_LOG_OBJECTID) + path->eb_cache = &BTRFS_I(inode)->log_eb_cache; + else + path->eb_cache = &BTRFS_I(inode)->fs_eb_cache; + } +} + +static inline void btrfs_path_clear_eb_cache(struct btrfs_path *path) +{ + path->eb_cache = NULL; +} + /* helper function to cast into the data area of the leaf. */ #define btrfs_item_ptr(leaf, slot, type) \ ((type *)(btrfs_leaf_data(leaf) + \ @@ -3167,11 +3193,11 @@ int btrfs_find_orphan_item(struct btrfs_root *root, u64 offset); int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, const char *name, int name_len, - u64 inode_objectid, u64 ref_objectid, u64 index); + struct inode *inode, u64 ref_objectid, u64 index); int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, const char *name, int name_len, - u64 inode_objectid, u64 ref_objectid, u64 *index); + struct inode *inode, u64 ref_objectid, u64 *index); struct btrfs_inode_ref * btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, @@ -3194,7 +3220,7 @@ int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode, struct bio *bio, u64 logical_offset); int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, - u64 objectid, u64 pos, + struct inode *inode, u64 pos, u64 disk_offset, u64 disk_num_bytes, u64 num_bytes, u64 offset, u64 ram_bytes, u8 compression, u8 encryption, u16 other_encoding); diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c index c1a074d..de8caa3 100644 --- a/fs/btrfs/dir-item.c +++ b/fs/btrfs/dir-item.c @@ -20,6 +20,7 @@ #include "disk-io.h" #include "hash.h" #include "transaction.h" +#include "delayed-inode.h" /* * insert a name into a directory, doing overflow properly if there is a hash @@ -144,6 +145,7 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root btrfs_cpu_key_to_disk(&disk_key, location); data_size = sizeof(*dir_item) + name_len; + btrfs_path_set_eb_cache(root, dir, path); dir_item = insert_with_overflow(trans, root, path, &key, data_size, name, name_len); if (IS_ERR(dir_item)) { diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 29c69e6..a8a8943 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -35,7 +35,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include "btrfs_inode.h" #include "volumes.h" #include "print-tree.h" #include "async-thread.h" @@ -45,6 +44,7 @@ #include "inode-map.h" #include "check-integrity.h" #include "rcu-string.h" +#include "delayed-inode.h" static struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c index 614f34a..f4f39fa 100644 --- a/fs/btrfs/export.c +++ b/fs/btrfs/export.c @@ -2,7 +2,6 @@ #include <linux/types.h> #include "ctree.h" #include "disk-io.h" -#include "btrfs_inode.h" #include "print-tree.h" #include "export.h" #include "compat.h" diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ba58024..af0ea25 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -33,6 +33,7 @@ #include "volumes.h" #include "locking.h" #include "free-space-cache.h" +#include "delayed-inode.h" #undef SCRAMBLE_DELAYED_REFS diff --git 
a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 49085f2..f8ecd72 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -16,7 +16,6 @@ #include "extent_map.h" #include "compat.h" #include "ctree.h" -#include "btrfs_inode.h" #include "volumes.h" #include "check-integrity.h" #include "locking.h" @@ -3774,6 +3773,12 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, return -ENOMEM; path->leave_spinning = 1; + /* + * we are sure we will search the fs/file tree, so needn''t call + * btrfs_path_set_eb_cache + */ + path->eb_cache = &BTRFS_I(inode)->fs_eb_cache; + start = ALIGN(start, BTRFS_I(inode)->root->sectorsize); len = ALIGN(len, BTRFS_I(inode)->root->sectorsize); @@ -3965,6 +3970,8 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, eb->lock_nested = 0; init_waitqueue_head(&eb->write_lock_wq); init_waitqueue_head(&eb->read_lock_wq); + seqcount_init(&eb->chg_seq); + seqcount_init(&eb->free_seq); #if LEAK_DEBUG spin_lock_irqsave(&leak_lock, flags); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 25900af..536b729 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -138,6 +138,11 @@ struct extent_buffer { struct rcu_head rcu_head; pid_t lock_owner; + /* trace the time of changing */ + seqcount_t chg_seq; + /* trace the time that this extent buffer is erased from the tree */ + seqcount_t free_seq; + /* count of read lock holders on the extent buffer */ atomic_t write_locks; atomic_t read_locks; @@ -164,6 +169,39 @@ struct extent_buffer { struct page **pages; }; +struct extent_buffer_cache { + struct extent_buffer *cached_eb; + u64 transid; + unsigned chg_seq; + unsigned free_seq; + spinlock_t lock; + bool need_lock; +}; + +static inline void btrfs_lock_extent_buffer_cache( + struct extent_buffer_cache *cache) +{ + if (cache->need_lock) + spin_lock(&cache->lock); +} + +static inline void btrfs_unlock_extent_buffer_cache( + struct extent_buffer_cache *cache) +{ + if (cache->need_lock) + spin_unlock(&cache->lock); +} + +static inline void extent_buffer_cache_init(struct extent_buffer_cache *cache) +{ + cache->cached_eb = NULL; + cache->chg_seq = 0; + cache->free_seq = 0; + cache->need_lock = 0; + cache->transid = 0; + spin_lock_init(&cache->lock); +} + static inline void extent_set_compress_type(unsigned long *bio_flags, int compress_type) { diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 857d93c..699c0f3 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -38,7 +38,7 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, - u64 objectid, u64 pos, + struct inode *inode, u64 pos, u64 disk_offset, u64 disk_num_bytes, u64 num_bytes, u64 offset, u64 ram_bytes, u8 compression, u8 encryption, u16 other_encoding) @@ -48,6 +48,7 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_key file_key; struct btrfs_path *path; struct extent_buffer *leaf; + u64 objectid = btrfs_ino(inode); path = btrfs_alloc_path(); if (!path) @@ -57,6 +58,7 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, btrfs_set_key_type(&file_key, BTRFS_EXTENT_DATA_KEY); path->leave_spinning = 1; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_insert_empty_item(trans, root, path, &file_key, sizeof(*item)); if (ret < 0) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 9aa01ec..c14b24a 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -33,7 +33,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include 
"btrfs_inode.h" #include "ioctl.h" #include "print-tree.h" #include "tree-log.h" @@ -610,8 +609,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, struct inode *inode, while (1) { recow = 0; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_lookup_file_extent(trans, root, path, ino, search_start, modify_tree); + btrfs_path_clear_eb_cache(path); if (ret < 0) break; if (ret > 0 && path->slots[0] > 0 && search_start == start) { @@ -626,7 +627,9 @@ next_slot: leaf = path->nodes[0]; if (path->slots[0] >= btrfs_header_nritems(leaf)) { BUG_ON(del_nr > 0); + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_next_leaf(root, path); + btrfs_path_clear_eb_cache(path); if (ret < 0) break; if (ret > 0) { diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c index a13cf1a..3b02f80 100644 --- a/fs/btrfs/inode-item.c +++ b/fs/btrfs/inode-item.c @@ -80,7 +80,7 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans, int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, const char *name, int name_len, - u64 inode_objectid, u64 ref_objectid, u64 *index) + struct inode *inode, u64 ref_objectid, u64 *index) { struct btrfs_path *path; struct btrfs_key key; @@ -93,7 +93,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, int ret; int del_len = name_len + sizeof(*ref); - key.objectid = inode_objectid; + key.objectid = btrfs_ino(inode); key.offset = ref_objectid; btrfs_set_key_type(&key, BTRFS_INODE_REF_KEY); @@ -103,6 +103,7 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans, path->leave_spinning = 1; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_search_slot(trans, root, &key, path, -1, 1); if (ret > 0) { ret = -ENOENT; @@ -140,7 +141,7 @@ out: int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, const char *name, int name_len, - u64 inode_objectid, u64 ref_objectid, u64 index) + struct inode *inode, u64 ref_objectid, u64 index) { struct btrfs_path *path; struct btrfs_key key; @@ -149,7 +150,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, int ret; int ins_len = name_len + sizeof(*ref); - key.objectid = inode_objectid; + key.objectid = btrfs_ino(inode); key.offset = ref_objectid; btrfs_set_key_type(&key, BTRFS_INODE_REF_KEY); @@ -158,6 +159,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans, return -ENOMEM; path->leave_spinning = 1; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_insert_empty_item(trans, root, path, &key, ins_len); if (ret == -EEXIST) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6ba80b9..0e27d67 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -43,7 +43,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include "btrfs_inode.h" #include "ioctl.h" #include "print-tree.h" #include "ordered-data.h" @@ -54,6 +53,7 @@ #include "locking.h" #include "free-space-cache.h" #include "inode-map.h" +#include "delayed-inode.h" struct btrfs_iget_args { u64 ino; @@ -148,6 +148,7 @@ static noinline int insert_inline_extent(struct btrfs_trans_handle *trans, datasize = btrfs_file_extent_calc_inline_size(cur_size); inode_add_bytes(inode, size); + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_insert_empty_item(trans, root, path, &key, datasize); if (ret) { @@ -1177,8 +1178,10 @@ static noinline int run_delalloc_nocow(struct inode *inode, cow_start = (u64)-1; cur_offset = start; while (1) { + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_lookup_file_extent(trans, root, path, ino, cur_offset, 0); 
+ btrfs_path_clear_eb_cache(path); if (ret < 0) { btrfs_abort_transaction(trans, root, ret); goto error; @@ -1195,7 +1198,9 @@ static noinline int run_delalloc_nocow(struct inode *inode, next_slot: leaf = path->nodes[0]; if (path->slots[0] >= btrfs_header_nritems(leaf)) { + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_next_leaf(root, path); + btrfs_path_clear_eb_cache(path); if (ret < 0) { btrfs_abort_transaction(trans, root, ret); goto error; @@ -1810,6 +1815,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, ins.objectid = btrfs_ino(inode); ins.offset = file_pos; ins.type = BTRFS_EXTENT_DATA_KEY; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_insert_empty_item(trans, root, path, &ins, sizeof(*fi)); if (ret) goto out; @@ -2783,6 +2789,7 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans, } path->leave_spinning = 1; + btrfs_path_set_eb_cache(root, dir, path); di = btrfs_lookup_dir_item(trans, root, path, dir_ino, name, name_len, -1); if (IS_ERR(di)) { @@ -2800,7 +2807,7 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans, goto err; btrfs_release_path(path); - ret = btrfs_del_inode_ref(trans, root, name, name_len, ino, + ret = btrfs_del_inode_ref(trans, root, name, name_len, inode, dir_ino, &index); if (ret) { printk(KERN_INFO "btrfs failed to delete reference to %.*s, " @@ -3119,8 +3126,10 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, if (!path) return -ENOMEM; + btrfs_path_set_eb_cache(root, dir, path); di = btrfs_lookup_dir_item(trans, root, path, dir_ino, name, name_len, -1); + btrfs_path_clear_eb_cache(path); if (IS_ERR_OR_NULL(di)) { if (!di) ret = -ENOENT; @@ -3265,6 +3274,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, if (!path) return -ENOMEM; path->reada = -1; + path->leave_spinning = 1; if (root->ref_cows || root == root->fs_info->tree_root) btrfs_drop_extent_cache(inode, new_size & (~mask), (u64)-1, 0); @@ -3282,8 +3292,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, key.offset = (u64)-1; key.type = (u8)-1; + btrfs_path_set_eb_cache(root, inode, path); search_again: - path->leave_spinning = 1; ret = btrfs_search_slot(trans, root, &key, path, -1, 1); if (ret < 0) { err = ret; @@ -3631,7 +3641,7 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size) } err = btrfs_insert_file_extent(trans, root, - btrfs_ino(inode), cur_offset, 0, + inode, cur_offset, 0, 0, hole_size, 0, hole_size, 0, 0, 0); if (err) { @@ -3865,6 +3875,7 @@ static int btrfs_inode_by_name(struct inode *dir, struct dentry *dentry, if (!path) return -ENOMEM; + btrfs_path_set_eb_cache(root, dir, path); di = btrfs_lookup_dir_item(NULL, root, path, btrfs_ino(dir), name, namelen, 0); if (IS_ERR(di)) @@ -4329,6 +4340,7 @@ static int btrfs_real_readdir(struct file *filp, void *dirent, key.offset = filp->f_pos; key.objectid = btrfs_ino(inode); + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); if (ret < 0) goto err; @@ -4687,6 +4699,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, sizes[1] = name_len + sizeof(*ref); path->leave_spinning = 1; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_insert_empty_items(trans, root, path, key, sizes, 2); if (ret != 0) goto fail; @@ -4776,7 +4789,7 @@ int btrfs_add_link(struct btrfs_trans_handle *trans, key.objectid, root->root_key.objectid, parent_ino, index, name, name_len); } else if (add_backref) { - ret = btrfs_insert_inode_ref(trans, root, 
name, name_len, ino, + ret = btrfs_insert_inode_ref(trans, root, name, name_len, inode, parent_ino, index); } @@ -4816,7 +4829,7 @@ fail_dir_item: int err; err = btrfs_del_inode_ref(trans, root, name, name_len, - ino, parent_ino, &local_index); + inode, parent_ino, &local_index); } return ret; } @@ -5228,6 +5241,7 @@ again: path->reada = 1; } + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_lookup_file_extent(trans, root, path, objectid, start, trans != NULL); if (ret < 0) { @@ -5698,6 +5712,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans, if (!path) return -ENOMEM; + btrfs_path_set_eb_cache(root, inode, path); ret = btrfs_lookup_file_extent(trans, root, path, btrfs_ino(inode), offset, 0); if (ret < 0) @@ -6983,6 +6998,11 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); + extent_buffer_cache_init(&ei->fs_eb_cache); + extent_buffer_cache_init(&ei->log_eb_cache); + ei->fs_eb_cache.need_lock = true; + ei->log_eb_cache.need_lock = true; + return inode; } @@ -7048,6 +7068,8 @@ void btrfs_destroy_inode(struct inode *inode) btrfs_drop_extent_cache(inode, 0, (u64)-1, 0); free: btrfs_remove_delayed_node(inode); + free_extent_buffer(BTRFS_I(inode)->fs_eb_cache.cached_eb); + free_extent_buffer(BTRFS_I(inode)->log_eb_cache.cached_eb); call_rcu(&inode->i_rcu, btrfs_i_callback); } @@ -7226,7 +7248,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, ret = btrfs_insert_inode_ref(trans, dest, new_dentry->d_name.name, new_dentry->d_name.len, - old_ino, + old_inode, btrfs_ino(new_dir), index); if (ret) goto out_fail; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index a1fbca0..e5b1419 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -46,7 +46,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include "btrfs_inode.h" #include "ioctl.h" #include "print-tree.h" #include "volumes.h" diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 643335a..f373889 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -22,7 +22,6 @@ #include <linux/pagevec.h> #include "ctree.h" #include "transaction.h" -#include "btrfs_inode.h" #include "extent_io.h" static u64 entry_end(struct btrfs_ordered_extent *entry) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index c5dbd91..78ff29f 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -27,7 +27,6 @@ #include "transaction.h" #include "volumes.h" #include "locking.h" -#include "btrfs_inode.h" #include "async-thread.h" #include "free-space-cache.h" #include "inode-map.h" diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index bf232c8..74a5081 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -30,7 +30,6 @@ #include "backref.h" #include "locking.h" #include "disk-io.h" -#include "btrfs_inode.h" #include "transaction.h" static int g_verbose = 0; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 073c236..bc0584c 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -46,7 +46,6 @@ #include "ctree.h" #include "disk-io.h" #include "transaction.h" -#include "btrfs_inode.h" #include "ioctl.h" #include "print-tree.h" #include "xattr.h" diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3ee8d58..109c26c 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -30,6 +30,7 @@ #include "tree-log.h" #include "inode-map.h" #include "volumes.h" +#include "delayed-inode.h" #define BTRFS_ROOT_TRANS_TAG 0 diff --git a/fs/btrfs/transaction.h 
b/fs/btrfs/transaction.h index e8b8416..65e2282 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -18,7 +18,6 @@ #ifndef __BTRFS_TRANSACTION__ #define __BTRFS_TRANSACTION__ -#include "btrfs_inode.h" #include "delayed-ref.h" #include "ctree.h" diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index c86670f..d0e608a 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -25,6 +25,7 @@ #include "print-tree.h" #include "compat.h" #include "tree-log.h" +#include "delayed-inode.h" /* magic values for the inode_only field in btrfs_log_inode: * @@ -2294,6 +2295,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans, goto out_unlock; } + btrfs_path_set_eb_cache(log, dir, path); di = btrfs_lookup_dir_item(trans, log, path, dir_ino, name, name_len, -1); if (IS_ERR(di)) { @@ -2385,7 +2387,7 @@ int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans, log = root->log_root; mutex_lock(&BTRFS_I(inode)->log_mutex); - ret = btrfs_del_inode_ref(trans, log, name, name_len, btrfs_ino(inode), + ret = btrfs_del_inode_ref(trans, log, name, name_len, inode, dirid, &index); mutex_unlock(&BTRFS_I(inode)->log_mutex); if (ret == -ENOSPC) { @@ -2868,6 +2870,8 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, return ret; } + btrfs_path_set_eb_cache(root, inode, path); + btrfs_path_set_eb_cache(log, inode, dst_path); mutex_lock(&BTRFS_I(inode)->log_mutex); /* diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c index 3f4e2d6..3a6547c 100644 --- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -23,7 +23,6 @@ #include <linux/xattr.h> #include <linux/security.h> #include "ctree.h" -#include "btrfs_inode.h" #include "transaction.h" #include "xattr.h" #include "disk-io.h" @@ -102,6 +101,7 @@ static int do_setxattr(struct btrfs_trans_handle *trans, if (!path) return -ENOMEM; + btrfs_path_set_eb_cache(root, inode, path); if (flags & XATTR_REPLACE) { di = btrfs_lookup_xattr(trans, root, path, btrfs_ino(inode), name, name_len, -1); -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
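The staleness checks the per-inode cache relies on can be summarized as
below. This is a simplified sketch under the field names the patch adds
(cached_eb, transid, chg_seq, free_seq); cache_still_valid() is a
hypothetical helper, and the real checks sit at the top of the new
btrfs_search_slot() and are repeated after the cached buffer is locked.

	static bool cache_still_valid(struct extent_buffer_cache *cache,
				      struct btrfs_trans_handle *trans)
	{
		struct extent_buffer *eb = cache->cached_eb;

		if (!eb)
			return false;
		/* a new transaction may have COWed or freed the cached node */
		if (trans && trans->transid != cache->transid)
			return false;
		/* free_seq advanced: the node was removed from the tree */
		if (read_seqcount_retry(&eb->free_seq, cache->free_seq))
			return false;
		/*
		 * chg_seq advanced too often: the node took part in several
		 * balance operations, so the cached branch is a poor hint
		 * (the patch uses 5 as the threshold)
		 */
		if (raw_seqcount_begin(&eb->chg_seq) - cache->chg_seq > 5)
			return false;
		return true;
	}

If any of these checks fail, the search simply falls back to the common
search from the root, so a stale cache only costs the fast-path attempt.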
Miao Xie
2012-Sep-26 11:38 UTC
[RFC][PATCH V2 3/4] Btrfs: introduce extent buffer cache for delayed inode
This patch introduces extent buffer for the delayed inode, it can reduce the search time and the contentions of the extent buffer''s lock when doing delayed inode operations. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/delayed-inode.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 07d5eeb..5e116b6 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -21,6 +21,7 @@ #include "delayed-inode.h" #include "disk-io.h" #include "transaction.h" +#include "extent_io.h" #define BTRFS_DELAYED_WRITEBACK 400 #define BTRFS_DELAYED_BACKGROUND 100 @@ -1122,6 +1123,7 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, struct btrfs_delayed_node *curr_node, *prev_node; struct btrfs_path *path; struct btrfs_block_rsv *block_rsv; + struct extent_buffer_cache cache; int ret = 0; bool count = (nr > 0); @@ -1133,6 +1135,9 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, return -ENOMEM; path->leave_spinning = 1; + extent_buffer_cache_init(&cache); + path->eb_cache = &cache; + block_rsv = trans->block_rsv; trans->block_rsv = &root->fs_info->delayed_block_rsv; @@ -1149,6 +1154,11 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, if (!ret) ret = btrfs_update_delayed_inode(trans, curr_root, path, curr_node); + + if (cache.cached_eb) + free_extent_buffer(cache.cached_eb); + extent_buffer_cache_init(&cache); + if (ret) { btrfs_release_delayed_node(curr_node); curr_node = NULL; @@ -1186,6 +1196,7 @@ static int __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans, { struct btrfs_path *path; struct btrfs_block_rsv *block_rsv; + struct extent_buffer_cache cache; int ret; path = btrfs_alloc_path(); @@ -1193,6 +1204,9 @@ static int __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans, return -ENOMEM; path->leave_spinning = 1; + extent_buffer_cache_init(&cache); + path->eb_cache = &cache; + block_rsv = trans->block_rsv; trans->block_rsv = &node->root->fs_info->delayed_block_rsv; @@ -1203,6 +1217,9 @@ static int __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans, ret = btrfs_update_delayed_inode(trans, node->root, path, node); btrfs_free_path(path); + if (cache.cached_eb) + free_extent_buffer(cache.cached_eb); + trans->block_rsv = block_rsv; return ret; } @@ -1255,6 +1272,7 @@ static void btrfs_async_run_delayed_node_done(struct btrfs_work *work) struct btrfs_delayed_node *delayed_node = NULL; struct btrfs_root *root; struct btrfs_block_rsv *block_rsv; + struct extent_buffer_cache cache; unsigned long nr = 0; int need_requeue = 0; int ret; @@ -1266,6 +1284,9 @@ static void btrfs_async_run_delayed_node_done(struct btrfs_work *work) goto out; path->leave_spinning = 1; + extent_buffer_cache_init(&cache); + path->eb_cache = &cache; + delayed_node = async_node->delayed_node; root = delayed_node->root; @@ -1284,6 +1305,8 @@ static void btrfs_async_run_delayed_node_done(struct btrfs_work *work) if (!ret) btrfs_update_delayed_inode(trans, root, path, delayed_node); + if (cache.cached_eb) + free_extent_buffer(cache.cached_eb); /* * Maybe new delayed items have been inserted, so we need requeue * the work. Besides that, we must dequeue the empty delayed nodes -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
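The pattern used here is worth calling out: the cache is a short-lived,
on-stack object attached to the path only while one batch of delayed items
is processed, so need_lock stays false and the cache spinlock is never
taken. A condensed sketch of the shape of the code in this patch:

	struct extent_buffer_cache cache;

	extent_buffer_cache_init(&cache);
	path->eb_cache = &cache;

	/* ... insert/delete the delayed items, update the delayed inode ... */

	/* drop the extra reference the cache may still hold on a leaf */
	if (cache.cached_eb)
		free_extent_buffer(cache.cached_eb);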
Miao Xie
2012-Sep-26 11:39 UTC
[RFC][PATCH V2 4/4] Btrfs: introduce extent buffer cache for delayed reference
This patch introduces extent buffer cache for the delayed reference. It can reduce the search time and the contentions of the extent buffer''s lock when dealing with the delayed references. Implementation: - add a extent buffer cache into the transaction handle - when dealing with the delayed references, we get the cache from the transaction handle and pass the cache into btrfs_search_slot(). - release the cached extent buffer when the transaction handle ends or we commit the transaction. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 5 +++++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ 3 files changed, 18 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index af0ea25..815876b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1918,6 +1918,7 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, path->reada = 1; path->leave_spinning = 1; + path->eb_cache = &trans->eb_cache; /* this will setup the path even if it fails to insert the back ref */ ret = insert_inline_extent_backref(trans, root->fs_info->extent_root, path, bytenr, num_bytes, parent, @@ -2049,6 +2050,7 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, path->reada = 1; path->leave_spinning = 1; + path->eb_cache = &trans->eb_cache; ret = btrfs_search_slot(trans, root->fs_info->extent_root, &key, path, 0, 1); if (ret < 0) { @@ -5067,6 +5069,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID; BUG_ON(!is_data && refs_to_drop != 1); + path->eb_cache = &trans->eb_cache; ret = lookup_extent_backref(trans, extent_root, path, &iref, bytenr, num_bytes, parent, root_objectid, owner_objectid, @@ -6062,6 +6065,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, return -ENOMEM; path->leave_spinning = 1; + path->eb_cache = &trans->eb_cache; ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path, ins, size); if (ret) { @@ -6126,6 +6130,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans, return -ENOMEM; path->leave_spinning = 1; + path->eb_cache = &trans->eb_cache; ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path, ins, size); if (ret) { diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 109c26c..845898d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -363,6 +363,7 @@ again: h->block_rsv = NULL; h->orig_rsv = NULL; h->aborted = 0; + extent_buffer_cache_init(&h->eb_cache); h->qgroup_reserved = qgroup_reserved; h->delayed_ref_elem.seq = 0; INIT_LIST_HEAD(&h->qgroup_ref_list); @@ -590,6 +591,9 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, } assert_qgroups_uptodate(trans); + if (trans->eb_cache.cached_eb) + free_extent_buffer(trans->eb_cache.cached_eb); + memset(trans, 0, sizeof(*trans)); kmem_cache_free(btrfs_trans_handle_cachep, trans); return err; @@ -1299,6 +1303,9 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, if (current->journal_info == trans) current->journal_info = NULL; + if (trans->eb_cache.cached_eb) + free_extent_buffer(trans->eb_cache.cached_eb); + kmem_cache_free(btrfs_trans_handle_cachep, trans); } @@ -1587,6 +1594,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, if (current->journal_info == trans) current->journal_info = NULL; + if (trans->eb_cache.cached_eb) + free_extent_buffer(trans->eb_cache.cached_eb); + kmem_cache_free(btrfs_trans_handle_cachep, trans); 
if (current != root->fs_info->transaction_kthread) diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 65e2282..fe1bf95 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -46,6 +46,8 @@ struct btrfs_transaction { int aborted; }; +struct extent_buffer_cache; + struct btrfs_trans_handle { u64 transid; u64 bytes_reserved; @@ -57,6 +59,7 @@ struct btrfs_trans_handle { struct btrfs_transaction *transaction; struct btrfs_block_rsv *block_rsv; struct btrfs_block_rsv *orig_rsv; + struct extent_buffer_cache eb_cache; int aborted; int adding_csums; /* -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
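For the delayed references the cache is scoped to the transaction handle
instead of an inode, so all the extent-tree searches done while running one
handle's delayed refs share the same cached leaf. A condensed sketch of the
wiring, following this patch (btrfs_start_transaction() and
btrfs_end_transaction() are the existing APIs):

	trans = btrfs_start_transaction(root, 0);
	/* eb_cache is initialized together with the rest of the handle */

	/* ... inside __btrfs_inc_extent_ref(), run_delayed_extent_op(), ... */
	path->eb_cache = &trans->eb_cache;
	ret = btrfs_search_slot(trans, root->fs_info->extent_root,
				&key, path, 0, 1);

	/* the cached extent buffer reference is dropped when the handle ends */
	btrfs_end_transaction(trans, root);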
Tim Chen
2012-Oct-01 16:53 UTC
Re: [RFC][PATCH V2 0/4] Btrfs: introduce extent buffer cache to btrfs
Miao Xie <miaox <at> cn.fujitsu.com> writes:

> This patchset introduces an extent buffer cache to btrfs. The basic idea
> is to reduce the search time and the contention on the extent buffer
> locks by re-using the last search result.
>
> I ran stress.sh, xfstests and some other tools to test it, and all of
> them worked well.
>
> Since this is a performance-improvement patchset, we also ran performance
> tests. Because the patchset speeds up the b+ tree search - in other words,
> the metadata operations - we used a file creation test to measure it:
> 10 tasks each created 100000 files in their own directories at the same
> time. With this patchset, btrfs was ~20% faster (98s -> 77s).
>
> You can pull this patchset from:
>
>   git://github.com/miaoxie/linux-btrfs.git extent-buffer-cache
>
> Thanks
> Miao
> ---

I tested the patchset with aim7's fileserver test, using 16 processes on a
RAM-emulated SCSI btrfs file system, and got about an 18% speedup in
throughput. The workload is a mix of file copy, read, write and file sync.

With the patchset, the contention on the btrfs_tree_lock operation is
reduced by about 8.4% of cpu cycles (from 48.3% to 39.9%).

Miao, I can send you the detailed profile if you are interested.

Thanks.
Tim Chen
Miao Xie
2012-Oct-09 07:32 UTC
Re: [RFC][PATCH V2 0/4] Btrfs: introduce extent buffer cache to btrfs
Any comments?

On Wed, 26 Sep 2012 19:22:11 +0800, Miao Xie wrote:
> This patchset introduces an extent buffer cache to btrfs. The basic idea
> is to reduce the search time and the contention on the extent buffer
> locks by re-using the last search result.
>
> I ran stress.sh, xfstests and some other tools to test it, and all of
> them worked well.
>
> Since this is a performance-improvement patchset, we also ran performance
> tests. Because the patchset speeds up the b+ tree search - in other words,
> the metadata operations - we used a file creation test to measure it:
> 10 tasks each created 100000 files in their own directories at the same
> time. With this patchset, btrfs was ~20% faster (98s -> 77s).
>
> You can pull this patchset from:
>
>   git://github.com/miaoxie/linux-btrfs.git extent-buffer-cache
>
> Thanks
> Miao
> ---
> Miao Xie (4):
>   Btrfs: restructure btrfs_search_slot()
>   Btrfs: introduce extent buffer cache for each i-node
>   Btrfs: introduce extent buffer cache for delayed inode
>   Btrfs: introduce extent buffer cache for delayed reference
>
>  fs/btrfs/acl.c           |   1 -
>  fs/btrfs/btrfs_inode.h   |   6 +-
>  fs/btrfs/compression.c   |   1 -
>  fs/btrfs/ctree.c         | 598 ++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/ctree.h         |  32 +++-
>  fs/btrfs/delayed-inode.c |  23 ++
>  fs/btrfs/dir-item.c      |   2 +
>  fs/btrfs/disk-io.c       |   2 +-
>  fs/btrfs/export.c        |   1 -
>  fs/btrfs/extent-tree.c   |   6 +
>  fs/btrfs/extent_io.c     |   9 +-
>  fs/btrfs/extent_io.h     |  38 +++
>  fs/btrfs/file-item.c     |   4 +-
>  fs/btrfs/file.c          |   5 +-
>  fs/btrfs/inode-item.c    |  10 +-
>  fs/btrfs/inode.c         |  36 +++-
>  fs/btrfs/ioctl.c         |   1 -
>  fs/btrfs/ordered-data.c  |   1 -
>  fs/btrfs/relocation.c    |   1 -
>  fs/btrfs/send.c          |   1 -
>  fs/btrfs/super.c         |   1 -
>  fs/btrfs/transaction.c   |  11 +
>  fs/btrfs/transaction.h   |   4 +-
>  fs/btrfs/tree-log.c      |   6 +-
>  fs/btrfs/xattr.c         |   2 +-
>  25 files changed, 651 insertions(+), 151 deletions(-)