Hi Chris, I expect this to be my last fixup set for the 3.5 series. All seven fixes are pretty small, five of them for the tree mod log (3-7), one generic for backref walking (2) and one fixing a starvation issue when waiting for more delayed refs (1). You can pull my patches based on your current for-linus from git://git.jan-o-sch.net/btrfs-unstable for-chris I''ve run through xfstests as usual. All my backref walking tests are succeeding now. Even the qgroups patch set looks pretty good on top of these. If things go well, I''ll publish that patch series as soon as the current rc is out, so there''ll be plenty of time before 3.6. Thanks, -Jan Jan Schmidt (7): Btrfs: avoid waiting for delayed refs when we must not Btrfs: support root level changes in __resolve_indirect_ref Btrfs: fix tree mod log for root replacements at leaf level Btrfs: always put insert_ptr modifications into the tree mod log Btrfs: leave critical region in btrfs_find_all_roots as soon as possible Btrfs: fix tree mod log rewind of ADD operations Btrfs: resolve tree mod log locking issue in btrfs_next_leaf fs/btrfs/backref.c | 15 +++++++---- fs/btrfs/ctree.c | 60 ++++++++++++++++++++++++++++-------------------- fs/btrfs/extent-tree.c | 11 ++++---- 3 files changed, 50 insertions(+), 36 deletions(-) -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 1/7] Btrfs: avoid waiting for delayed refs when we must not
We track two conditions to decide if we should sleep while waiting for more delayed refs, the number of delayed refs (num_refs) and the first entry in the list of blockers (first_seq). When we suspect staleness, we save num_refs and do one more cycle. If nothing changes, we then save first_seq for later comparison and do wait_event. We ought to save first_seq the very same moment we''re saving num_refs. Otherwise we cannot be sure that nothing has changed and we might start waiting when we shouldn''t, which could lead to starvation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/extent-tree.c | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4b5a1e1..6e1d367 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2347,12 +2347,10 @@ next: return count; } - static void wait_for_more_refs(struct btrfs_delayed_ref_root *delayed_refs, - unsigned long num_refs) + unsigned long num_refs, + struct list_head *first_seq) { - struct list_head *first_seq = delayed_refs->seq_head.next; - spin_unlock(&delayed_refs->lock); pr_debug("waiting for more refs (num %ld, first %p)\n", num_refs, first_seq); @@ -2381,6 +2379,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_delayed_ref_node *ref; struct list_head cluster; + struct list_head *first_seq = NULL; int ret; u64 delayed_start; int run_all = count == (unsigned long)-1; @@ -2436,8 +2435,10 @@ again: */ consider_waiting = 1; num_refs = delayed_refs->num_entries; + first_seq = root->fs_info->tree_mod_seq_list.next; } else { - wait_for_more_refs(delayed_refs, num_refs); + wait_for_more_refs(delayed_refs, + num_refs, first_seq); /* * after waiting, things have changed. we * dropped the lock and someone else might have -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 2/7] Btrfs: support root level changes in __resolve_indirect_ref
With the tree mod log, we can have a tree that''s two levels high, but btrfs_search_old_slot may still return a path with the tree root at level one instead. __resolve_indirect_ref must care for this and accept parents in a lower level than expected. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/backref.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 7301cdb..cf0df90 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -301,10 +301,14 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, goto out; eb = path->nodes[level]; - if (!eb) { - WARN_ON(1); - ret = 1; - goto out; + while (!eb) { + if (!level) { + WARN_ON(1); + ret = 1; + goto out; + } + level--; + eb = path->nodes[level]; } ret = add_all_parents(root, path, parents, level, &ref->key_for_search, -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 3/7] Btrfs: fix tree mod log for root replacements at leaf level
For the tree mod log, we don''t log any operations at leaf level. If the root is at the leaf level (i.e. the tree consists only of the root), then __tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log (because we always log that one no matter which level), but no other operations. With this patch __tree_mod_log_oldest_root exits cleanly instead of BUGging in this situation. get_old_root checks if its really a root at leaf level in case we don''t have any operations and WARNs if this assumption breaks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 28 +++++++++++++++------------- 1 files changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 15cbc2b..7d1e4fc 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1024,11 +1024,18 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info, if (!looped && !tm) return 0; /* - * we must have key remove operations in the log before the - * replace operation. + * if there are no tree operation for the oldest root, we simply + * return it. this should only happen if that (old) root is at + * level 0. */ - BUG_ON(!tm); + if (!tm) + break; + /* + * if there''s an operation that''s not a root replacement, we + * found the oldest version of our root. normally, we''ll find a + * MOD_LOG_KEY_REMOVE_WHILE_FREEING operation here. + */ if (tm->op != MOD_LOG_ROOT_REPLACE) break; @@ -1192,16 +1199,8 @@ get_old_root(struct btrfs_root *root, u64 time_seq) } tm = tree_mod_log_search(root->fs_info, logical, time_seq); - /* - * there was an item in the log when __tree_mod_log_oldest_root - * returned. this one must not go away, because the time_seq passed to - * us must be blocking its removal. - */ - BUG_ON(!tm); - if (old_root) - eb = alloc_dummy_extent_buffer(tm->index << PAGE_CACHE_SHIFT, - root->nodesize); + eb = alloc_dummy_extent_buffer(logical, root->nodesize); else eb = btrfs_clone_extent_buffer(root->node); btrfs_tree_read_unlock(root->node); @@ -1216,7 +1215,10 @@ get_old_root(struct btrfs_root *root, u64 time_seq) btrfs_set_header_level(eb, old_root->level); btrfs_set_header_generation(eb, old_generation); } - __tree_mod_log_rewind(eb, time_seq, tm); + if (tm) + __tree_mod_log_rewind(eb, time_seq, tm); + else + WARN_ON(btrfs_header_level(eb) != 0); extent_buffer_get(eb); return eb; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 4/7] Btrfs: always put insert_ptr modifications into the tree mod log
Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid addition to the tree mod log. In fact, we need all of those operations. This commit simply removes the additional parameter and makes addition to the tree mod log unconditional. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 7d1e4fc..e005d9b 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2997,7 +2997,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, static void insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, struct btrfs_disk_key *key, u64 bytenr, - int slot, int level, int tree_mod_log) + int slot, int level) { struct extent_buffer *lower; int nritems; @@ -3010,7 +3010,7 @@ static void insert_ptr(struct btrfs_trans_handle *trans, BUG_ON(slot > nritems); BUG_ON(nritems == BTRFS_NODEPTRS_PER_BLOCK(root)); if (slot != nritems) { - if (tree_mod_log && level) + if (level) tree_mod_log_eb_move(root->fs_info, lower, slot + 1, slot, nritems - slot); memmove_extent_buffer(lower, @@ -3018,7 +3018,7 @@ static void insert_ptr(struct btrfs_trans_handle *trans, btrfs_node_key_ptr_offset(slot), (nritems - slot) * sizeof(struct btrfs_key_ptr)); } - if (tree_mod_log && level) { + if (level) { ret = tree_mod_log_insert_key(root->fs_info, lower, slot, MOD_LOG_KEY_ADD); BUG_ON(ret < 0); @@ -3106,7 +3106,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(split); insert_ptr(trans, root, path, &disk_key, split->start, - path->slots[level + 1] + 1, level + 1, 1); + path->slots[level + 1] + 1, level + 1); if (path->slots[level] >= mid) { path->slots[level] -= mid; @@ -3643,7 +3643,7 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans, btrfs_set_header_nritems(l, mid); btrfs_item_key(right, &disk_key, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1] + 1, 1, 0); + path->slots[1] + 1, 1); btrfs_mark_buffer_dirty(right); btrfs_mark_buffer_dirty(l); @@ -3850,7 +3850,7 @@ again: if (mid <= slot) { btrfs_set_header_nritems(right, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1] + 1, 1, 0); + path->slots[1] + 1, 1); btrfs_tree_unlock(path->nodes[0]); free_extent_buffer(path->nodes[0]); path->nodes[0] = right; @@ -3859,7 +3859,7 @@ again: } else { btrfs_set_header_nritems(right, 0); insert_ptr(trans, root, path, &disk_key, right->start, - path->slots[1], 1, 0); + path->slots[1], 1); btrfs_tree_unlock(path->nodes[0]); free_extent_buffer(path->nodes[0]); path->nodes[0] = right; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 5/7] Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref mutex way longer than actually required. We ought to drop it immediately after we''re done collecting all the delayed refs. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/backref.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index cf0df90..a383c18 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -839,6 +839,7 @@ again: } ret = __add_delayed_refs(head, delayed_ref_seq, &prefs_delayed); + mutex_unlock(&head->mutex); if (ret) { spin_unlock(&delayed_refs->lock); goto out; @@ -932,8 +933,6 @@ again: } out: - if (head) - mutex_unlock(&head->mutex); btrfs_free_path(path); while (!list_empty(&prefs)) { ref = list_first_entry(&prefs, struct __prelim_ref, list); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 6/7] Btrfs: fix tree mod log rewind of ADD operations
When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the tree block. If its not the last key, removal involves a move operation. This move operation was explicitly done before this commit. However, at insertion time, there''s a move operation before the actual addition to make room for the new key, which is recorded in the tree mod log as well. This means, we must drop the move operation when rewinding the add operation, because the next operation we''ll be rewinding will be the corresponding MOD_LOG_MOVE_KEYS operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 6 +----- 1 files changed, 1 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index e005d9b..b98f860 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1094,11 +1094,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq, tm->generation); break; case MOD_LOG_KEY_ADD: - if (tm->slot != n - 1) { - o_dst = btrfs_node_key_ptr_offset(tm->slot); - o_src = btrfs_node_key_ptr_offset(tm->slot + 1); - memmove_extent_buffer(eb, o_dst, o_src, p_size); - } + /* if a move operation is needed it''s in the log */ n--; break; case MOD_LOG_MOVE_KEYS: -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jun-27 17:24 UTC
[PATCH 7/7] Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
With the tree mod log, we may end up with two roots (the current root and a rewinded version of it) both pointing to two leaves, l1 and l2, of which l2 had already been cow-ed in the current transaction. If we don''t rewind any tree blocks, we cannot have two roots both pointing to an already cowed tree block. Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on the next (right) leaf. And there is push_leaf_left, which has a (cowed!) leaf locked and wants a lock on the previous (left) leaf. In order to solve this dead lock situation, we use try_lock in btrfs_next_leaf (only in case it''s called with a tree mod log time_seq paramter) and if we fail to get a lock on the next leaf, we give up our lock on the current leaf and retry from the very beginning. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> --- fs/btrfs/ctree.c | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index b98f860..8206b39 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -5119,6 +5119,18 @@ again: if (!path->skip_locking) { ret = btrfs_try_tree_read_lock(next); + if (!ret && time_seq) { + /* + * If we don''t get the lock, we may be racing + * with push_leaf_left, holding that lock while + * itself waiting for the leaf we''ve currently + * locked. To solve this situation, we give up + * on our lock and cycle. + */ + btrfs_release_path(path); + cond_resched(); + goto again; + } if (!ret) { btrfs_set_path_blocking(path); btrfs_tree_read_lock(next); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html