Each time pick one dead root from the list and let the caller know if
it's needed to continue. This should improve responsiveness during
umount and balance, which at some point wait for cleaning all currently
queued dead roots.

A new dead root is added to the end of the list, so the snapshots
disappear in the order of deletion.

Snapshot cleaning is now done only from the cleaner thread, and the
others wake it if needed.

Signed-off-by: David Sterba <dsterba@suse.cz>
---

* btrfs_clean_old_snapshots is removed from the reloc loop, I don't know if this
  is safe wrt reloc's assumptions

* btrfs_run_delayed_iputs is left in place in super_commit, may get removed as
  well because transaction commit calls it in the end

* the responsiveness can be improved further if btrfs_drop_snapshot checks
  fs_closing, but this needs changes to error handling in the main reloc loop

 fs/btrfs/disk-io.c     |    8 ++++--
 fs/btrfs/relocation.c  |    3 --
 fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
 fs/btrfs/transaction.h |    2 +-
 4 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 51bff86..6a02336 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
     struct btrfs_root *root = arg;

     do {
+        int again = 0;
+
         if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
             mutex_trylock(&root->fs_info->cleaner_mutex)) {
             btrfs_run_delayed_iputs(root);
-            btrfs_clean_old_snapshots(root);
+            again = btrfs_clean_one_deleted_snapshot(root);
             mutex_unlock(&root->fs_info->cleaner_mutex);
             btrfs_run_defrag_inodes(root->fs_info);
         }

-        if (!try_to_freeze()) {
+        if (!try_to_freeze() && !again) {
             set_current_state(TASK_INTERRUPTIBLE);
             if (!kthread_should_stop())
                 schedule();
@@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)

     mutex_lock(&root->fs_info->cleaner_mutex);
     btrfs_run_delayed_iputs(root);
-    btrfs_clean_old_snapshots(root);
     mutex_unlock(&root->fs_info->cleaner_mutex);
+    wake_up_process(root->fs_info->cleaner_kthread);

     /* wait until ongoing cleanup work done */
     down_write(&root->fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ba5a321..ab6a718 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)

     while (1) {
         mutex_lock(&fs_info->cleaner_mutex);
-
-        btrfs_clean_old_snapshots(fs_info->tree_root);
         ret = relocate_block_group(rc);
-
         mutex_unlock(&fs_info->cleaner_mutex);
         if (ret < 0) {
             err = ret;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 361fb7d..f1e3606 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -895,7 +895,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
     spin_lock(&root->fs_info->trans_lock);
-    list_add(&root->root_list, &root->fs_info->dead_roots);
+    list_add_tail(&root->root_list, &root->fs_info->dead_roots);
     spin_unlock(&root->fs_info->trans_lock);
     return 0;
 }
@@ -1783,31 +1783,50 @@ cleanup_transaction:
 }

 /*
- * interface function to delete all the snapshots we have scheduled for deletion
+ * return < 0 if error
+ * 0 if there are no more dead_roots at the time of call
+ * 1 there are more to be processed, call me again
+ *
+ * The return value indicates there are certainly more snapshots to delete, but
+ * if there comes a new one during processing, it may return 0. We don't mind,
+ * because btrfs_commit_super will poke cleaner thread and it will process it a
+ * few seconds later.
  */
-int btrfs_clean_old_snapshots(struct btrfs_root *root)
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
 {
-    LIST_HEAD(list);
+    int ret;
+    int run_again = 1;
     struct btrfs_fs_info *fs_info = root->fs_info;

+    if (root->fs_info->sb->s_flags & MS_RDONLY) {
+        pr_debug("btrfs: cleaner called for RO fs!\n");
+        return 0;
+    }
+
     spin_lock(&fs_info->trans_lock);
-    list_splice_init(&fs_info->dead_roots, &list);
+    if (list_empty(&fs_info->dead_roots)) {
+        spin_unlock(&fs_info->trans_lock);
+        return 0;
+    }
+    root = list_first_entry(&fs_info->dead_roots,
+            struct btrfs_root, root_list);
+    list_del(&root->root_list);
     spin_unlock(&fs_info->trans_lock);

-    while (!list_empty(&list)) {
-        int ret;
-
-        root = list_entry(list.next, struct btrfs_root, root_list);
-        list_del(&root->root_list);
+    pr_debug("btrfs: cleaner removing %llu\n",
+            (unsigned long long)root->objectid);

-        btrfs_kill_all_delayed_nodes(root);
+    btrfs_kill_all_delayed_nodes(root);

-        if (btrfs_header_backref_rev(root->node) <
-            BTRFS_MIXED_BACKREF_REV)
-            ret = btrfs_drop_snapshot(root, NULL, 0, 0);
-        else
-            ret =btrfs_drop_snapshot(root, NULL, 1, 0);
-        BUG_ON(ret < 0);
-    }
-    return 0;
+    if (btrfs_header_backref_rev(root->node) <
+        BTRFS_MIXED_BACKREF_REV)
+        ret = btrfs_drop_snapshot(root, NULL, 0, 0);
+    else
+        ret = btrfs_drop_snapshot(root, NULL, 1, 0);
+    /*
+     * If we encounter a transaction abort during snapshot cleaning, we
+     * don't want to crash here
+     */
+    BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
+    return run_again || ret == -EAGAIN;
 }
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 69700f7..f8e9583 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -118,7 +118,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,

 int btrfs_add_dead_root(struct btrfs_root *root);
 int btrfs_defrag_root(struct btrfs_root *root, int cacheonly);
-int btrfs_clean_old_snapshots(struct btrfs_root *root);
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
                  struct btrfs_root *root);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
--
1.7.9
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
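[Editorial note] The new contract (pop one dead root per call, return whether the caller should call again) can be sketched in userspace C. This is a hypothetical stand-in, not btrfs code: the array-backed FIFO replaces the `dead_roots` list and `trans_lock`, and the names are made up for illustration.

```c
#include <assert.h>

/* Stand-in FIFO of "dead root" ids; the real code keeps a list_head
 * (fs_info->dead_roots) protected by fs_info->trans_lock. */
static int dead_roots[64];
static int head, tail;

/* Mirrors list_add_tail in btrfs_add_dead_root: append at the end,
 * so roots are processed in deletion order. */
static void add_dead_root(int id)
{
    dead_roots[tail++] = id;
}

/* Mirrors btrfs_clean_one_deleted_snapshot's return contract:
 * 0 - no dead roots at the time of the call, sleep
 * 1 - processed one, there may be more: call me again */
static int clean_one_deleted(int *cleaned)
{
    if (head == tail)
        return 0;
    *cleaned = dead_roots[head++];  /* pop exactly one per call */
    return 1;
}
```

The cleaner loop then sleeps only when the last pass returned 0, which is what the `if (!try_to_freeze() && !again)` change implements.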
Hi David, thank you for addressing this issue.

On Mon, Feb 11, 2013 at 6:11 PM, David Sterba <dsterba@suse.cz> wrote:
> Each time pick one dead root from the list and let the caller know if
> it's needed to continue. This should improve responsiveness during
> umount and balance, which at some point wait for cleaning all currently
> queued dead roots.
>
> A new dead root is added to the end of the list, so the snapshots
> disappear in the order of deletion.
>
> Snapshot cleaning is now done only from the cleaner thread, and the
> others wake it if needed.

This is great.

>
> Signed-off-by: David Sterba <dsterba@suse.cz>
> ---
>
> * btrfs_clean_old_snapshots is removed from the reloc loop, I don't know if this
>   is safe wrt reloc's assumptions
>
> * btrfs_run_delayed_iputs is left in place in super_commit, may get removed as
>   well because transaction commit calls it in the end
>
> * the responsiveness can be improved further if btrfs_drop_snapshot checks
>   fs_closing, but this needs changes to error handling in the main reloc loop
>
>  fs/btrfs/disk-io.c     |    8 ++++--
>  fs/btrfs/relocation.c  |    3 --
>  fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
>  fs/btrfs/transaction.h |    2 +-
>  4 files changed, 44 insertions(+), 26 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 51bff86..6a02336 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
>      struct btrfs_root *root = arg;
>
>      do {
> +        int again = 0;
> +
>          if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
>              mutex_trylock(&root->fs_info->cleaner_mutex)) {
>              btrfs_run_delayed_iputs(root);
> -            btrfs_clean_old_snapshots(root);
> +            again = btrfs_clean_one_deleted_snapshot(root);
>              mutex_unlock(&root->fs_info->cleaner_mutex);
>              btrfs_run_defrag_inodes(root->fs_info);
>          }
>
> -        if (!try_to_freeze()) {
> +        if (!try_to_freeze() && !again) {
>              set_current_state(TASK_INTERRUPTIBLE);
>              if (!kthread_should_stop())
>                  schedule();
> @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
>
>      mutex_lock(&root->fs_info->cleaner_mutex);
>      btrfs_run_delayed_iputs(root);
> -    btrfs_clean_old_snapshots(root);
>      mutex_unlock(&root->fs_info->cleaner_mutex);
> +    wake_up_process(root->fs_info->cleaner_kthread);

I am probably missing something, but if the cleaner wakes up here,
won't it attempt cleaning the next snap? Because I don't see the
cleaner checking anywhere that we are unmounting. Or at this point
dead_roots is supposed to be empty?

>
>      /* wait until ongoing cleanup work done */
>      down_write(&root->fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index ba5a321..ab6a718 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
>
>      while (1) {
>          mutex_lock(&fs_info->cleaner_mutex);
> -
> -        btrfs_clean_old_snapshots(fs_info->tree_root);
>          ret = relocate_block_group(rc);
> -
>          mutex_unlock(&fs_info->cleaner_mutex);
>          if (ret < 0) {
>              err = ret;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 361fb7d..f1e3606 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -895,7 +895,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
>  int btrfs_add_dead_root(struct btrfs_root *root)
>  {
>      spin_lock(&root->fs_info->trans_lock);
> -    list_add(&root->root_list, &root->fs_info->dead_roots);
> +    list_add_tail(&root->root_list, &root->fs_info->dead_roots);
>      spin_unlock(&root->fs_info->trans_lock);
>      return 0;
>  }
> @@ -1783,31 +1783,50 @@ cleanup_transaction:
>  }
>
>  /*
> - * interface function to delete all the snapshots we have scheduled for deletion
> + * return < 0 if error
> + * 0 if there are no more dead_roots at the time of call
> + * 1 there are more to be processed, call me again
> + *
> + * The return value indicates there are certainly more snapshots to delete, but
> + * if there comes a new one during processing, it may return 0. We don't mind,
> + * because btrfs_commit_super will poke cleaner thread and it will process it a
> + * few seconds later.
>   */
> -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
>  {
> -    LIST_HEAD(list);
> +    int ret;
> +    int run_again = 1;
>      struct btrfs_fs_info *fs_info = root->fs_info;
>
> +    if (root->fs_info->sb->s_flags & MS_RDONLY) {
> +        pr_debug("btrfs: cleaner called for RO fs!\n");
> +        return 0;
> +    }
> +
>      spin_lock(&fs_info->trans_lock);
> -    list_splice_init(&fs_info->dead_roots, &list);
> +    if (list_empty(&fs_info->dead_roots)) {
> +        spin_unlock(&fs_info->trans_lock);
> +        return 0;
> +    }
> +    root = list_first_entry(&fs_info->dead_roots,
> +            struct btrfs_root, root_list);
> +    list_del(&root->root_list);
>      spin_unlock(&fs_info->trans_lock);
>
> -    while (!list_empty(&list)) {
> -        int ret;
> -
> -        root = list_entry(list.next, struct btrfs_root, root_list);
> -        list_del(&root->root_list);
> +    pr_debug("btrfs: cleaner removing %llu\n",
> +            (unsigned long long)root->objectid);
>
> -        btrfs_kill_all_delayed_nodes(root);
> +    btrfs_kill_all_delayed_nodes(root);
>
> -        if (btrfs_header_backref_rev(root->node) <
> -            BTRFS_MIXED_BACKREF_REV)
> -            ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> -        else
> -            ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> -        BUG_ON(ret < 0);
> -    }
> -    return 0;
> +    if (btrfs_header_backref_rev(root->node) <
> +        BTRFS_MIXED_BACKREF_REV)
> +        ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> +    else
> +        ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> +    /*
> +     * If we encounter a transaction abort during snapshot cleaning, we
> +     * don't want to crash here
> +     */
> +    BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> +    return run_again || ret == -EAGAIN;

Can you tell me when btrfs_drop_snapshot is supposed to return EAGAIN?

>  }
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 69700f7..f8e9583 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -118,7 +118,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
>
>  int btrfs_add_dead_root(struct btrfs_root *root);
>  int btrfs_defrag_root(struct btrfs_root *root, int cacheonly);
> -int btrfs_clean_old_snapshots(struct btrfs_root *root);
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
>  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>                   struct btrfs_root *root);
>  int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
> --
> 1.7.9
>

Thanks,
Alex.
On Sun, Feb 17, 2013 at 09:55:23PM +0200, Alex Lyakas wrote:
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
> >      struct btrfs_root *root = arg;
> >
> >      do {
> > +        int again = 0;
> > +
> >          if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> >              mutex_trylock(&root->fs_info->cleaner_mutex)) {
> >              btrfs_run_delayed_iputs(root);
> > -            btrfs_clean_old_snapshots(root);
> > +            again = btrfs_clean_one_deleted_snapshot(root);
> >              mutex_unlock(&root->fs_info->cleaner_mutex);
> >              btrfs_run_defrag_inodes(root->fs_info);
> >          }
> >
> > -        if (!try_to_freeze()) {
> > +        if (!try_to_freeze() && !again) {
> >              set_current_state(TASK_INTERRUPTIBLE);
> >              if (!kthread_should_stop())
> >                  schedule();
> > @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
> >
> >      mutex_lock(&root->fs_info->cleaner_mutex);
> >      btrfs_run_delayed_iputs(root);
> > -    btrfs_clean_old_snapshots(root);
> >      mutex_unlock(&root->fs_info->cleaner_mutex);
> > +    wake_up_process(root->fs_info->cleaner_kthread);
> I am probably missing something, but if the cleaner wakes up here,
> won't it attempt cleaning the next snap? Because I don't see the
> cleaner checking anywhere that we are unmounting. Or at this point
> dead_roots is supposed to be empty?

No, you're right, the check of the umount semaphore is missing (it was
in the dusted patchset and was titled 'avoid cleaner deadlock', which we
solve now in another way, so I did not realize the patch is actually
needed).

So, this hunk should do it:

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1627,11 +1627,13 @@ static int cleaner_kthread(void *arg)
         int again = 0;

         if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+            down_read_trylock(&root->fs_info->sb->s_umount) &&
             mutex_trylock(&root->fs_info->cleaner_mutex)) {
             btrfs_run_delayed_iputs(root);
             again = btrfs_clean_one_deleted_snapshot(root);
             mutex_unlock(&root->fs_info->cleaner_mutex);
             btrfs_run_defrag_inodes(root->fs_info);
+            up_read(&root->fs_info->sb->s_umount);
         }

         if (!try_to_freeze() && !again) {
---

Seems that also checking for btrfs_fs_closing != 0 would help here.

And to the second part: no, dead_roots is not supposed to be empty.

> > @@ -1783,31 +1783,50 @@ cleanup_transaction:
> >  }
> >
> >  /*
> > - * interface function to delete all the snapshots we have scheduled for deletion
> > + * return < 0 if error
> > + * 0 if there are no more dead_roots at the time of call
> > + * 1 there are more to be processed, call me again
> > + *
> > + * The return value indicates there are certainly more snapshots to delete, but
> > + * if there comes a new one during processing, it may return 0. We don't mind,
> > + * because btrfs_commit_super will poke cleaner thread and it will process it a
> > + * few seconds later.
> >   */
> > -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> > +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
> >  {
> > -    LIST_HEAD(list);
> > +    int ret;
> > +    int run_again = 1;
> >      struct btrfs_fs_info *fs_info = root->fs_info;
> >
> > +    if (root->fs_info->sb->s_flags & MS_RDONLY) {
> > +        pr_debug("btrfs: cleaner called for RO fs!\n");
> > +        return 0;
> > +    }
> > +
> >      spin_lock(&fs_info->trans_lock);
> > -    list_splice_init(&fs_info->dead_roots, &list);
> > +    if (list_empty(&fs_info->dead_roots)) {
> > +        spin_unlock(&fs_info->trans_lock);
> > +        return 0;
> > +    }
> > +    root = list_first_entry(&fs_info->dead_roots,
> > +            struct btrfs_root, root_list);
> > +    list_del(&root->root_list);
> >      spin_unlock(&fs_info->trans_lock);
> >
> > -    while (!list_empty(&list)) {
> > -        int ret;
> > -
> > -        root = list_entry(list.next, struct btrfs_root, root_list);
> > -        list_del(&root->root_list);
> > +    pr_debug("btrfs: cleaner removing %llu\n",
> > +            (unsigned long long)root->objectid);
> >
> > -        btrfs_kill_all_delayed_nodes(root);
> > +    btrfs_kill_all_delayed_nodes(root);
> >
> > -        if (btrfs_header_backref_rev(root->node) <
> > -            BTRFS_MIXED_BACKREF_REV)
> > -            ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > -        else
> > -            ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > -        BUG_ON(ret < 0);
> > -    }
> > -    return 0;
> > +    if (btrfs_header_backref_rev(root->node) <
> > +        BTRFS_MIXED_BACKREF_REV)
> > +        ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > +    else
> > +        ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > +    /*
> > +     * If we encounter a transaction abort during snapshot cleaning, we
> > +     * don't want to crash here
> > +     */
> > +    BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > +    return run_again || ret == -EAGAIN;
> Can you tell me when btrfs_drop_snapshot is supposed to return EAGAIN?

That's another inconsistency of this patch, sorry.

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6976,6 +6976,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
     wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);

     while (1) {
+        if (btrfs_fs_closing(root->fs_info)) {
+            printk(KERN_DEBUG "btrfs: drop snapshot early exit\n");
+            err = -EAGAIN;
+            goto out_end_trans;
+        }
+
         ret = walk_down_tree(trans, root, path, wc);
         if (ret < 0) {
             err = ret;
---

ie. the main loop in drop_snapshot checks for the fs going down and
returns EAGAIN in that case. Initially the hunk was there, but
drop_snapshot is also called from inside the reloc loop and the callers
do not expect to end it early. (Though it's possible to enhance the
exit paths, it's beyond this patch now.)

Fortunately there's the for_reloc argument in

6921 int btrfs_drop_snapshot(struct btrfs_root *root,
6922                         struct btrfs_block_rsv *block_rsv, int update_ref,
6923                         int for_reloc)
6924 {

so we can safely exit early only if it's not in reloc.

Thanks for your comments, I'll send an updated version.

david
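[Editorial note] The shape of that early exit, bailing out of a long drop loop when the fs is going down unless running on behalf of relocation, can be sketched as plain userspace C. Everything here is a stand-in for illustration: `fs_closing` replaces `btrfs_fs_closing(fs_info)`, and the step count replaces the real tree walk.

```c
#include <assert.h>
#include <errno.h>

static int fs_closing;  /* stand-in for btrfs_fs_closing(fs_info) */

/* Sketch of the drop loop: check for shutdown once per iteration, but
 * only when not called from the relocation path, whose callers do not
 * expect an early return. */
static int drop_snapshot_sketch(int for_reloc, int steps)
{
    int i;

    for (i = 0; i < steps; i++) {
        if (!for_reloc && fs_closing)
            return -EAGAIN;  /* the cleaner re-runs it after remount */
        /* ... walk_down_tree()/walk_up_tree() work would go here ... */
    }
    return 0;
}
```

With this split, umount gets a prompt exit while relocation keeps its assumption that the drop runs to completion.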
Each time pick one dead root from the list and let the caller know if
it's needed to continue. This should improve responsiveness during
umount and balance, which at some point wait for cleaning all currently
queued dead roots.

A new dead root is added to the end of the list, so the snapshots
disappear in the order of deletion.

Snapshot cleaning is now done only from the cleaner thread, and the
others wake it if needed.

Signed-off-by: David Sterba <dsterba@suse.cz>
---

v1->v2:
- added s_umount trylock in cleaner thread
- added exit into drop_snapshot if fs is going down

patch based on cmason/integration

 fs/btrfs/disk-io.c     |   10 ++++++--
 fs/btrfs/extent-tree.c |    8 ++++++
 fs/btrfs/relocation.c  |    3 --
 fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
 fs/btrfs/transaction.h |    2 +-
 5 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb7c143..cc85fc7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1652,15 +1652,19 @@ static int cleaner_kthread(void *arg)
     struct btrfs_root *root = arg;

     do {
+        int again = 0;
+
         if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+            down_read_trylock(&root->fs_info->sb->s_umount) &&
             mutex_trylock(&root->fs_info->cleaner_mutex)) {
             btrfs_run_delayed_iputs(root);
-            btrfs_clean_old_snapshots(root);
+            again = btrfs_clean_one_deleted_snapshot(root);
             mutex_unlock(&root->fs_info->cleaner_mutex);
             btrfs_run_defrag_inodes(root->fs_info);
+            up_read(&root->fs_info->sb->s_umount);
         }

-        if (!try_to_freeze()) {
+        if (!try_to_freeze() && !again) {
             set_current_state(TASK_INTERRUPTIBLE);
             if (!kthread_should_stop())
                 schedule();
@@ -3338,8 +3342,8 @@ int btrfs_commit_super(struct btrfs_root *root)

     mutex_lock(&root->fs_info->cleaner_mutex);
     btrfs_run_delayed_iputs(root);
-    btrfs_clean_old_snapshots(root);
     mutex_unlock(&root->fs_info->cleaner_mutex);
+    wake_up_process(root->fs_info->cleaner_kthread);

     /* wait until ongoing cleanup work done */
     down_write(&root->fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d2b3a5e..0119ae7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7078,6 +7078,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
  * reference count by one. if update_ref is true, this function
  * also make sure backrefs for the shared block and all lower level
  * blocks are properly updated.
+ *
+ * If called with for_reloc == 0, may exit early with -EAGAIN
  */
 int btrfs_drop_snapshot(struct btrfs_root *root,
             struct btrfs_block_rsv *block_rsv, int update_ref,
@@ -7179,6 +7181,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
     wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);

     while (1) {
+        if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
+            pr_debug("btrfs: drop snapshot early exit\n");
+            err = -EAGAIN;
+            goto out_end_trans;
+        }
+
         ret = walk_down_tree(trans, root, path, wc);
         if (ret < 0) {
             err = ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ba5a321..ab6a718 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)

     while (1) {
         mutex_lock(&fs_info->cleaner_mutex);
-
-        btrfs_clean_old_snapshots(fs_info->tree_root);
         ret = relocate_block_group(rc);
-
         mutex_unlock(&fs_info->cleaner_mutex);
         if (ret < 0) {
             err = ret;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a83d486..6b233c15 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -950,7 +950,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
     spin_lock(&root->fs_info->trans_lock);
-    list_add(&root->root_list, &root->fs_info->dead_roots);
+    list_add_tail(&root->root_list, &root->fs_info->dead_roots);
     spin_unlock(&root->fs_info->trans_lock);
     return 0;
 }
@@ -1858,31 +1858,50 @@ cleanup_transaction:
 }

 /*
- * interface function to delete all the snapshots we have scheduled for deletion
+ * return < 0 if error
+ * 0 if there are no more dead_roots at the time of call
+ * 1 there are more to be processed, call me again
+ *
+ * The return value indicates there are certainly more snapshots to delete, but
+ * if there comes a new one during processing, it may return 0. We don't mind,
+ * because btrfs_commit_super will poke cleaner thread and it will process it a
+ * few seconds later.
  */
-int btrfs_clean_old_snapshots(struct btrfs_root *root)
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
 {
-    LIST_HEAD(list);
+    int ret;
+    int run_again = 1;
     struct btrfs_fs_info *fs_info = root->fs_info;

+    if (root->fs_info->sb->s_flags & MS_RDONLY) {
+        pr_debug("btrfs: cleaner called for RO fs!\n");
+        return 0;
+    }
+
     spin_lock(&fs_info->trans_lock);
-    list_splice_init(&fs_info->dead_roots, &list);
+    if (list_empty(&fs_info->dead_roots)) {
+        spin_unlock(&fs_info->trans_lock);
+        return 0;
+    }
+    root = list_first_entry(&fs_info->dead_roots,
+            struct btrfs_root, root_list);
+    list_del(&root->root_list);
     spin_unlock(&fs_info->trans_lock);

-    while (!list_empty(&list)) {
-        int ret;
-
-        root = list_entry(list.next, struct btrfs_root, root_list);
-        list_del(&root->root_list);
+    pr_debug("btrfs: cleaner removing %llu\n",
+            (unsigned long long)root->objectid);

-        btrfs_kill_all_delayed_nodes(root);
+    btrfs_kill_all_delayed_nodes(root);

-        if (btrfs_header_backref_rev(root->node) <
-            BTRFS_MIXED_BACKREF_REV)
-            ret = btrfs_drop_snapshot(root, NULL, 0, 0);
-        else
-            ret =btrfs_drop_snapshot(root, NULL, 1, 0);
-        BUG_ON(ret < 0);
-    }
-    return 0;
+    if (btrfs_header_backref_rev(root->node) <
+        BTRFS_MIXED_BACKREF_REV)
+        ret = btrfs_drop_snapshot(root, NULL, 0, 0);
+    else
+        ret = btrfs_drop_snapshot(root, NULL, 1, 0);
+    /*
+     * If we encounter a transaction abort during snapshot cleaning, we
+     * don't want to crash here
+     */
+    BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
+    return run_again || ret == -EAGAIN;
 }
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 5afd7b1..52f6f25 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -121,7 +121,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,

 int btrfs_add_dead_root(struct btrfs_root *root);
 int btrfs_defrag_root(struct btrfs_root *root);
-int btrfs_clean_old_snapshots(struct btrfs_root *root);
+int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
                  struct btrfs_root *root);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
--
1.7.9
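[Editorial note] Why the `list_add` to `list_add_tail` change gives deletion-order processing can be shown with a tiny linked-list sketch. This is not the kernel's `list.h`, just the same head/tail idea with hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *next;
};

static struct node *list_head;

/* like list_add(): push at the front - pop order becomes LIFO,
 * the most recently deleted snapshot would be cleaned first */
static void add_front(struct node *n)
{
    n->next = list_head;
    list_head = n;
}

/* like list_add_tail(): append at the back - pop order is FIFO,
 * i.e. snapshots disappear in the order they were deleted */
static void add_tail(struct node *n)
{
    struct node **p = &list_head;

    while (*p)
        p = &(*p)->next;
    n->next = NULL;
    *p = n;
}

/* pop the first entry; assumes the list is non-empty */
static int pop_first(void)
{
    struct node *n = list_head;

    list_head = n->next;
    return n->id;
}
```

Since the cleaner now takes `list_first_entry` on every pass, only tail insertion preserves the user-visible deletion order.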
The mail did not make it to the list on Monday; anyway, now I have a few
testing results to share.

The umount responsiveness is much improved, it took a few seconds even
when there were many roots scheduled for cleaning. With

$ echo 'func btrfs_drop_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
$ echo 'func btrfs_clean_one_deleted_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control

one can watch how it proceeds. After umount and mount, the cleaning
process starts again after 30 seconds, when transaction commit triggers.

Next test was to let it run with balance (just 'fi balance', no other
options). Worked well, but I've hit an oops from the balance cancel
command that was triggered sometime during the balance. According to the
log it happened right after the balance finished. CCing Ilya, I'm not
sure if this is somehow induced by the patch.

3407         BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
3408         atomic_dec(&fs_info->balance_cancel_req);
3409         mutex_unlock(&fs_info->balance_mutex);
3410         return 0;
3411 }

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:3407!
invalid opcode: 0000 [#1] SMP
Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs]
CPU 0
Pid: 19117, comm: btrfs Tainted: G        W    3.8.0-default+ #267 Intel
RIP: 0010:[<ffffffffa0160919>]  [<ffffffffa0160919>] btrfs_cancel_balance+0x179/0x180 [btrfs]
RSP: 0018:ffff88005fddfd78  EFLAGS: 00010286
RAX: ffff88005fddffd8 RBX: ffff88005adec000 RCX: 0000000000000006
RDX: 0000000000000001 RSI: ffff880027cc88c0 RDI: 2222222222222222
RBP: ffff88005fddfdd8 R08: 2222222222222222 R09: 2222222222222222
R10: 2222222222222222 R11: 0000000000000000 R12: ffff88005adeeac8
R13: ffff88005adeeb70 R14: ffff88005fddfd78 R15: ffff88005adeeb88
FS:  00007fe136daa740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff04c655000 CR3: 000000005fc4b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 19117, threadinfo ffff88005fdde000, task ffff880027cc8240)
Stack:
 0000000000000000 ffff880027cc8240 ffffffff810745c0 ffff88005fddfd90
 ffff88005fddfd90 ffffffff810a6c6d ffff88005fddfdc8 ffff88007724d6c0
 ffff880055e6d000 0000000000000002 ffff880078ccc080 ffff880078ccc538
Call Trace:
 [<ffffffff810745c0>] ? wake_up_bit+0x40/0x40
 [<ffffffff810a6c6d>] ? lock_release_holdtime+0x3d/0x1c0
 [<ffffffffa016c8a3>] btrfs_ioctl+0x693/0x1d80 [btrfs]
 [<ffffffff81079723>] ? up_read+0x23/0x40
 [<ffffffff8195e268>] ? __do_page_fault+0x238/0x590
 [<ffffffff81087d1f>] ? local_clock+0x6f/0x80
 [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0
 [<ffffffff811708f8>] do_vfs_ioctl+0x98/0x560
 [<ffffffff8195a906>] ? error_sti+0x5/0x6
 [<ffffffff8195a458>] ? retint_swapgs+0x13/0x1b
 [<ffffffff81170e17>] sys_ioctl+0x57/0x90
 [<ffffffff8139adce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b
RIP  [<ffffffffa0160919>] btrfs_cancel_balance+0x179/0x180 [btrfs]
 RSP <ffff88005fddfd78>
---[ end trace 930320f35566d010 ]---
Hi David,

On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> Each time pick one dead root from the list and let the caller know if
> it's needed to continue. This should improve responsiveness during
> umount and balance, which at some point wait for cleaning all currently
> queued dead roots.
>
> A new dead root is added to the end of the list, so the snapshots
> disappear in the order of deletion.
>
> Snapshot cleaning is now done only from the cleaner thread, and the
> others wake it if needed.
>
> Signed-off-by: David Sterba <dsterba@suse.cz>
> ---
>
> v1->v2:
> - added s_umount trylock in cleaner thread
> - added exit into drop_snapshot if fs is going down
>
> patch based on cmason/integration
>
>  fs/btrfs/disk-io.c     |   10 ++++++--
>  fs/btrfs/extent-tree.c |    8 ++++++
>  fs/btrfs/relocation.c  |    3 --
>  fs/btrfs/transaction.c |   57 ++++++++++++++++++++++++++++++++----------------
>  fs/btrfs/transaction.h |    2 +-
>  5 files changed, 54 insertions(+), 26 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index eb7c143..cc85fc7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1652,15 +1652,19 @@ static int cleaner_kthread(void *arg)
>      struct btrfs_root *root = arg;
>
>      do {
> +        int again = 0;
> +
>          if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> +            down_read_trylock(&root->fs_info->sb->s_umount) &&
>              mutex_trylock(&root->fs_info->cleaner_mutex)) {
>              btrfs_run_delayed_iputs(root);
> -            btrfs_clean_old_snapshots(root);
> +            again = btrfs_clean_one_deleted_snapshot(root);
>              mutex_unlock(&root->fs_info->cleaner_mutex);
>              btrfs_run_defrag_inodes(root->fs_info);
> +            up_read(&root->fs_info->sb->s_umount);
>          }
>
> -        if (!try_to_freeze()) {
> +        if (!try_to_freeze() && !again) {
>              set_current_state(TASK_INTERRUPTIBLE);
>              if (!kthread_should_stop())
>                  schedule();
> @@ -3338,8 +3342,8 @@ int btrfs_commit_super(struct btrfs_root *root)
>
>      mutex_lock(&root->fs_info->cleaner_mutex);
>      btrfs_run_delayed_iputs(root);
> -    btrfs_clean_old_snapshots(root);
>      mutex_unlock(&root->fs_info->cleaner_mutex);
> +    wake_up_process(root->fs_info->cleaner_kthread);
>
>      /* wait until ongoing cleanup work done */
>      down_write(&root->fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d2b3a5e..0119ae7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7078,6 +7078,8 @@ static noinline int walk_up_tree(struct btrfs_trans_handle *trans,
>   * reference count by one. if update_ref is true, this function
>   * also make sure backrefs for the shared block and all lower level
>   * blocks are properly updated.
> + *
> + * If called with for_reloc == 0, may exit early with -EAGAIN
>   */
>  int btrfs_drop_snapshot(struct btrfs_root *root,
>              struct btrfs_block_rsv *block_rsv, int update_ref,
> @@ -7179,6 +7181,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>      wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);
>
>      while (1) {
> +        if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
> +            pr_debug("btrfs: drop snapshot early exit\n");
> +            err = -EAGAIN;
> +            goto out_end_trans;
> +        }
> +
>          ret = walk_down_tree(trans, root, path, wc);
>          if (ret < 0) {
>              err = ret;
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index ba5a321..ab6a718 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4060,10 +4060,7 @@ int btrfs_relocate_block_group(struct btrfs_root *extent_root, u64 group_start)
>
>      while (1) {
>          mutex_lock(&fs_info->cleaner_mutex);
> -
> -        btrfs_clean_old_snapshots(fs_info->tree_root);
>          ret = relocate_block_group(rc);
> -
>          mutex_unlock(&fs_info->cleaner_mutex);
>          if (ret < 0) {
>              err = ret;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index a83d486..6b233c15 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -950,7 +950,7 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
>  int btrfs_add_dead_root(struct btrfs_root *root)
>  {
>      spin_lock(&root->fs_info->trans_lock);
> -    list_add(&root->root_list, &root->fs_info->dead_roots);
> +    list_add_tail(&root->root_list, &root->fs_info->dead_roots);
>      spin_unlock(&root->fs_info->trans_lock);
>      return 0;
>  }
> @@ -1858,31 +1858,50 @@ cleanup_transaction:
>  }
>
>  /*
> - * interface function to delete all the snapshots we have scheduled for deletion
> + * return < 0 if error
> + * 0 if there are no more dead_roots at the time of call
> + * 1 there are more to be processed, call me again
> + *
> + * The return value indicates there are certainly more snapshots to delete, but
> + * if there comes a new one during processing, it may return 0. We don't mind,
> + * because btrfs_commit_super will poke cleaner thread and it will process it a
> + * few seconds later.
>   */
> -int btrfs_clean_old_snapshots(struct btrfs_root *root)
> +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
>  {
> -    LIST_HEAD(list);
> +    int ret;
> +    int run_again = 1;
>      struct btrfs_fs_info *fs_info = root->fs_info;
>
> +    if (root->fs_info->sb->s_flags & MS_RDONLY) {
> +        pr_debug("btrfs: cleaner called for RO fs!\n");
> +        return 0;
> +    }
> +
>      spin_lock(&fs_info->trans_lock);
> -    list_splice_init(&fs_info->dead_roots, &list);
> +    if (list_empty(&fs_info->dead_roots)) {
> +        spin_unlock(&fs_info->trans_lock);
> +        return 0;
> +    }
> +    root = list_first_entry(&fs_info->dead_roots,
> +            struct btrfs_root, root_list);
> +    list_del(&root->root_list);
>      spin_unlock(&fs_info->trans_lock);
>
> -    while (!list_empty(&list)) {
> -        int ret;
> -
> -        root = list_entry(list.next, struct btrfs_root, root_list);
> -        list_del(&root->root_list);
> +    pr_debug("btrfs: cleaner removing %llu\n",
> +            (unsigned long long)root->objectid);
>
> -        btrfs_kill_all_delayed_nodes(root);
> +    btrfs_kill_all_delayed_nodes(root);
>
> -        if (btrfs_header_backref_rev(root->node) <
> -            BTRFS_MIXED_BACKREF_REV)
> -            ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> -        else
> -            ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> -
BUG_ON(ret < 0); > - } > - return 0; > + if (btrfs_header_backref_rev(root->node) < > + BTRFS_MIXED_BACKREF_REV) > + ret = btrfs_drop_snapshot(root, NULL, 0, 0); > + else > + ret = btrfs_drop_snapshot(root, NULL, 1, 0); > + /* > + * If we encounter a transaction abort during snapshot cleaning, we > + * don''t want to crash here > + */ > + BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS)); > + return run_again || ret == -EAGAIN;I think that this expression will always evaluate to "true", because "run_again" is set to 1 and never reset. So now if btrfs_drop_snapshot() returns -EAGAIN, then btrfs_clean_one_deleted_snapshot() will return 1, thus telling the caller "call me again", like your comment says. However, in this case we are unmounting, so is this ok? Or this is just to avoid the cleaner going to sleep?> } > diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h > index 5afd7b1..52f6f25 100644 > --- a/fs/btrfs/transaction.h > +++ b/fs/btrfs/transaction.h > @@ -121,7 +121,7 @@ int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans, > > int btrfs_add_dead_root(struct btrfs_root *root); > int btrfs_defrag_root(struct btrfs_root *root); > -int btrfs_clean_old_snapshots(struct btrfs_root *root); > +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root); > int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > struct btrfs_root *root); > int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans, > -- > 1.7.9 >Also, I want to ask, hope this is not inappropriate. Do you also agree with Josef, that it''s ok for BTRFS_IOC_SNAP_DESTROY not to commit the transaction, but just to detach from it? Had we committed, we would have ensured that ORPHAN_ITEM is in the root tree, thus preventing from subvol to re-appear after crash. It seems a little inconsistent with snap creation, where not only the transaction is committed, but delalloc flush is performed to ensure that all data is on disk before creating the snap. Thanks, Alex. 
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
> The mail did not make it to the list on Monday, anyway, now I have a
> few testing results to share.
>
> The umount responsiveness is much improved, it took a few seconds when
> there were many roots scheduled for cleaning.
>
> With
>
> $ echo 'func btrfs_drop_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
> $ echo 'func btrfs_clean_one_deleted_snapshot +pf' > /sys/kernel/debug/dynamic_debug/control
>
> one can watch how it proceeds.
>
> After umount and mount, the cleaning process starts again after 30
> seconds, when transaction commit triggers.
>
> Next test was to let it run with balance (just 'fi balance', no other
> options). Worked well, but I've hit an oops from a balance cancel command that
> was triggered sometime during the balance. According to the log it happened
> right after the balance finished. CCing Ilya, I'm not sure if this is somehow
> induced by the patch.
>
> 3407		BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));

Were you by any chance running this on top of a for-linus that included
a raid56 merge commit? If so, this is a result of a raid56 mismerge --
fs_info->balance_ctl is supposed to be cleared in __cancel_balance(), a
call to which was accidentally nuked.

Thanks,

Ilya
On Wed, Mar 06, 2013 at 11:08:57AM +0200, Ilya Dryomov wrote:
> On Fri, Mar 01, 2013 at 05:30:57PM +0100, David Sterba wrote:
> > 3407		BUG_ON(fs_info->balance_ctl || atomic_read(&fs_info->balance_running));
>
> Were you by any chance running this on top of a for-linus that included
> a raid56 merge commit? If so, this is a result of a raid56 mismerge --
> fs_info->balance_ctl is supposed to be cleared in __cancel_balance(), a
> call to which was accidentally nuked.

I've been testing either plain next/master or based on linus' tree, so
yes, this merge was there and the missing cancel explains the BUG_ON.

david
On Sat, Mar 02, 2013 at 11:32:07PM +0200, Alex Lyakas wrote:
> On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> > -		btrfs_kill_all_delayed_nodes(root);
> > +	btrfs_kill_all_delayed_nodes(root);
> >
> > -		if (btrfs_header_backref_rev(root->node) <
> > -				BTRFS_MIXED_BACKREF_REV)
> > -			ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > -		else
> > -			ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > -		BUG_ON(ret < 0);
> > -	}
> > -	return 0;
> > +	if (btrfs_header_backref_rev(root->node) <
> > +			BTRFS_MIXED_BACKREF_REV)
> > +		ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > +	else
> > +		ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > +	/*
> > +	 * If we encounter a transaction abort during snapshot cleaning, we
> > +	 * don't want to crash here
> > +	 */
> > +	BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > +	return run_again || ret == -EAGAIN;
> I think that this expression will always evaluate to "true", because
> "run_again" is set to 1 and never reset.

That's right, I probably had some intentions in the past, but now it's
just superfluous.

> So now if btrfs_drop_snapshot() returns -EAGAIN, then
> btrfs_clean_one_deleted_snapshot() will return 1, thus telling the
> caller "call me again", like your comment says. However, in this case
> we are unmounting, so is this ok? Or this is just to avoid the cleaner
> going to sleep?

close_ctree calls kthread_stop(cleaner) which would wake the cleaner anyway.

The cases when we want to send the cleaner to sleep are when there's
nothing to do, or the read-only case which is just a safety check.

From that point, the last statement should be 'return 1' and run_again
removed.

> Also, I want to ask, hope this is not inappropriate. Do you also agree
> with Josef, that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> transaction, but just to detach from it? Had we committed, we would
> have ensured that ORPHAN_ITEM is in the root tree, thus preventing
> the subvol from re-appearing after a crash.
> It seems a little inconsistent with snap creation, where not only is the
> transaction committed, but a delalloc flush is performed to ensure
> that all data is on disk before creating the snap.

That's another question, can you please point me to the thread where
this was discussed?

thanks,
david
On Wed, Mar 06, 2013 at 11:09:15AM -0700, David Sterba wrote:
> On Sat, Mar 02, 2013 at 11:32:07PM +0200, Alex Lyakas wrote:
> > On Fri, Mar 1, 2013 at 6:17 PM, David Sterba <dsterba@suse.cz> wrote:
> > > -		btrfs_kill_all_delayed_nodes(root);
> > > +	btrfs_kill_all_delayed_nodes(root);
> > >
> > > -		if (btrfs_header_backref_rev(root->node) <
> > > -				BTRFS_MIXED_BACKREF_REV)
> > > -			ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > > -		else
> > > -			ret =btrfs_drop_snapshot(root, NULL, 1, 0);
> > > -		BUG_ON(ret < 0);
> > > -	}
> > > -	return 0;
> > > +	if (btrfs_header_backref_rev(root->node) <
> > > +			BTRFS_MIXED_BACKREF_REV)
> > > +		ret = btrfs_drop_snapshot(root, NULL, 0, 0);
> > > +	else
> > > +		ret = btrfs_drop_snapshot(root, NULL, 1, 0);
> > > +	/*
> > > +	 * If we encounter a transaction abort during snapshot cleaning, we
> > > +	 * don't want to crash here
> > > +	 */
> > > +	BUG_ON(ret < 0 && (ret != -EAGAIN || ret != -EROFS));
> > > +	return run_again || ret == -EAGAIN;
> > I think that this expression will always evaluate to "true", because
> > "run_again" is set to 1 and never reset.
>
> That's right, I probably had some intentions in the past, but now it's
> just superfluous.
>
> > So now if btrfs_drop_snapshot() returns -EAGAIN, then
> > btrfs_clean_one_deleted_snapshot() will return 1, thus telling the
> > caller "call me again", like your comment says. However, in this case
> > we are unmounting, so is this ok? Or this is just to avoid the cleaner
> > going to sleep?
>
> close_ctree calls kthread_stop(cleaner) which would wake the cleaner anyway.
>
> The cases when we want to send the cleaner to sleep are when there's
> nothing to do, or the read-only case which is just a safety check.
>
> From that point, the last statement should be 'return 1' and run_again
> removed.
>
> > Also, I want to ask, hope this is not inappropriate. Do you also agree
> > with Josef, that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> > transaction, but just to detach from it?
> > Had we committed, we would
> > have ensured that ORPHAN_ITEM is in the root tree, thus preventing
> > the subvol from re-appearing after a crash.
> > It seems a little inconsistent with snap creation, where not only is the
> > transaction committed, but a delalloc flush is performed to ensure
> > that all data is on disk before creating the snap.
>
> That's another question, can you please point me to the thread where
> this was discussed?

That's a really old one. The original snapshot code expected people to
run sync first, but that's not very user friendly. The idea is that if
you write a file and then take a snapshot, that file should be in the
snapshot.

-chris
On Wed, Mar 06, 2013 at 10:12:11PM -0500, Chris Mason wrote:
> > > Also, I want to ask, hope this is not inappropriate. Do you also agree
> > > with Josef, that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> > > transaction, but just to detach from it? Had we committed, we would
> > > have ensured that ORPHAN_ITEM is in the root tree, thus preventing
> > > the subvol from re-appearing after a crash.
> > > It seems a little inconsistent with snap creation, where not only is the
> > > transaction committed, but a delalloc flush is performed to ensure
> > > that all data is on disk before creating the snap.
> >
> > That's another question, can you please point me to the thread where
> > this was discussed?
>
> That's a really old one. The original snapshot code expected people to
> run sync first, but that's not very user friendly. The idea is that if
> you write a file and then take a snapshot, that file should be in the
> snapshot.

The snapshot behaviour sounds ok to me.

That a subvol/snapshot may re-appear after a crash if the transaction
commit did not happen does not feel so good. We know that the subvol is
only scheduled for deletion and needs to be processed by the cleaner.

From that point I'd rather see the commit happen to avoid any
unexpected surprises. A subvolume that re-appears still holds the data
references and consumes space although the user does not assume that.

Automated snapshotting and deleting needs some guarantees about the
behaviour and what to do after a crash. So now it has to process the
backlog of previously deleted snapshots and verify that they're not
there, compared to "deleted -> will never appear, can forget about it".

david
Hi David,

On Thu, Mar 7, 2013 at 1:55 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Mar 06, 2013 at 10:12:11PM -0500, Chris Mason wrote:
> > > > Also, I want to ask, hope this is not inappropriate. Do you also agree
> > > > with Josef, that it's ok for BTRFS_IOC_SNAP_DESTROY not to commit the
> > > > transaction, but just to detach from it? Had we committed, we would
> > > > have ensured that ORPHAN_ITEM is in the root tree, thus preventing
> > > > the subvol from re-appearing after a crash.
> > > > It seems a little inconsistent with snap creation, where not only is the
> > > > transaction committed, but a delalloc flush is performed to ensure
> > > > that all data is on disk before creating the snap.
> > >
> > > That's another question, can you please point me to the thread where
> > > this was discussed?

http://www.spinics.net/lists/linux-btrfs/msg22256.html

> >
> > That's a really old one. The original snapshot code expected people to
> > run sync first, but that's not very user friendly. The idea is that if
> > you write a file and then take a snapshot, that file should be in the
> > snapshot.
>
> The snapshot behaviour sounds ok to me.
>
> That a subvol/snapshot may re-appear after a crash if the transaction
> commit did not happen does not feel so good. We know that the subvol is
> only scheduled for deletion and needs to be processed by the cleaner.
>
> From that point I'd rather see the commit happen to avoid any
> unexpected surprises. A subvolume that re-appears still holds the data
> references and consumes space although the user does not assume that.
>
> Automated snapshotting and deleting needs some guarantees about the
> behaviour and what to do after a crash. So now it has to process the
> backlog of previously deleted snapshots and verify that they're not
> there, compared to "deleted -> will never appear, can forget about it".

Exactly. Currently, user space has no idea when the deletion will
start, or when it is completed (it has to track the ROOT_ITEM, drop
progress, ORPHAN_ITEM etc.). That's why I was thinking that at least
committing a transaction on snap_destroy could ensure that the deletion
will not be reverted.

Thanks,
Alex.