Delayed allocation ref mutexes are taken [1] inside btrfs_commit_transaction. A later call fails and jumps to the cleanup_transaction label (transaction.c:1501) with these mutexes still held causing deadlock [2] when they are reacquired. Either we can introduce an earlier label (cleanup_transaction_lock) and function to unlock these mutexes or can tweak btrfs_destroy_delayed_refs to conditionally use mutex_try_lock. What is the suggested approach? Thanks, Daniel --- [1] btrfs_commit_transaction -> btrfs_run_delayed_refs -> run_clustered_refs -> btrfs_delayed_ref_lock -> struct btrfs_delayed_ref_head -> mutex --- [2] btrfs bad tree block start 0 39845888 btrfs bad tree block start 0 39845888 btrfs: run_one_delayed_ref returned -5 WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xa6/0xc0 [btrfs]() Hardware name: Latitude E5420 btrfs: Transaction aborted Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat dm_crypt dm_mod kvm_intel kvm coretemp binfmt_misc microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc videobuf2_memops iwlwifi btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video usb_storage Pid: 14985, comm: btrfs-endio-wri Tainted: G W 3.4.0-rc6-debug #14 Call Trace: [<ffffffff8103c5ca>] warn_slowpath_common+0x7a/0xb0 [<ffffffff8103c6a1>] warn_slowpath_fmt+0x41/0x50 [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0 [<ffffffffa0094c76>] __btrfs_abort_transaction+0xa6/0xc0 [btrfs] [<ffffffffa00a87a6>] btrfs_run_delayed_refs+0x296/0x300 [btrfs] [<ffffffffa00b9ad7>] __btrfs_end_transaction+0xa7/0x360 [btrfs] [<ffffffffa00b9df0>] btrfs_end_transaction+0x10/0x20 [btrfs] [<ffffffffa00c049d>] btrfs_finish_ordered_io+0x17d/0x3b0 [btrfs] [<ffffffff8108f505>] ? trace_hardirqs_on_caller+0x105/0x190 [<ffffffffa00c06e5>] btrfs_writepage_end_io_hook+0x15/0x20 [btrfs] [<ffffffffa00dbbb8>] end_extent_writepage+0x58/0x100 [btrfs] [<ffffffffa00dbcc4>] end_bio_extent_writepage+0x64/0x90 [btrfs] [<ffffffff81147458>] bio_endio+0x18/0x30 [<ffffffffa00b1efc>] end_workqueue_fn+0x3c/0x50 [btrfs] [<ffffffffa00e8cc6>] worker_loop+0x86/0x330 [btrfs] [<ffffffffa00e8c40>] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs] [<ffffffff8105da6e>] kthread+0x8e/0xa0 [<ffffffff815b1b94>] kernel_thread_helper+0x4/0x10 [<ffffffff815b0259>] ? retint_restore_args+0xe/0xe [<ffffffff8105d9e0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff815b1b90>] ? gs_change+0xb/0xb ---[ end trace df06b72f93439fa3 ]--- BTRFS warning (device ram1): Aborting unused transaction. btrfs bad tree block start 0 39845888 btrfs bad tree block start 0 39845888 btrfs: run_one_delayed_ref returned -5 BTRFS error (device ram1) in btrfs_run_delayed_refs:2454: IO failure btrfs is forced readonly BTRFS warning (device ram1): Skipping commit of aborted transaction. ============================================[ INFO: possible recursive locking detected ] 3.4.0-rc6-debug #14 Tainted: G W --------------------------------------------- btrfs/18749 is trying to acquire lock: (&head_ref->mutex){+.+...}, at: [<ffffffffa00b22a9>] btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] but task is already holding lock: (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>] btrfs_delayed_ref_lock+0x37/0x140 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&head_ref->mutex); lock(&head_ref->mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by btrfs/18749: #0: (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffffa00eb86e>] btrfs_mksubvol+0x4e/0x1a0 [btrfs] #1: (&fs_info->subvol_sem){++++..}, at: [<ffffffffa00eb927>] btrfs_mksubvol+0x107/0x1a0 [btrfs] #2: (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>] btrfs_delayed_ref_lock+0x37/0x140 [btrfs] stack backtrace: Pid: 18749, comm: btrfs Tainted: G W 3.4.0-rc6-debug #14 Call Trace: [<ffffffff8108b913>] print_deadlock_bug+0xf3/0x100 [<ffffffff8108bb02>] check_deadlock.isra.29+0x1e2/0x1f0 [<ffffffff8108d443>] validate_chain.isra.33+0x383/0x510 [<ffffffff8108dff8>] __lock_acquire+0x388/0x900 [<ffffffff8108ea95>] lock_acquire+0x55/0x70 [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] [<ffffffff815ad38b>] mutex_lock_nested+0x6b/0x340 [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0 [<ffffffffa00b22a9>] btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] [<ffffffffa00b5682>] btrfs_cleanup_one_transaction+0x12/0x100 [btrfs] [<ffffffffa00b8976>] cleanup_transaction+0x76/0xf0 [btrfs] [<ffffffffa00b91f1>] btrfs_commit_transaction+0xf1/0x900 [btrfs] [<ffffffff8105e250>] ? __init_waitqueue_head+0x60/0x60 [<ffffffffa00eb7eb>] create_snapshot.isra.46+0x1ab/0x1e0 [btrfs] [<ffffffffa00eb955>] btrfs_mksubvol+0x135/0x1a0 [btrfs] [<ffffffff811158e0>] ? files_lglock_local_lock+0x70/0x70 [<ffffffffa00ebaea>] btrfs_ioctl_snap_create_transid+0x12a/0x190 [btrfs] [<ffffffffa00ec8b0>] btrfs_ioctl_snap_create_v2.constprop.57+0xe0/0xf0 [btrfs] [<ffffffff815ae241>] ? __schedule+0x351/0x8b0 [<ffffffffa00eee39>] btrfs_ioctl+0x409/0x770 [btrfs] [<ffffffff81128767>] do_vfs_ioctl+0x87/0x340 [<ffffffff81128a6a>] sys_ioctl+0x4a/0x80 [<ffffffff815b09a2>] system_call_fastpath+0x16/0x1b -- Daniel J Blueman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/08/2012 04:56 PM, Daniel J Blueman wrote:> Delayed allocation ref mutexes are taken [1] inside > btrfs_commit_transaction. A later call fails and jumps to the > cleanup_transaction label (transaction.c:1501) with these mutexes > still held causing deadlock [2] when they are reacquired. > > Either we can introduce an earlier label (cleanup_transaction_lock) > and function to unlock these mutexes or can tweak > btrfs_destroy_delayed_refs to conditionally use mutex_try_lock. > > What is the suggested approach? >Hi Daniel, I prefer mutex_try_lock, just as other places do. You can give it a try. :) thanks, liubo> Thanks, > Daniel > > --- [1] > > btrfs_commit_transaction -> btrfs_run_delayed_refs -> > run_clustered_refs -> btrfs_delayed_ref_lock -> struct > btrfs_delayed_ref_head -> mutex > > --- [2] > > btrfs bad tree block start 0 39845888 > btrfs bad tree block start 0 39845888 > btrfs: run_one_delayed_ref returned -5 > WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xa6/0xc0 [btrfs]() > Hardware name: Latitude E5420 > btrfs: Transaction aborted > Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat dm_crypt > dm_mod kvm_intel kvm coretemp binfmt_misc microcode uvcvideo > videobuf2_core videodev videobuf2_vmalloc videobuf2_memops iwlwifi > btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video usb_storage > Pid: 14985, comm: btrfs-endio-wri Tainted: G W 3.4.0-rc6-debug #14 > Call Trace: > [<ffffffff8103c5ca>] warn_slowpath_common+0x7a/0xb0 > [<ffffffff8103c6a1>] warn_slowpath_fmt+0x41/0x50 > [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0 > [<ffffffffa0094c76>] __btrfs_abort_transaction+0xa6/0xc0 [btrfs] > [<ffffffffa00a87a6>] btrfs_run_delayed_refs+0x296/0x300 [btrfs] > [<ffffffffa00b9ad7>] __btrfs_end_transaction+0xa7/0x360 [btrfs] > [<ffffffffa00b9df0>] btrfs_end_transaction+0x10/0x20 [btrfs] > [<ffffffffa00c049d>] btrfs_finish_ordered_io+0x17d/0x3b0 [btrfs] > [<ffffffff8108f505>] ? trace_hardirqs_on_caller+0x105/0x190 > [<ffffffffa00c06e5>] btrfs_writepage_end_io_hook+0x15/0x20 [btrfs] > [<ffffffffa00dbbb8>] end_extent_writepage+0x58/0x100 [btrfs] > [<ffffffffa00dbcc4>] end_bio_extent_writepage+0x64/0x90 [btrfs] > [<ffffffff81147458>] bio_endio+0x18/0x30 > [<ffffffffa00b1efc>] end_workqueue_fn+0x3c/0x50 [btrfs] > [<ffffffffa00e8cc6>] worker_loop+0x86/0x330 [btrfs] > [<ffffffffa00e8c40>] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs] > [<ffffffff8105da6e>] kthread+0x8e/0xa0 > [<ffffffff815b1b94>] kernel_thread_helper+0x4/0x10 > [<ffffffff815b0259>] ? retint_restore_args+0xe/0xe > [<ffffffff8105d9e0>] ? __init_kthread_worker+0x70/0x70 > [<ffffffff815b1b90>] ? gs_change+0xb/0xb > ---[ end trace df06b72f93439fa3 ]--- > BTRFS warning (device ram1): Aborting unused transaction. > btrfs bad tree block start 0 39845888 > btrfs bad tree block start 0 39845888 > btrfs: run_one_delayed_ref returned -5 > BTRFS error (device ram1) in btrfs_run_delayed_refs:2454: IO failure > btrfs is forced readonly > BTRFS warning (device ram1): Skipping commit of aborted transaction. > > ============================================> [ INFO: possible recursive locking detected ] > 3.4.0-rc6-debug #14 Tainted: G W > --------------------------------------------- > btrfs/18749 is trying to acquire lock: > (&head_ref->mutex){+.+...}, at: [<ffffffffa00b22a9>] > btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] > > but task is already holding lock: > (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>] > btrfs_delayed_ref_lock+0x37/0x140 [btrfs] > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&head_ref->mutex); > lock(&head_ref->mutex); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 3 locks held by btrfs/18749: > #0: (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffffa00eb86e>] > btrfs_mksubvol+0x4e/0x1a0 [btrfs] > #1: (&fs_info->subvol_sem){++++..}, at: [<ffffffffa00eb927>] > btrfs_mksubvol+0x107/0x1a0 [btrfs] > #2: (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>] > btrfs_delayed_ref_lock+0x37/0x140 [btrfs] > > stack backtrace: > Pid: 18749, comm: btrfs Tainted: G W 3.4.0-rc6-debug #14 > Call Trace: > [<ffffffff8108b913>] print_deadlock_bug+0xf3/0x100 > [<ffffffff8108bb02>] check_deadlock.isra.29+0x1e2/0x1f0 > [<ffffffff8108d443>] validate_chain.isra.33+0x383/0x510 > [<ffffffff8108dff8>] __lock_acquire+0x388/0x900 > [<ffffffff8108ea95>] lock_acquire+0x55/0x70 > [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] > [<ffffffff815ad38b>] mutex_lock_nested+0x6b/0x340 > [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] > [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0 > [<ffffffffa00b22a9>] btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs] > [<ffffffffa00b5682>] btrfs_cleanup_one_transaction+0x12/0x100 [btrfs] > [<ffffffffa00b8976>] cleanup_transaction+0x76/0xf0 [btrfs] > [<ffffffffa00b91f1>] btrfs_commit_transaction+0xf1/0x900 [btrfs] > [<ffffffff8105e250>] ? __init_waitqueue_head+0x60/0x60 > [<ffffffffa00eb7eb>] create_snapshot.isra.46+0x1ab/0x1e0 [btrfs] > [<ffffffffa00eb955>] btrfs_mksubvol+0x135/0x1a0 [btrfs] > [<ffffffff811158e0>] ? files_lglock_local_lock+0x70/0x70 > [<ffffffffa00ebaea>] btrfs_ioctl_snap_create_transid+0x12a/0x190 [btrfs] > [<ffffffffa00ec8b0>] btrfs_ioctl_snap_create_v2.constprop.57+0xe0/0xf0 [btrfs] > [<ffffffff815ae241>] ? __schedule+0x351/0x8b0 > [<ffffffffa00eee39>] btrfs_ioctl+0x409/0x770 [btrfs] > [<ffffffff81128767>] do_vfs_ioctl+0x87/0x340 > [<ffffffff81128a6a>] sys_ioctl+0x4a/0x80 > [<ffffffff815b09a2>] system_call_fastpath+0x16/0x1b-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On tue, 8 May 2012 16:56:27 +0800, Daniel J Blueman wrote:> Delayed allocation ref mutexes are taken [1] inside > btrfs_commit_transaction. A later call fails and jumps to the > cleanup_transaction label (transaction.c:1501) with these mutexes > still held causing deadlock [2] when they are reacquired. > > Either we can introduce an earlier label (cleanup_transaction_lock) > and function to unlock these mutexes or can tweak > btrfs_destroy_delayed_refs to conditionally use mutex_try_lock. > > What is the suggested approach?I think we can unlock those mutexes at the suitable place. Could you try this patch? Title:[PATCH] Btrfs: fix deadlock when the process of delayed refs fails Delayed ref mutexes are taken inside btrfs_commit_transaction. A later call fails and jumps to the cleanup_transaction label with these mutexes still held causing deadlock when they are reacquired. Fix this problem by unlocking those mutexes at the suitable place. Reported-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/delayed-ref.c | 8 ++++++++ fs/btrfs/delayed-ref.h | 7 +++++++ fs/btrfs/extent-tree.c | 24 ++++++++++++++++++------ 3 files changed, 33 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 69f22e3..bc43c89 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -310,6 +310,14 @@ again: return 1; } +void btrfs_release_ref_cluster(struct list_head *cluster) +{ + struct list_head *pos, *q; + + list_for_each_safe(pos, q, cluster) + list_del_init(pos); +} + /* * helper function to update an extent delayed ref in the * rbtree. existing and update must both have the same diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index d8f244d..bd9c316 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -192,8 +192,15 @@ struct btrfs_delayed_ref_head * btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr); int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_head *head); +static inline void btrfs_delayed_ref_unlock(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *head) +{ + mutex_unlock(&head->mutex); +} + int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, struct list_head *cluster, u64 search_start); +void btrfs_release_ref_cluster(struct list_head *cluster); struct seq_list { struct list_head list; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 49fd7b6..973379a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2157,7 +2157,6 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, node->num_bytes); } } - mutex_unlock(&head->mutex); return ret; } @@ -2302,14 +2301,13 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, if (ret) { printk(KERN_DEBUG "btrfs: run_delayed_extent_op returned %d\n", ret); spin_lock(&delayed_refs->lock); + btrfs_delayed_ref_unlock(trans, + locked_ref); return ret; } goto next; } - - list_del_init(&locked_ref->cluster); - locked_ref = NULL; } ref->in_tree = 0; @@ -2325,11 +2323,24 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, ret = run_one_delayed_ref(trans, root, ref, extent_op, must_insert_reserved); - - btrfs_put_delayed_ref(ref); kfree(extent_op); count++; + /* + * If this node is a head, we will pick the next head to deal + * with. If there is something wrong when we process the + * delayed ref, we will end our operation. So in these two + * cases, we have to unlock the head and drop it from the + * cluster list before we release it though the code is ugly. + */ + if (btrfs_delayed_ref_is_head(ref) || ret) { + btrfs_delayed_ref_unlock(trans, locked_ref); + list_del_init(&locked_ref->cluster); + locked_ref = NULL; + } + + btrfs_put_delayed_ref(ref); + if (ret) { printk(KERN_DEBUG "btrfs: run_one_delayed_ref returned %d\n", ret); spin_lock(&delayed_refs->lock); @@ -2450,6 +2461,7 @@ again: ret = run_clustered_refs(trans, root, &cluster); if (ret < 0) { + btrfs_release_ref_cluster(&cluster); spin_unlock(&delayed_refs->lock); btrfs_abort_transaction(trans, root, ret); return ret; -- 1.7.6.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Any comment for the following patch? On wed, 09 May 2012 11:39:36 +0800, Miao Xie wrote:> On tue, 8 May 2012 16:56:27 +0800, Daniel J Blueman wrote: >> Delayed allocation ref mutexes are taken [1] inside >> btrfs_commit_transaction. A later call fails and jumps to the >> cleanup_transaction label (transaction.c:1501) with these mutexes >> still held causing deadlock [2] when they are reacquired. >> >> Either we can introduce an earlier label (cleanup_transaction_lock) >> and function to unlock these mutexes or can tweak >> btrfs_destroy_delayed_refs to conditionally use mutex_try_lock. >> >> What is the suggested approach? > > I think we can unlock those mutexes at the suitable place. > Could you try this patch? > > Title:[PATCH] Btrfs: fix deadlock when the process of delayed refs fails > > Delayed ref mutexes are taken inside btrfs_commit_transaction. A later call > fails and jumps to the cleanup_transaction label with these mutexes still held > causing deadlock when they are reacquired. > > Fix this problem by unlocking those mutexes at the suitable place. > > Reported-by: Daniel J Blueman <daniel@quora.org> > Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > --- > fs/btrfs/delayed-ref.c | 8 ++++++++ > fs/btrfs/delayed-ref.h | 7 +++++++ > fs/btrfs/extent-tree.c | 24 ++++++++++++++++++------ > 3 files changed, 33 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c > index 69f22e3..bc43c89 100644 > --- a/fs/btrfs/delayed-ref.c > +++ b/fs/btrfs/delayed-ref.c > @@ -310,6 +310,14 @@ again: > return 1; > } > > +void btrfs_release_ref_cluster(struct list_head *cluster) > +{ > + struct list_head *pos, *q; > + > + list_for_each_safe(pos, q, cluster) > + list_del_init(pos); > +} > + > /* > * helper function to update an extent delayed ref in the > * rbtree. existing and update must both have the same > diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h > index d8f244d..bd9c316 100644 > --- a/fs/btrfs/delayed-ref.h > +++ b/fs/btrfs/delayed-ref.h > @@ -192,8 +192,15 @@ struct btrfs_delayed_ref_head * > btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr); > int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, > struct btrfs_delayed_ref_head *head); > +static inline void btrfs_delayed_ref_unlock(struct btrfs_trans_handle *trans, > + struct btrfs_delayed_ref_head *head) > +{ > + mutex_unlock(&head->mutex); > +} > + > int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, > struct list_head *cluster, u64 search_start); > +void btrfs_release_ref_cluster(struct list_head *cluster); > > struct seq_list { > struct list_head list; > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 49fd7b6..973379a 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2157,7 +2157,6 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, > node->num_bytes); > } > } > - mutex_unlock(&head->mutex); > return ret; > } > > @@ -2302,14 +2301,13 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, > if (ret) { > printk(KERN_DEBUG "btrfs: run_delayed_extent_op returned %d\n", ret); > spin_lock(&delayed_refs->lock); > + btrfs_delayed_ref_unlock(trans, > + locked_ref); > return ret; > } > > goto next; > } > - > - list_del_init(&locked_ref->cluster); > - locked_ref = NULL; > } > > ref->in_tree = 0; > @@ -2325,11 +2323,24 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, > > ret = run_one_delayed_ref(trans, root, ref, extent_op, > must_insert_reserved); > - > - btrfs_put_delayed_ref(ref); > kfree(extent_op); > count++; > > + /* > + * If this node is a head, we will pick the next head to deal > + * with. If there is something wrong when we process the > + * delayed ref, we will end our operation. So in these two > + * cases, we have to unlock the head and drop it from the > + * cluster list before we release it though the code is ugly. > + */ > + if (btrfs_delayed_ref_is_head(ref) || ret) { > + btrfs_delayed_ref_unlock(trans, locked_ref); > + list_del_init(&locked_ref->cluster); > + locked_ref = NULL; > + } > + > + btrfs_put_delayed_ref(ref); > + > if (ret) { > printk(KERN_DEBUG "btrfs: run_one_delayed_ref returned %d\n", ret); > spin_lock(&delayed_refs->lock); > @@ -2450,6 +2461,7 @@ again: > > ret = run_clustered_refs(trans, root, &cluster); > if (ret < 0) { > + btrfs_release_ref_cluster(&cluster); > spin_unlock(&delayed_refs->lock); > btrfs_abort_transaction(trans, root, ret); > return ret;-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Maio, On 17 May 2012 13:49, Miao Xie <miaox@cn.fujitsu.com> wrote:> Any comment for the following patch?Thanks for your patch previously. I''ll find some time on the weekend to get it tested and get back to you, as I''ve been saturated over the last couple of weeks. If interested, I can share my btrfs testathon script as it seems handy for QA? Thanks, Daniel> On wed, 09 May 2012 11:39:36 +0800, Miao Xie wrote: >> On tue, 8 May 2012 16:56:27 +0800, Daniel J Blueman wrote: >>> Delayed allocation ref mutexes are taken [1] inside >>> btrfs_commit_transaction. A later call fails and jumps to the >>> cleanup_transaction label (transaction.c:1501) with these mutexes >>> still held causing deadlock [2] when they are reacquired. >>> >>> Either we can introduce an earlier label (cleanup_transaction_lock) >>> and function to unlock these mutexes or can tweak >>> btrfs_destroy_delayed_refs to conditionally use mutex_try_lock. >>> >>> What is the suggested approach? >> >> I think we can unlock those mutexes at the suitable place. >> Could you try this patch? >> >> Title:[PATCH] Btrfs: fix deadlock when the process of delayed refs fails >> >> Delayed ref mutexes are taken inside btrfs_commit_transaction. A later call >> fails and jumps to the cleanup_transaction label with these mutexes still held >> causing deadlock when they are reacquired. >> >> Fix this problem by unlocking those mutexes at the suitable place. >> >> Reported-by: Daniel J Blueman <daniel@quora.org> >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> >> --- >> fs/btrfs/delayed-ref.c | 8 ++++++++ >> fs/btrfs/delayed-ref.h | 7 +++++++ >> fs/btrfs/extent-tree.c | 24 ++++++++++++++++++------ >> 3 files changed, 33 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c >> index 69f22e3..bc43c89 100644 >> --- a/fs/btrfs/delayed-ref.c >> +++ b/fs/btrfs/delayed-ref.c >> @@ -310,6 +310,14 @@ again: >> return 1; >> } >> >> +void btrfs_release_ref_cluster(struct list_head *cluster) >> +{ >> + struct list_head *pos, *q; >> + >> + list_for_each_safe(pos, q, cluster) >> + list_del_init(pos); >> +} >> + >> /* >> * helper function to update an extent delayed ref in the >> * rbtree. existing and update must both have the same >> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h >> index d8f244d..bd9c316 100644 >> --- a/fs/btrfs/delayed-ref.h >> +++ b/fs/btrfs/delayed-ref.h >> @@ -192,8 +192,15 @@ struct btrfs_delayed_ref_head * >> btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr); >> int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, >> struct btrfs_delayed_ref_head *head); >> +static inline void btrfs_delayed_ref_unlock(struct btrfs_trans_handle *trans, >> + struct btrfs_delayed_ref_head *head) >> +{ >> + mutex_unlock(&head->mutex); >> +} >> + >> int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, >> struct list_head *cluster, u64 search_start); >> +void btrfs_release_ref_cluster(struct list_head *cluster); >> >> struct seq_list { >> struct list_head list; >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index 49fd7b6..973379a 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -2157,7 +2157,6 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, >> node->num_bytes); >> } >> } >> - mutex_unlock(&head->mutex); >> return ret; >> } >> >> @@ -2302,14 +2301,13 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, >> if (ret) { >> printk(KERN_DEBUG "btrfs: run_delayed_extent_op returned %d\n", ret); >> spin_lock(&delayed_refs->lock); >> + btrfs_delayed_ref_unlock(trans, >> + locked_ref); >> return ret; >> } >> >> goto next; >> } >> - >> - list_del_init(&locked_ref->cluster); >> - locked_ref = NULL; >> } >> >> ref->in_tree = 0; >> @@ -2325,11 +2323,24 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, >> >> ret = run_one_delayed_ref(trans, root, ref, extent_op, >> must_insert_reserved); >> - >> - btrfs_put_delayed_ref(ref); >> kfree(extent_op); >> count++; >> >> + /* >> + * If this node is a head, we will pick the next head to deal >> + * with. If there is something wrong when we process the >> + * delayed ref, we will end our operation. So in these two >> + * cases, we have to unlock the head and drop it from the >> + * cluster list before we release it though the code is ugly. >> + */ >> + if (btrfs_delayed_ref_is_head(ref) || ret) { >> + btrfs_delayed_ref_unlock(trans, locked_ref); >> + list_del_init(&locked_ref->cluster); >> + locked_ref = NULL; >> + } >> + >> + btrfs_put_delayed_ref(ref); >> + >> if (ret) { >> printk(KERN_DEBUG "btrfs: run_one_delayed_ref returned %d\n", ret); >> spin_lock(&delayed_refs->lock); >> @@ -2450,6 +2461,7 @@ again: >> >> ret = run_clustered_refs(trans, root, &cluster); >> if (ret < 0) { >> + btrfs_release_ref_cluster(&cluster); >> spin_unlock(&delayed_refs->lock); >> btrfs_abort_transaction(trans, root, ret); >> return ret; >-- Daniel J Blueman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html