thr3ads.net - Btrfs devel - [PATCH] Btrfs: fix heavy delalloc related deadlock [Aug 2013]

Josef Bacik

2013-Aug-14 15:41 UTC

[PATCH] Btrfs: fix heavy delalloc related deadlock

I added a patch where we started taking the ordered operations mutex when we
waited on ordered extents.  We need this because we splice the list and process
it, so if a flusher came in during this scenario it would think the list was
empty and we'd usually get an early ENOSPC.  The problem with this is that this
lock is used in transaction committing.  So we end up with something like this

Transaction commit
	-> wait on writers

Delalloc flusher
	-> run_ordered_operations (holds mutex)
		->wait for filemap-flush to do its thing

flush task
	-> cow_file_range
		->wait on btrfs_join_transaction because we're committing

some other task
	-> commit_transaction because we notice trans->transaction->flush is set
		-> run_ordered_operations (hang on mutex)

We need to disentangle the ordered operations flushing from the delalloc
flushing, since they are separate things.  This solves the deadlock issue I was
seeing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
 fs/btrfs/ctree.h        |    7 +++++++
 fs/btrfs/disk-io.c      |    1 +
 fs/btrfs/ordered-data.c |    4 ++--
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ea4cc16..d79e32c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1418,6 +1418,13 @@ struct btrfs_fs_info {
 	 * before jumping into the main commit.
 	 */
 	struct mutex ordered_operations_mutex;
+
+	/*
+	 * Same as ordered_operations_mutex except this is for ordered extents
+	 * and not the operations.
+	 */
+	struct mutex ordered_extent_flush_mutex;
+
 	struct rw_semaphore extent_commit_sem;
 
 	struct rw_semaphore cleanup_work_sem;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c82025d..880dcde 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2288,6 +2288,7 @@ int open_ctree(struct super_block *sb,
 
 
 	mutex_init(&fs_info->ordered_operations_mutex);
+	mutex_init(&fs_info->ordered_extent_flush_mutex);
 	mutex_init(&fs_info->tree_log_mutex);
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 8136982..b52b2c4 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -671,7 +671,7 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
 	INIT_LIST_HEAD(&splice);
 	INIT_LIST_HEAD(&works);
 
-	mutex_lock(&root->fs_info->ordered_operations_mutex);
+	mutex_lock(&root->fs_info->ordered_extent_flush_mutex);
 	spin_lock(&root->fs_info->ordered_root_lock);
 	list_splice_init(&cur_trans->ordered_operations, &splice);
 	while (!list_empty(&splice)) {
@@ -718,7 +718,7 @@ out:
 		list_del_init(&work->list);
 		btrfs_wait_and_free_delalloc_work(work);
 	}
-	mutex_unlock(&root->fs_info->ordered_operations_mutex);
+	mutex_unlock(&root->fs_info->ordered_extent_flush_mutex);
 	return ret;
 }
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
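
A minimal user-space sketch of the splice problem described in the commit message above, assuming a plain singly linked list in place of the kernel's list_head. The names item, queue, splice_all, looks_idle and count_items are hypothetical and do not appear in the patch. It only illustrates why a second flusher that inspects the shared list right after a splice would conclude there is nothing left to wait for, even though the first flusher still has work in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one ordered extent queued for flushing. */
struct item {
	struct item *next;
};

/* Hypothetical shared queue; the kernel code uses a struct list_head. */
struct queue {
	struct item *head;
};

/*
 * Move every queued item onto a private list and leave the shared queue
 * empty, which is the visible effect of list_splice_init() in the patch.
 */
struct item *splice_all(struct queue *q)
{
	struct item *private_list;

	assert(q != NULL);
	private_list = q->head;
	q->head = NULL;
	return private_list;
}

/* What a second flusher would check: is anything left on the shared queue? */
bool looks_idle(const struct queue *q)
{
	return q->head == NULL;
}

/* Count the items that are still being processed on a private list. */
size_t count_items(const struct item *it)
{
	size_t n = 0;

	for (; it != NULL; it = it->next)
		n++;
	return n;
}
```

After splice_all() the shared queue looks idle while the private list still holds every item, so without some serialization around the splice-and-process step a concurrent flusher returns early. That early return is what the commit message ties to the premature ENOSPC.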

Josef Bacik

2013-Aug-14 19:28 UTC

[PATCH] Btrfs: fix heavy delalloc related deadlock

I added a patch where we started taking the ordered operations mutex when we
waited on ordered extents.  We need this because we splice the list and process
it, so if a flusher came in during this scenario it would think the list was
empty and we'd usually get an early ENOSPC.  The problem with this is that this
lock is used in transaction committing.  So we end up with something like this

Transaction commit
	-> wait on writers

Delalloc flusher
	-> run_ordered_operations (holds mutex)
		->wait for filemap-flush to do its thing

flush task
	-> cow_file_range
		->wait on btrfs_join_transaction because we're committing

some other task
	-> commit_transaction because we notice trans->transaction->flush is set
		-> run_ordered_operations (hang on mutex)

We need to disentangle the ordered operations flushing from the delalloc
flushing, since they are separate things.  This solves the deadlock issue I was
seeing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
 fs/btrfs/ctree.h        |    7 +++++++
 fs/btrfs/disk-io.c      |    1 +
 fs/btrfs/ordered-data.c |    4 ++--
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0632832..063e485 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1413,6 +1413,13 @@ struct btrfs_fs_info {
 	 * before jumping into the main commit.
 	 */
 	struct mutex ordered_operations_mutex;
+
+	/*
+	 * Same as ordered_operations_mutex except this is for ordered extents
+	 * and not the operations.
+	 */
+	struct mutex ordered_extent_flush_mutex;
+
 	struct rw_semaphore extent_commit_sem;
 
 	struct rw_semaphore cleanup_work_sem;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5de9ad7..3b12c26 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2276,6 +2276,7 @@ int open_ctree(struct super_block *sb,
 
 
 	mutex_init(&fs_info->ordered_operations_mutex);
+	mutex_init(&fs_info->ordered_extent_flush_mutex);
 	mutex_init(&fs_info->tree_log_mutex);
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 8136982..b52b2c4 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -671,7 +671,7 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
 	INIT_LIST_HEAD(&splice);
 	INIT_LIST_HEAD(&works);
 
-	mutex_lock(&root->fs_info->ordered_operations_mutex);
+	mutex_lock(&root->fs_info->ordered_extent_flush_mutex);
 	spin_lock(&root->fs_info->ordered_root_lock);
 	list_splice_init(&cur_trans->ordered_operations, &splice);
 	while (!list_empty(&splice)) {
@@ -718,7 +718,7 @@ out:
 		list_del_init(&work->list);
 		btrfs_wait_and_free_delalloc_work(work);
 	}
-	mutex_unlock(&root->fs_info->ordered_operations_mutex);
+	mutex_unlock(&root->fs_info->ordered_extent_flush_mutex);
 	return ret;
 }
 
-- 
1.7.7.6
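
A rough user-space analogy of what the patch changes, assuming POSIX threads and unnamed semaphores stand in for the kernel primitives. The two mutex names are taken from the patch; delalloc_flusher, commit_path_can_proceed and the two semaphores are hypothetical helpers invented for this sketch. It is not kernel code, only a way to see that a path needing ordered_operations_mutex is no longer held up by a flusher that sleeps while holding the new ordered_extent_flush_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* User-space stand-ins for the two fs_info mutexes named in the patch. */
static pthread_mutex_t ordered_operations_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ordered_extent_flush_mutex = PTHREAD_MUTEX_INITIALIZER;

static sem_t flusher_has_lock;	/* posted once the flusher holds its mutex */
static sem_t flush_finished;	/* posted when the simulated flush is done */

/* Hypothetical flusher: holds the flush mutex across a long wait. */
static void *delalloc_flusher(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&ordered_extent_flush_mutex);
	sem_post(&flusher_has_lock);
	sem_wait(&flush_finished);	/* stands in for waiting on the flush */
	pthread_mutex_unlock(&ordered_extent_flush_mutex);
	return NULL;
}

/*
 * Returns 1 if ordered_operations_mutex could be taken while the flusher
 * was still blocked holding the other mutex, 0 if not, -1 on setup failure.
 */
int commit_path_can_proceed(void)
{
	pthread_t tid;
	int rc, got_ops, flush_busy;

	if (sem_init(&flusher_has_lock, 0, 0) || sem_init(&flush_finished, 0, 0))
		return -1;
	if (pthread_create(&tid, NULL, delalloc_flusher, NULL))
		return -1;
	sem_wait(&flusher_has_lock);	/* flusher now owns the flush mutex */

	/* The commit-side lock is free because the flusher uses another one. */
	got_ops = (pthread_mutex_trylock(&ordered_operations_mutex) == 0);
	if (got_ops)
		pthread_mutex_unlock(&ordered_operations_mutex);

	/* The flush-side lock is busy, as it would be in the old scheme. */
	rc = pthread_mutex_trylock(&ordered_extent_flush_mutex);
	assert(rc == 0 || rc == EBUSY);
	flush_busy = (rc == EBUSY);
	if (rc == 0)
		pthread_mutex_unlock(&ordered_extent_flush_mutex);

	sem_post(&flush_finished);	/* let the flusher finish and unlock */
	pthread_join(tid, NULL);
	sem_destroy(&flusher_has_lock);
	sem_destroy(&flush_finished);
	return got_ops && flush_busy;
}
```

Using trylock rather than lock keeps the sketch deterministic: with a single shared mutex the first trylock would report EBUSY for as long as the flusher sleeps, which corresponds to the "hang on mutex" step in the call chains above.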

Miao Xie

2013-Aug-19 02:31 UTC

Re: [PATCH] Btrfs: fix heavy delalloc related deadlock

On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
> I added a patch where we started taking the ordered operations mutex when we
> waited on ordered extents.  We need this because we splice the list and process
> it, so if a flusher came in during this scenario it would think the list was
> empty and we'd usually get an early ENOSPC.  The problem with this is that this
> lock is used in transaction committing.  So we end up with something like this
> 
> Transaction commit
> 	-> wait on writers
> 
> Delalloc flusher
> 	-> run_ordered_operations (holds mutex)
> 		->wait for filemap-flush to do its thing
> 
> flush task
> 	-> cow_file_range
> 		->wait on btrfs_join_transaction because we're committing
> 
> some other task
> 	-> commit_transaction because we notice trans->transaction->flush is set
> 		-> run_ordered_operations (hang on mutex)

Sorry, I cannot understand this explanation. As far as I know, if the flush task
waits on btrfs_join_transaction(), it means the transaction is under commit
(state = TRANS_STATE_COMMIT_DOING), and all the external writers
(TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have quit the current transaction,
so no one would try to call run_ordered_operations().

Could you show us the steps to reproduce?

Thanks
Miao
> 
> We need to disentangle the ordered operations flushing from the delalloc
> flushing, since they are separate things.  This solves the deadlock issue I was
> seeing.  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---
>  fs/btrfs/ctree.h        |    7 +++++++
>  fs/btrfs/disk-io.c      |    1 +
>  fs/btrfs/ordered-data.c |    4 ++--
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ea4cc16..d79e32c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1418,6 +1418,13 @@ struct btrfs_fs_info {
>  	 * before jumping into the main commit.
>  	 */
>  	struct mutex ordered_operations_mutex;
> +
> +	/*
> +	 * Same as ordered_operations_mutex except this is for ordered extents
> +	 * and not the operations.
> +	 */
> +	struct mutex ordered_extent_flush_mutex;
> +
>  	struct rw_semaphore extent_commit_sem;
>  
>  	struct rw_semaphore cleanup_work_sem;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c82025d..880dcde 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2288,6 +2288,7 @@ int open_ctree(struct super_block *sb,
>  
>  
>  	mutex_init(&fs_info->ordered_operations_mutex);
> +	mutex_init(&fs_info->ordered_extent_flush_mutex);
>  	mutex_init(&fs_info->tree_log_mutex);
>  	mutex_init(&fs_info->chunk_mutex);
>  	mutex_init(&fs_info->transaction_kthread_mutex);
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 8136982..b52b2c4 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -671,7 +671,7 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
>  	INIT_LIST_HEAD(&splice);
>  	INIT_LIST_HEAD(&works);
>  
> -	mutex_lock(&root->fs_info->ordered_operations_mutex);
> +	mutex_lock(&root->fs_info->ordered_extent_flush_mutex);
>  	spin_lock(&root->fs_info->ordered_root_lock);
>  	list_splice_init(&cur_trans->ordered_operations, &splice);
>  	while (!list_empty(&splice)) {
> @@ -718,7 +718,7 @@ out:
>  		list_del_init(&work->list);
>  		btrfs_wait_and_free_delalloc_work(work);
>  	}
> -	mutex_unlock(&root->fs_info->ordered_operations_mutex);
> +	mutex_unlock(&root->fs_info->ordered_extent_flush_mutex);
>  	return ret;
>  }
>  
> 
Josef Bacik

2013-Aug-19 12:49 UTC

Re: [PATCH] Btrfs: fix heavy delalloc related deadlock

On Mon, Aug 19, 2013 at 10:31:15AM +0800, Miao Xie wrote:
> On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
> > I added a patch where we started taking the ordered operations mutex when we
> > waited on ordered extents.  We need this because we splice the list and process
> > it, so if a flusher came in during this scenario it would think the list was
> > empty and we'd usually get an early ENOSPC.  The problem with this is that this
> > lock is used in transaction committing.  So we end up with something like this
> > 
> > Transaction commit
> > 	-> wait on writers
> > 
> > Delalloc flusher
> > 	-> run_ordered_operations (holds mutex)
> > 		->wait for filemap-flush to do its thing
> > 
> > flush task
> > 	-> cow_file_range
> > 		->wait on btrfs_join_transaction because we're committing
> > 
> > some other task
> > 	-> commit_transaction because we notice trans->transaction->flush is set
> > 		-> run_ordered_operations (hang on mutex)
> 
> Sorry, I cannot understand this explanation. As far as I know, if the flush task
> waits on btrfs_join_transaction(), it means the transaction is under commit
> (state = TRANS_STATE_COMMIT_DOING), and all the external writers
> (TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have quit the current transaction,
> so no one would try to call run_ordered_operations().
> 
> Could you show us the steps to reproduce?
> 
Sorry I wrote the wrong thing for the delalloc flusher, that should be

  ->btrfs_wait_ordered_extents (holds ordered operations mutex)
	-> wait for filemap-flush to do its thing

That should make it clearer.  I reproduced it running xfstests generic/224.
Thanks,

Josef
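
The corrected call chains can be read as a wait-for graph. The sketch below is one way to check that reading, under stated assumptions: each task is assumed to wait on exactly one other task, the edge from the committer to "some other task" is an interpretation of "wait on writers" (that task is taken to hold a transaction handle), and all names (task, wait_graph, has_deadlock, graph_before_patch, graph_after_patch) are hypothetical. It is not derived from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

enum task { COMMITTER, DELALLOC_FLUSHER, FLUSH_TASK, OTHER_TASK, NR_TASKS };

/*
 * waits_on[a] == b means task a cannot make progress until task b does;
 * -1 means task a is not blocked on anyone.
 */
struct wait_graph {
	int waits_on[NR_TASKS];
};

/* Follow the single outgoing edge from each task and look for a cycle. */
bool has_deadlock(const struct wait_graph *g)
{
	for (int start = 0; start < NR_TASKS; start++) {
		int cur = start;

		for (int steps = 0; steps < NR_TASKS && cur != -1; steps++) {
			cur = g->waits_on[cur];
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* The four tasks as described in the thread, with one shared mutex. */
struct wait_graph graph_before_patch(void)
{
	struct wait_graph g;

	g.waits_on[COMMITTER] = OTHER_TASK;		/* wait on writers */
	g.waits_on[OTHER_TASK] = DELALLOC_FLUSHER;	/* hang on mutex */
	g.waits_on[DELALLOC_FLUSHER] = FLUSH_TASK;	/* wait for the flush */
	g.waits_on[FLUSH_TASK] = COMMITTER;		/* join waits on commit */
	return g;
}

/* With separate mutexes the other task no longer waits on the flusher. */
struct wait_graph graph_after_patch(void)
{
	struct wait_graph g = graph_before_patch();

	g.waits_on[OTHER_TASK] = -1;
	return g;
}
```

In this model has_deadlock() reports a cycle for graph_before_patch() and none for graph_after_patch(), which matches the claim that separating the two mutexes breaks the chain at the "hang on mutex" step.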
Miao Xie

2013-Aug-21 06:31 UTC

Re: [PATCH] Btrfs: fix heavy delalloc related deadlock

Josef

On Mon, 19 Aug 2013 08:49:52 -0400, Josef Bacik wrote:
> On Mon, Aug 19, 2013 at 10:31:15AM +0800, Miao Xie wrote:
>> On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
>>> I added a patch where we started taking the ordered operations mutex when we
>>> waited on ordered extents.  We need this because we splice the list and process
>>> it, so if a flusher came in during this scenario it would think the list was
>>> empty and we'd usually get an early ENOSPC.  The problem with this is that this
>>> lock is used in transaction committing.  So we end up with something like this
>>>
>>> Transaction commit
>>> 	-> wait on writers
>>>
>>> Delalloc flusher
>>> 	-> run_ordered_operations (holds mutex)
>>> 		->wait for filemap-flush to do its thing
>>>
>>> flush task
>>> 	-> cow_file_range
>>> 		->wait on btrfs_join_transaction because we're committing
>>>
>>> some other task
>>> 	-> commit_transaction because we notice trans->transaction->flush is set
>>> 		-> run_ordered_operations (hang on mutex)
>>
>> Sorry, I cannot understand this explanation. As far as I know, if the flush task
>> waits on btrfs_join_transaction(), it means the transaction is under commit
>> (state = TRANS_STATE_COMMIT_DOING), and all the external writers
>> (TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have quit the current transaction,
>> so no one would try to call run_ordered_operations().
>>
>> Could you show us the steps to reproduce?
>>
> 
> Sorry I wrote the wrong thing for the delalloc flusher, that should be
> 
>   ->btrfs_wait_ordered_extents (holds ordered operations mutex)
> 	-> wait for filemap-flush to do its thing
> 
> That should make it clearer.  I reproduced it running xfstests generic/224.
> Thanks,

Your patch fixes the above deadlock. This problem also happens on older
kernels, so it would be better to send it to the stable kernel mailing list as
well, and please add
	Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>

By the way, I found that the "some other task" you mentioned above is a task
that starts a TRANS_JOIN transaction handle. If we don't use
btrfs_join_transaction/btrfs_commit_transaction together, we can also avoid
the above deadlock. Besides that, I think a TRANS_JOIN handle should not be
committed: a TRANS_JOIN handle can grab the current transaction even when it
is about to be committed, so committing a TRANS_JOIN handle at that point is
error prone. And in most cases where we need to commit the transaction, we
just want to commit the current transaction, not start a new transaction and
then commit it, so TRANS_JOIN is not suitable in those cases.

In short, we need to clean up the code that uses
btrfs_join_transaction/btrfs_commit_transaction together.

Thanks
Miao
Josef Bacik

2013-Aug-21 13:13 UTC

Re: [PATCH] Btrfs: fix heavy delalloc related deadlock

On Wed, Aug 21, 2013 at 02:31:29PM +0800, Miao Xie wrote:
> Josef
> 
> On Mon, 19 Aug 2013 08:49:52 -0400, Josef Bacik wrote:
> > On Mon, Aug 19, 2013 at 10:31:15AM +0800, Miao Xie wrote:
> >> On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
> >>> I added a patch where we started taking the ordered operations mutex when we
> >>> waited on ordered extents.  We need this because we splice the list and process
> >>> it, so if a flusher came in during this scenario it would think the list was
> >>> empty and we'd usually get an early ENOSPC.  The problem with this is that this
> >>> lock is used in transaction committing.  So we end up with something like this
> >>>
> >>> Transaction commit
> >>> 	-> wait on writers
> >>>
> >>> Delalloc flusher
> >>> 	-> run_ordered_operations (holds mutex)
> >>> 		->wait for filemap-flush to do its thing
> >>>
> >>> flush task
> >>> 	-> cow_file_range
> >>> 		->wait on btrfs_join_transaction because we're committing
> >>>
> >>> some other task
> >>> 	-> commit_transaction because we notice trans->transaction->flush is set
> >>> 		-> run_ordered_operations (hang on mutex)
> >>
> >> Sorry, I cannot understand this explanation. As far as I know, if the flush task
> >> waits on btrfs_join_transaction(), it means the transaction is under commit
> >> (state = TRANS_STATE_COMMIT_DOING), and all the external writers
> >> (TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have quit the current transaction,
> >> so no one would try to call run_ordered_operations().
> >>
> >> Could you show us the steps to reproduce?
> >>
> > 
> > Sorry I wrote the wrong thing for the delalloc flusher, that should be
> > 
> >   ->btrfs_wait_ordered_extents (holds ordered operations mutex)
> > 	-> wait for filemap-flush to do its thing
> > 
> > That should make it clearer.  I reproduced it running xfstests generic/224.
> > Thanks,
> 
> Your patch fixes the above deadlock. This problem also happens on older
> kernels, so it would be better to send it to the stable kernel mailing list as
> well, and please add
> 	Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
> 
> By the way, I found that the "some other task" you mentioned above is a task
> that starts a TRANS_JOIN transaction handle. If we don't use
> btrfs_join_transaction/btrfs_commit_transaction together, we can also avoid
> the above deadlock. Besides that, I think a TRANS_JOIN handle should not be
> committed: a TRANS_JOIN handle can grab the current transaction even when it
> is about to be committed, so committing a TRANS_JOIN handle at that point is
> error prone. And in most cases where we need to commit the transaction, we
> just want to commit the current transaction, not start a new transaction and
> then commit it, so TRANS_JOIN is not suitable in those cases.
> 
> In short, we need to clean up the code that uses
> btrfs_join_transaction/btrfs_commit_transaction together.
>
Agreed.  I was going through and changing everybody who did this to use the
attach barrier thing you rigged up, and then there was some send thing and I got
distracted.  I'll go through and finish that work up (the no join in
cow_file_range was part of that work as well).  Thanks,

Josef 
