thr3ads.net - Btrfs devel - [PATCH] Btrfs: complete page writeback before doing ordered extents [Apr 2012]

If this information is useful, please help other people find it:
Share via:

Josef Bacik

2012-Apr-23 17:33 UTC

[PATCH] Btrfs: complete page writeback before doing ordered extents

We can deadlock waiting for pages to end writeback because we are doing an
allocation while hold a tree lock since the ordered extent stuff will
require tree locks.  A quick easy way to fix this is to end page writeback
before we do our ordered io stuff, which works fine since we don''t
really
need the page for this to work.  Eventually we want to make this work happen
as soon as the io is completed and then push the ordered extent stuff off to
a worker thread, but at this stage we need this deadlock fixed with changing
as little as possible.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/btrfs/compression.c  |    3 +--
 fs/btrfs/extent_io.c    |   27 +++++++++++----------------
 fs/btrfs/ordered-data.c |    4 +++-
 3 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d11afa67..6a456d4 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -286,13 +286,12 @@ static void end_compressed_bio_write(struct bio *bio, int
err)
 	inode = cb->inode;
 	tree = &BTRFS_I(inode)->io_tree;
 	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
+	end_compressed_writeback(inode, cb->start, cb->len);
 	tree->ops->writepage_end_io_hook(cb->compressed_pages[0],
 					 cb->start,
 					 cb->start + cb->len - 1,
 					 NULL, 1);
 	cb->compressed_pages[0]->mapping = NULL;
-
-	end_compressed_writeback(inode, cb->start, cb->len);
 	/* note, our inode could be gone now */
 
 	/*
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7c501d3..308c1c2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1806,16 +1806,6 @@ static void check_page_locked(struct extent_io_tree
*tree, struct page *page)
 }
 
 /*
- * helper function to end page writeback if all the extents
- * in the tree for that page are done with writeback
- */
-static void check_page_writeback(struct extent_io_tree *tree,
-				 struct page *page)
-{
-	end_page_writeback(page);
-}
-
-/*
  * When IO fails, either with EIO or csum verification fails, we
  * try other mirrors that might have a good copy of the data.  This
  * io_failure_record is used to record state as we go through all the
@@ -2206,6 +2196,13 @@ int end_extent_writepage(struct page *page, int err, u64
start, u64 end)
 	struct extent_io_tree *tree;
 	int ret;
 
+	/*
+	 * Don''t want the page disappearing out from under us after we clear
+	 * writeback.
+	 */
+	page_cache_get(page);
+	end_page_writeback(page);
+
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	if (tree->ops && tree->ops->writepage_end_io_hook) {
@@ -2220,8 +2217,10 @@ int end_extent_writepage(struct page *page, int err, u64
start, u64 end)
 		ret = tree->ops->writepage_io_failed_hook(NULL, page,
 						 start, end, NULL);
 		/* Writeback already completed */
-		if (ret == 0)
+		if (ret == 0) {
+			page_cache_release(page);
 			return 1;
+		}
 	}
 
 	if (!uptodate) {
@@ -2229,6 +2228,7 @@ int end_extent_writepage(struct page *page, int err, u64
start, u64 end)
 		ClearPageUptodate(page);
 		SetPageError(page);
 	}
+	page_cache_release(page);
 	return 0;
 }
 
@@ -2267,11 +2267,6 @@ static void end_bio_extent_writepage(struct bio *bio, int
err)
 
 		if (end_extent_writepage(page, err, start, end))
 			continue;
-
-		if (whole_page)
-			end_page_writeback(page);
-		else
-			check_page_writeback(tree, page);
 	} while (bvec >= bio->bi_io_vec);
 
 	bio_put(bio);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index bbf6d0d..acfb360 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -196,7 +196,7 @@ static int __btrfs_add_ordered_extent(struct inode *inode,
u64 file_offset,
 	entry->len = len;
 	entry->disk_len = disk_len;
 	entry->bytes_left = len;
-	entry->inode = inode;
+	entry->inode = igrab(inode);
 	entry->compress_type = compress_type;
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);
@@ -399,6 +399,8 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent
*entry)
 	trace_btrfs_ordered_extent_put(entry->inode, entry);
 
 	if (atomic_dec_and_test(&entry->refs)) {
+		if (entry->inode)
+			btrfs_add_delayed_iput(entry->inode);
 		while (!list_empty(&entry->list)) {
 			cur = entry->list.next;
 			sum = list_entry(cur, struct btrfs_ordered_sum, list);
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo

2012-Apr-24 01:50 UTC

head link

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents

On 04/24/2012 01:33 AM, Josef Bacik wrote:
> We can deadlock waiting for pages to end writeback because we are doing an
> allocation while hold a tree lock since the ordered extent stuff will
> require tree locks.  A quick easy way to fix this is to end page writeback
> before we do our ordered io stuff, which works fine since we don''t
really
> need the page for this to work.  Eventually we want to make this work
happen
> as soon as the io is completed and then push the ordered extent stuff off
to
> a worker thread, but at this stage we need this deadlock fixed with
changing
> as little as possible.  Thanks,
> 

Hi Josef,

I''m ok with the patch, but could you show us more details about the
deadlock between allocation and endio?

thanks,
liubo
> Signed-off-by: Josef Bacik <josef@redhat.com>
> ---
>  fs/btrfs/compression.c  |    3 +--
>  fs/btrfs/extent_io.c    |   27 +++++++++++----------------
>  fs/btrfs/ordered-data.c |    4 +++-
>  3 files changed, 15 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index d11afa67..6a456d4 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -286,13 +286,12 @@ static void end_compressed_bio_write(struct bio *bio,
int err)
>  	inode = cb->inode;
>  	tree = &BTRFS_I(inode)->io_tree;
>  	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
> +	end_compressed_writeback(inode, cb->start, cb->len);
>  	tree->ops->writepage_end_io_hook(cb->compressed_pages[0],
>  					 cb->start,
>  					 cb->start + cb->len - 1,
>  					 NULL, 1);
>  	cb->compressed_pages[0]->mapping = NULL;
> -
> -	end_compressed_writeback(inode, cb->start, cb->len);
>  	/* note, our inode could be gone now */
>  
>  	/*
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7c501d3..308c1c2 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1806,16 +1806,6 @@ static void check_page_locked(struct extent_io_tree
*tree, struct page *page)
>  }
>  
>  /*
> - * helper function to end page writeback if all the extents
> - * in the tree for that page are done with writeback
> - */
> -static void check_page_writeback(struct extent_io_tree *tree,
> -				 struct page *page)
> -{
> -	end_page_writeback(page);
> -}
> -
> -/*
>   * When IO fails, either with EIO or csum verification fails, we
>   * try other mirrors that might have a good copy of the data.  This
>   * io_failure_record is used to record state as we go through all the
> @@ -2206,6 +2196,13 @@ int end_extent_writepage(struct page *page, int err,
u64 start, u64 end)
>  	struct extent_io_tree *tree;
>  	int ret;
>  
> +	/*
> +	 * Don''t want the page disappearing out from under us after we
clear
> +	 * writeback.
> +	 */
> +	page_cache_get(page);
> +	end_page_writeback(page);
> +
>  	tree = &BTRFS_I(page->mapping->host)->io_tree;
>  
>  	if (tree->ops && tree->ops->writepage_end_io_hook) {
> @@ -2220,8 +2217,10 @@ int end_extent_writepage(struct page *page, int err,
u64 start, u64 end)
>  		ret = tree->ops->writepage_io_failed_hook(NULL, page,
>  						 start, end, NULL);
>  		/* Writeback already completed */
> -		if (ret == 0)
> +		if (ret == 0) {
> +			page_cache_release(page);
>  			return 1;
> +		}
>  	}
>  
>  	if (!uptodate) {
> @@ -2229,6 +2228,7 @@ int end_extent_writepage(struct page *page, int err,
u64 start, u64 end)
>  		ClearPageUptodate(page);
>  		SetPageError(page);
>  	}
> +	page_cache_release(page);
>  	return 0;
>  }
>  
> @@ -2267,11 +2267,6 @@ static void end_bio_extent_writepage(struct bio
*bio, int err)
>  
>  		if (end_extent_writepage(page, err, start, end))
>  			continue;
> -
> -		if (whole_page)
> -			end_page_writeback(page);
> -		else
> -			check_page_writeback(tree, page);
>  	} while (bvec >= bio->bi_io_vec);
>  
>  	bio_put(bio);
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index bbf6d0d..acfb360 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -196,7 +196,7 @@ static int __btrfs_add_ordered_extent(struct inode
*inode, u64 file_offset,
>  	entry->len = len;
>  	entry->disk_len = disk_len;
>  	entry->bytes_left = len;
> -	entry->inode = inode;
> +	entry->inode = igrab(inode);
>  	entry->compress_type = compress_type;
>  	if (type != BTRFS_ORDERED_IO_DONE && type !=
BTRFS_ORDERED_COMPLETE)
>  		set_bit(type, &entry->flags);
> @@ -399,6 +399,8 @@ void btrfs_put_ordered_extent(struct
btrfs_ordered_extent *entry)
>  	trace_btrfs_ordered_extent_put(entry->inode, entry);
>  
>  	if (atomic_dec_and_test(&entry->refs)) {
> +		if (entry->inode)
> +			btrfs_add_delayed_iput(entry->inode);
>  		while (!list_empty(&entry->list)) {
>  			cur = entry->list.next;
>  			sum = list_entry(cur, struct btrfs_ordered_sum, list);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2012-Apr-24 14:15 UTC

head link

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents

On Tue, Apr 24, 2012 at 09:50:39AM +0800, Liu Bo wrote:> On 04/24/2012 01:33 AM, Josef Bacik wrote:
> 
> > We can deadlock waiting for pages to end writeback because we are
doing an
> > allocation while hold a tree lock since the ordered extent stuff will
> > require tree locks.  A quick easy way to fix this is to end page
writeback
> > before we do our ordered io stuff, which works fine since we
don''t really
> > need the page for this to work.  Eventually we want to make this work
happen
> > as soon as the io is completed and then push the ordered extent stuff
off to
> > a worker thread, but at this stage we need this deadlock fixed with
changing
> > as little as possible.  Thanks,
> > 
> 
> 
> Hi Josef,
> 
> I''m ok with the patch, but could you show us more details about
the deadlock between allocation and endio?
Josef and I have been talking about this one off-list for a while. 
It''s
a deadlock I tracked down in my overnight stress runs.

Basically what we have is the io-less dirty throttling code saying there
are too many pages in writeback, and so new allocations are backing up
and waiting for pages to leave writeback.

But the pages can''t leave writeback because we''re waiting on
more memory
to complete the metadata changes at endio time.  Strictly speaking the
VM is doing something wrong here, our NOFS allocations shouldn''t be
waiting for writeback to finish.

But, strictly speaking we''re doing something wrong too, we''re
doing too
many allocations with pages tied up in writeback.

So this splits the page from the metadata changes.  We''re still doing
the metadata changes after the IO is complete, but we''re doing them
after we''ve let the pages go.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo

2012-Apr-25 07:52 UTC

head link

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents

On 04/24/2012 10:15 PM, Chris Mason wrote:
> On Tue, Apr 24, 2012 at 09:50:39AM +0800, Liu Bo wrote:
>> On 04/24/2012 01:33 AM, Josef Bacik wrote:
>>
>>> We can deadlock waiting for pages to end writeback because we are
doing an
>>> allocation while hold a tree lock since the ordered extent stuff
will
>>> require tree locks.  A quick easy way to fix this is to end page
writeback
>>> before we do our ordered io stuff, which works fine since we
don''t really
>>> need the page for this to work.  Eventually we want to make this
work happen
>>> as soon as the io is completed and then push the ordered extent
stuff off to
>>> a worker thread, but at this stage we need this deadlock fixed with
changing
>>> as little as possible.  Thanks,
>>>
>>
>> Hi Josef,
>>
>> I''m ok with the patch, but could you show us more details
about the deadlock between allocation and endio?
> 
> Josef and I have been talking about this one off-list for a while. 
It''s
> a deadlock I tracked down in my overnight stress runs.
> 
> Basically what we have is the io-less dirty throttling code saying there
> are too many pages in writeback, and so new allocations are backing up
> and waiting for pages to leave writeback.
> 
> But the pages can''t leave writeback because we''re waiting
on more memory
> to complete the metadata changes at endio time.  Strictly speaking the
> VM is doing something wrong here, our NOFS allocations shouldn''t
be
> waiting for writeback to finish.
> 
> But, strictly speaking we''re doing something wrong too,
we''re doing too
> many allocations with pages tied up in writeback.
> 
> So this splits the page from the metadata changes.  We''re still
doing
> the metadata changes after the IO is complete, but we''re doing
them
> after we''ve let the pages go.
> 
> -chris
> 

Now it''s clear, thanks for the explanation. :)

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - Apr 2012 - [PATCH] Btrfs: complete page writeback before doing ordered extents

[PATCH] Btrfs: complete page writeback before doing ordered extents

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents

Re: [PATCH] Btrfs: complete page writeback before doing ordered extents