thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH v6 01/19] mm: Return void from various readahead functions [Feb 2020]

If this information is useful, please help other people find it:
Share via:

John Hubbard

2020-Feb-18 21:05 UTC

[Ocfs2-devel] [PATCH v6 01/19] mm: Return void from various readahead functions

On 2/17/20 10:45 AM, Matthew Wilcox wrote:> From: "Matthew Wilcox (Oracle)" <willy at infradead.org>
> 
> ondemand_readahead has two callers, neither of which use the return value.
> That means that both ra_submit and __do_page_cache_readahead() can return
> void, and we don't need to worry that a present page in the readahead
> window causes us to return a smaller nr_pages than we ought to have.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
> ---
>  mm/internal.h  |  8 ++++----
>  mm/readahead.c | 24 ++++++++++--------------
>  2 files changed, 14 insertions(+), 18 deletions(-)

This is an easy review and obviously correct, so:

    Reviewed-by: John Hubbard <jhubbard at nvidia.com>


Thoughts for the future of the API:

I will add that I could envision another patchset that went in the
opposite direction, and attempted to preserve the information about
how many pages were successfully read ahead. And that would be nice
to have (at least IMHO), even all the way out to the syscall level,
especially for the readahead syscall.

Of course, vague opinions about how the API might be improved are less
pressing than cleaning up the code now--I'm just bringing this up because
I suspect some people will wonder, "wouldn't it be helpful if I the 
syscall would tell me what happened here? Success (returning 0) doesn't
necessarily mean any pages were even read ahead." It just seems worth 
mentioning.


thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 3cf20ab3ca01..f779f058118b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -49,18 +49,18 @@ void unmap_page_range(struct mmu_gather *tlb,
>  			     unsigned long addr, unsigned long end,
>  			     struct zap_details *details);
>  
> -extern unsigned int __do_page_cache_readahead(struct address_space
*mapping,
> +extern void __do_page_cache_readahead(struct address_space *mapping,
>  		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
>  		unsigned long lookahead_size);
>  
>  /*
>   * Submit IO for the read-ahead request in file_ra_state.
>   */
> -static inline unsigned long ra_submit(struct file_ra_state *ra,
> +static inline void ra_submit(struct file_ra_state *ra,
>  		struct address_space *mapping, struct file *filp)
>  {
> -	return __do_page_cache_readahead(mapping, filp,
> -					ra->start, ra->size, ra->async_size);
> +	__do_page_cache_readahead(mapping, filp,
> +			ra->start, ra->size, ra->async_size);
>  }
>  
>  /*
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 2fe72cd29b47..8ce46d69e6ae 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -149,10 +149,8 @@ static int read_pages(struct address_space *mapping,
struct file *filp,
>   * the pages first, then submits them for I/O. This avoids the very bad
>   * behaviour which would occur if page allocations are causing VM
writeback.
>   * We really don't want to intermingle reads and writes like that.
> - *
> - * Returns the number of pages requested, or the maximum amount of I/O
allowed.
>   */
> -unsigned int __do_page_cache_readahead(struct address_space *mapping,
> +void __do_page_cache_readahead(struct address_space *mapping,
>  		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
>  		unsigned long lookahead_size)
>  {
> @@ -166,7 +164,7 @@ unsigned int __do_page_cache_readahead(struct
address_space *mapping,
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
>  
>  	if (isize == 0)
> -		goto out;
> +		return;
>  
>  	end_index = ((isize - 1) >> PAGE_SHIFT);
>  
> @@ -211,8 +209,6 @@ unsigned int __do_page_cache_readahead(struct
address_space *mapping,
>  	if (nr_pages)
>  		read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
>  	BUG_ON(!list_empty(&page_pool));
> -out:
> -	return nr_pages;
>  }
>  
>  /*
> @@ -378,11 +374,10 @@ static int try_context_readahead(struct address_space
*mapping,
>  /*
>   * A minimal readahead algorithm for trivial sequential/random reads.
>   */
> -static unsigned long
> -ondemand_readahead(struct address_space *mapping,
> -		   struct file_ra_state *ra, struct file *filp,
> -		   bool hit_readahead_marker, pgoff_t offset,
> -		   unsigned long req_size)
> +static void ondemand_readahead(struct address_space *mapping,
> +		struct file_ra_state *ra, struct file *filp,
> +		bool hit_readahead_marker, pgoff_t offset,
> +		unsigned long req_size)
>  {
>  	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
>  	unsigned long max_pages = ra->ra_pages;
> @@ -428,7 +423,7 @@ ondemand_readahead(struct address_space *mapping,
>  		rcu_read_unlock();
>  
>  		if (!start || start - offset > max_pages)
> -			return 0;
> +			return;
>  
>  		ra->start = start;
>  		ra->size = start - offset;	/* old async_size */
> @@ -464,7 +459,8 @@ ondemand_readahead(struct address_space *mapping,
>  	 * standalone, small random read
>  	 * Read as is, and do not pollute the readahead state.
>  	 */
> -	return __do_page_cache_readahead(mapping, filp, offset, req_size, 0);
> +	__do_page_cache_readahead(mapping, filp, offset, req_size, 0);
> +	return;
>  
>  initial_readahead:
>  	ra->start = offset;
> @@ -489,7 +485,7 @@ ondemand_readahead(struct address_space *mapping,
>  		}
>  	}
>  
> -	return ra_submit(ra, mapping, filp);
> +	ra_submit(ra, mapping, filp);
>  }
>  
>  /**
>

Matthew Wilcox

2020-Feb-18 21:21 UTC

head link

[Ocfs2-devel] [PATCH v6 01/19] mm: Return void from various readahead functions

On Tue, Feb 18, 2020 at 01:05:29PM -0800, John Hubbard
wrote:> This is an easy review and obviously correct, so:
> 
>     Reviewed-by: John Hubbard <jhubbard at nvidia.com>
Thanks
> Thoughts for the future of the API:
> 
> I will add that I could envision another patchset that went in the
> opposite direction, and attempted to preserve the information about
> how many pages were successfully read ahead. And that would be nice
> to have (at least IMHO), even all the way out to the syscall level,
> especially for the readahead syscall.
Right, and that was where I went initially.  It turns out to be a
non-trivial aount of work to do the book-keeping to find out how many
pages were _attempted_, and since we don't wait for the I/O to complete,
we don't know how many _succeeded_, and we also don't know how many
weren't attempted because they were already there, and how many weren't
attempted because somebody else has raced with us and is going to attempt
them themselves, and how many weren't attempted because we just ran out
of memory, and decided to give up.

Also, we don't know how many pages were successfully read, and then the
system decided to evict before the program found out how many were read,
let alone before it did any action based on that.

So, given all that complexity, and the fact that nobody actually does
anything with the limited and incorrect information we tried to provide
today, I think it's fair to say that anybody who wants to start to do
anything with that information can delve into all the complexity around
"what number should we return, and what does it really mean".  In the
meantime, let's just ditch the complexity and pretense that this number
means anything.

Ocfs2 devel - Feb 2020 - [PATCH v6 01/19] mm: Return void from various readahead functions

[Ocfs2-devel] [PATCH v6 01/19] mm: Return void from various readahead functions

[Ocfs2-devel] [PATCH v6 01/19] mm: Return void from various readahead functions