john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
From: John Hubbard <jhubbard at nvidia.com> Hi, This is mostly Jerome's work, converting the block/bio and related areas to call put_user_page*() instead of put_page(). Because I've changed Jerome's patches, in some cases significantly, I'd like to get his feedback before we actually leave him listed as the author (he might want to disown some or all of these). I added a new patch, in order to make this work with Christoph Hellwig's recent overhaul to bio_release_pages(): "block: bio_release_pages: use flags arg instead of bool". I've started the series with a patch that I've posted in another series ("mm/gup: add make_dirty arg to put_user_pages_dirty_lock()"[1]), because I'm not sure which of these will go in first, and this allows each to stand alone. Testing: not much beyond build and boot testing has been done yet. And I'm not set up to even exercise all of it (especially the IB parts) at run time. Anyway, changes here are: * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter. Then, use the new iov_iter_get_pages_use_gup() to retrieve it when it is time to release the pages. That allows choosing between put_page() and put_user_page*(). * Pass in one more piece of information to bio_release_pages: a "from_gup" parameter. Similar use as above. * Change the block layer, and several file systems, to use put_user_page*(). [1] https://lore.kernel.org/r/20190724012606.25844-2-jhubbard at nvidia.com And please note the correction email that I posted as a follow-up, if you're looking closely at that patch. :) The fixed version is included here. John Hubbard (3): mm/gup: add make_dirty arg to put_user_pages_dirty_lock() block: bio_release_pages: use flags arg instead of bool fs/ceph: fix a build warning: returning a value from void function J?r?me Glisse (9): iov_iter: add helper to test if an iter would use GUP v2 block: bio_release_pages: convert put_page() to put_user_page*() block_dev: convert put_page() to put_user_page*() fs/nfs: convert put_page() to put_user_page*() vhost-scsi: convert put_page() to put_user_page*() fs/cifs: convert put_page() to put_user_page*() fs/fuse: convert put_page() to put_user_page*() fs/ceph: convert put_page() to put_user_page*() 9p/net: convert put_page() to put_user_page*() block/bio.c | 81 ++++++++++++--- drivers/infiniband/core/umem.c | 5 +- drivers/infiniband/hw/hfi1/user_pages.c | 5 +- drivers/infiniband/hw/qib/qib_user_pages.c | 5 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +- drivers/infiniband/sw/siw/siw_mem.c | 8 +- drivers/vhost/scsi.c | 13 ++- fs/block_dev.c | 22 +++- fs/ceph/debugfs.c | 2 +- fs/ceph/file.c | 62 ++++++++--- fs/cifs/cifsglob.h | 3 + fs/cifs/file.c | 22 +++- fs/cifs/misc.c | 19 +++- fs/direct-io.c | 2 +- fs/fuse/dev.c | 22 +++- fs/fuse/file.c | 53 +++++++--- fs/nfs/direct.c | 10 +- include/linux/bio.h | 22 +++- include/linux/mm.h | 5 +- include/linux/uio.h | 11 ++ mm/gup.c | 115 +++++++++------------ net/9p/trans_common.c | 14 ++- net/9p/trans_common.h | 3 +- net/9p/trans_virtio.c | 18 +++- 24 files changed, 357 insertions(+), 170 deletions(-) -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 01/12] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
From: John Hubbard <jhubbard at nvidia.com> Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. Cc: Matthew Wilcox <willy at infradead.org> Cc: Jan Kara <jack at suse.cz> Cc: Christoph Hellwig <hch at lst.de> Cc: Ira Weiny <ira.weiny at intel.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Signed-off-by: John Hubbard <jhubbard at nvidia.com> --- drivers/infiniband/core/umem.c | 5 +- drivers/infiniband/hw/hfi1/user_pages.c | 5 +- drivers/infiniband/hw/qib/qib_user_pages.c | 5 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +- drivers/infiniband/sw/siw/siw_mem.c | 8 +- include/linux/mm.h | 5 +- mm/gup.c | 115 +++++++++------------ 7 files changed, 58 insertions(+), 90 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 08da840ed7ee..965cf9dea71a 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) { page = sg_page_iter_page(&sg_iter); - if (umem->writable && dirty) - put_user_pages_dirty_lock(&page, 1); - else - put_user_page(page); + put_user_pages_dirty_lock(&page, 1, umem->writable && dirty); } sg_free_table(&umem->sg_head); diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index b89a9b9aef7a..469acb961fbd 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np void hfi1_release_user_pages(struct mm_struct *mm, struct page **p, size_t npages, bool dirty) { - if (dirty) - put_user_pages_dirty_lock(p, npages); - else - put_user_pages(p, npages); + put_user_pages_dirty_lock(p, npages, dirty); if (mm) { /* during close after signal, mm can be NULL */ atomic64_sub(npages, &mm->pinned_vm); diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index bfbfbb7e0ff4..6bf764e41891 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -40,10 +40,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages, int dirty) { - if (dirty) - put_user_pages_dirty_lock(p, num_pages); - else - put_user_pages(p, num_pages); + put_user_pages_dirty_lock(p, num_pages, dirty); } /** diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 0b0237d41613..62e6ffa9ad78 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty) for_each_sg(chunk->page_list, sg, chunk->nents, i) { page = sg_page(sg); pa = sg_phys(sg); - if (dirty) - put_user_pages_dirty_lock(&page, 1); - else - put_user_page(page); + put_user_pages_dirty_lock(&page, 1, dirty); usnic_dbg("pa: %pa\n", &pa); } kfree(chunk); diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index 67171c82b0c4..358d440efa11 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -65,13 +65,7 @@ static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages, { struct page **p = chunk->plist; - while (num_pages--) { - if (!PageDirty(*p) && dirty) - put_user_pages_dirty_lock(p, 1); - else - put_user_page(*p); - p++; - } + put_user_pages_dirty_lock(chunk->plist, num_pages, dirty); } void siw_umem_release(struct siw_umem *umem, bool dirty) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0334ca97c584..9759b6a24420 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1057,8 +1057,9 @@ static inline void put_user_page(struct page *page) put_page(page); } -void put_user_pages_dirty(struct page **pages, unsigned long npages); -void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, + bool make_dirty); + void put_user_pages(struct page **pages, unsigned long npages); #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bac..7fefd7ab02c4 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -29,85 +29,70 @@ struct follow_page_context { unsigned int page_mask; }; -typedef int (*set_dirty_func_t)(struct page *page); - -static void __put_user_pages_dirty(struct page **pages, - unsigned long npages, - set_dirty_func_t sdf) -{ - unsigned long index; - - for (index = 0; index < npages; index++) { - struct page *page = compound_head(pages[index]); - - /* - * Checking PageDirty at this point may race with - * clear_page_dirty_for_io(), but that's OK. Two key cases: - * - * 1) This code sees the page as already dirty, so it skips - * the call to sdf(). That could happen because - * clear_page_dirty_for_io() called page_mkclean(), - * followed by set_page_dirty(). However, now the page is - * going to get written back, which meets the original - * intention of setting it dirty, so all is well: - * clear_page_dirty_for_io() goes on to call - * TestClearPageDirty(), and write the page back. - * - * 2) This code sees the page as clean, so it calls sdf(). - * The page stays dirty, despite being written back, so it - * gets written back again in the next writeback cycle. - * This is harmless. - */ - if (!PageDirty(page)) - sdf(page); - - put_user_page(page); - } -} - /** - * put_user_pages_dirty() - release and dirty an array of gup-pinned pages - * @pages: array of pages to be marked dirty and released. + * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages + * @pages: array of pages to be maybe marked dirty, and definitely released. * @npages: number of pages in the @pages array. + * @make_dirty: whether to mark the pages dirty * * "gup-pinned page" refers to a page that has had one of the get_user_pages() * variants called on that page. * * For each page in the @pages array, make that page (or its head page, if a - * compound page) dirty, if it was previously listed as clean. Then, release - * the page using put_user_page(). + * compound page) dirty, if @make_dirty is true, and if the page was previously + * listed as clean. In any case, releases all pages using put_user_page(), + * possibly via put_user_pages(), for the non-dirty case. * * Please see the put_user_page() documentation for details. * - * set_page_dirty(), which does not lock the page, is used here. - * Therefore, it is the caller's responsibility to ensure that this is - * safe. If not, then put_user_pages_dirty_lock() should be called instead. + * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is + * required, then the caller should a) verify that this is really correct, + * because _lock() is usually required, and b) hand code it: + * set_page_dirty_lock(), put_user_page(). * */ -void put_user_pages_dirty(struct page **pages, unsigned long npages) +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, + bool make_dirty) { - __put_user_pages_dirty(pages, npages, set_page_dirty); -} -EXPORT_SYMBOL(put_user_pages_dirty); + unsigned long index; -/** - * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages - * @pages: array of pages to be marked dirty and released. - * @npages: number of pages in the @pages array. - * - * For each page in the @pages array, make that page (or its head page, if a - * compound page) dirty, if it was previously listed as clean. Then, release - * the page using put_user_page(). - * - * Please see the put_user_page() documentation for details. - * - * This is just like put_user_pages_dirty(), except that it invokes - * set_page_dirty_lock(), instead of set_page_dirty(). - * - */ -void put_user_pages_dirty_lock(struct page **pages, unsigned long npages) -{ - __put_user_pages_dirty(pages, npages, set_page_dirty_lock); + /* + * TODO: this can be optimized for huge pages: if a series of pages is + * physically contiguous and part of the same compound page, then a + * single operation to the head page should suffice. + */ + + if (!make_dirty) { + put_user_pages(pages, npages); + return; + } + + for (index = 0; index < npages; index++) { + struct page *page = compound_head(pages[index]); + /* + * Checking PageDirty at this point may race with + * clear_page_dirty_for_io(), but that's OK. Two key + * cases: + * + * 1) This code sees the page as already dirty, so it + * skips the call to set_page_dirty(). That could happen + * because clear_page_dirty_for_io() called + * page_mkclean(), followed by set_page_dirty(). + * However, now the page is going to get written back, + * which meets the original intention of setting it + * dirty, so all is well: clear_page_dirty_for_io() goes + * on to call TestClearPageDirty(), and write the page + * back. + * + * 2) This code sees the page as clean, so it calls + * set_page_dirty(). The page stays dirty, despite being + * written back, so it gets written back again in the + * next writeback cycle. This is harmless. + */ + if (!PageDirty(page)) + set_page_dirty_lock(page); + put_user_page(page); + } } EXPORT_SYMBOL(put_user_pages_dirty_lock); -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 02/12] iov_iter: add helper to test if an iter would use GUP v2
From: J?r?me Glisse <jglisse at redhat.com> Add a helper to test if call to iov_iter_get_pages*() with a given iter would result in calls to GUP (get_user_pages*()). We want to use different tracking of page references if they are coming from GUP (get_user_pages*()) and thus we need to know when GUP is used for a given iter. Changes since J?r?me's original patch: * iov_iter_get_pages_use_gup(): do not return true for the ITER_PIPE case, because iov_iter_get_pages() calls pipe_get_pages(), which in turn uses get_page(), not get_user_pages(). * Remove some obsolete code, as part of rebasing onto Linux 5.3. * Fix up the kerneldoc comment to "Return:" rather than "Returns:", and a few other grammatical tweaks. Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: John Hubbard <jhubbard at nvidia.com> Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> --- include/linux/uio.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/uio.h b/include/linux/uio.h index ab5f523bc0df..2a179af8e5a7 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -86,6 +86,17 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i) return i->type & (READ | WRITE); } +/** + * iov_iter_get_pages_use_gup - report if iov_iter_get_pages(i) uses GUP + * @i: iterator + * Return: true if a call to iov_iter_get_pages*() with the iter provided in + * the argument would result in the use of get_user_pages*() + */ +static inline bool iov_iter_get_pages_use_gup(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_IOVEC; +} + /* * Total number of bytes covered by an iovec. * -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 03/12] block: bio_release_pages: use flags arg instead of bool
From: John Hubbard <jhubbard at nvidia.com> In commit d241a95f3514 ("block: optionally mark pages dirty in bio_release_pages"), new "bool mark_dirty" argument was added to bio_release_pages. In upcoming work, another bool argument (to indicate that the pages came from get_user_pages) is going to be added. That's one bool too many, because it's not desirable have calls of the form: foo(true, false, true, etc); In order to prepare for that, change the argument from a bool, to a typesafe (enum-based) flags argument. Cc: Christoph Hellwig <hch at infradead.org> Cc: J?r?me Glisse <jglisse at redhat.com> Cc: Minwoo Im <minwoo.im.dev at gmail.com> Cc: Jens Axboe <axboe at kernel.dk> Signed-off-by: John Hubbard <jhubbard at nvidia.com> --- block/bio.c | 12 ++++++------ fs/block_dev.c | 4 ++-- fs/direct-io.c | 2 +- include/linux/bio.h | 13 ++++++++++++- 4 files changed, 21 insertions(+), 10 deletions(-) diff --git a/block/bio.c b/block/bio.c index 299a0e7651ec..7675e2de509d 100644 --- a/block/bio.c +++ b/block/bio.c @@ -833,7 +833,7 @@ int bio_add_page(struct bio *bio, struct page *page, } EXPORT_SYMBOL(bio_add_page); -void bio_release_pages(struct bio *bio, bool mark_dirty) +void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags) { struct bvec_iter_all iter_all; struct bio_vec *bvec; @@ -842,7 +842,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty) return; bio_for_each_segment_all(bvec, bio, iter_all) { - if (mark_dirty && !PageCompound(bvec->bv_page)) + if ((flags & BIO_RP_MARK_DIRTY) && !PageCompound(bvec->bv_page)) set_page_dirty_lock(bvec->bv_page); put_page(bvec->bv_page); } @@ -1421,7 +1421,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, return bio; out_unmap: - bio_release_pages(bio, false); + bio_release_pages(bio, BIO_RP_NORMAL); bio_put(bio); return ERR_PTR(ret); } @@ -1437,7 +1437,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, */ void bio_unmap_user(struct bio *bio) { - bio_release_pages(bio, bio_data_dir(bio) == READ); + bio_release_pages(bio, bio_rp_dirty_flag(bio_data_dir(bio) == READ)); bio_put(bio); bio_put(bio); } @@ -1683,7 +1683,7 @@ static void bio_dirty_fn(struct work_struct *work) while ((bio = next) != NULL) { next = bio->bi_private; - bio_release_pages(bio, true); + bio_release_pages(bio, BIO_RP_MARK_DIRTY); bio_put(bio); } } @@ -1699,7 +1699,7 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio, false); + bio_release_pages(bio, BIO_RP_NORMAL); bio_put(bio); return; defer: diff --git a/fs/block_dev.c b/fs/block_dev.c index 4707dfff991b..9fe6616f8788 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -259,7 +259,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, } __set_current_state(TASK_RUNNING); - bio_release_pages(&bio, should_dirty); + bio_release_pages(&bio, bio_rp_dirty_flag(should_dirty)); if (unlikely(bio.bi_status)) ret = blk_status_to_errno(bio.bi_status); @@ -329,7 +329,7 @@ static void blkdev_bio_end_io(struct bio *bio) if (should_dirty) { bio_check_pages_dirty(bio); } else { - bio_release_pages(bio, false); + bio_release_pages(bio, BIO_RP_NORMAL); bio_put(bio); } } diff --git a/fs/direct-io.c b/fs/direct-io.c index ae196784f487..423ef431ddda 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -551,7 +551,7 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio) if (dio->is_async && should_dirty) { bio_check_pages_dirty(bio); /* transfers ownership */ } else { - bio_release_pages(bio, should_dirty); + bio_release_pages(bio, bio_rp_dirty_flag(should_dirty)); bio_put(bio); } return err; diff --git a/include/linux/bio.h b/include/linux/bio.h index 3cdb84cdc488..2715e55679c1 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -440,7 +440,18 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page, void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); -void bio_release_pages(struct bio *bio, bool mark_dirty); + +enum bio_rp_flags_t { + BIO_RP_NORMAL = 0, + BIO_RP_MARK_DIRTY = 1, +}; + +static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool mark_dirty) +{ + return mark_dirty ? BIO_RP_MARK_DIRTY : BIO_RP_NORMAL; +} + +void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags); struct rq_map_data; extern struct bio *bio_map_user_iov(struct request_queue *, struct iov_iter *, gfp_t); -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 04/12] block: bio_release_pages: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Changes from J?r?me's original patch: * reworked to be compatible with recent bio_release_pages() changes, * refactored slightly to remove some code duplication, * use an approach that changes fewer bio_check_pages_dirty() callers. Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: Christoph Hellwig <hch at infradead.org> Cc: Minwoo Im <minwoo.im.dev at gmail.com> Cc: Jens Axboe <axboe at kernel.dk> --- block/bio.c | 60 ++++++++++++++++++++++++++++++++++++--------- include/linux/bio.h | 1 + 2 files changed, 49 insertions(+), 12 deletions(-) diff --git a/block/bio.c b/block/bio.c index 7675e2de509d..74f9eba2583b 100644 --- a/block/bio.c +++ b/block/bio.c @@ -844,7 +844,11 @@ void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags) bio_for_each_segment_all(bvec, bio, iter_all) { if ((flags & BIO_RP_MARK_DIRTY) && !PageCompound(bvec->bv_page)) set_page_dirty_lock(bvec->bv_page); - put_page(bvec->bv_page); + + if (flags & BIO_RP_FROM_GUP) + put_user_page(bvec->bv_page); + else + put_page(bvec->bv_page); } } @@ -1667,28 +1671,50 @@ static void bio_dirty_fn(struct work_struct *work); static DECLARE_WORK(bio_dirty_work, bio_dirty_fn); static DEFINE_SPINLOCK(bio_dirty_lock); static struct bio *bio_dirty_list; +static struct bio *bio_gup_dirty_list; -/* - * This runs in process context - */ -static void bio_dirty_fn(struct work_struct *work) +static void __bio_dirty_fn(struct work_struct *work, + struct bio **dirty_list, + enum bio_rp_flags_t flags) { struct bio *bio, *next; spin_lock_irq(&bio_dirty_lock); - next = bio_dirty_list; - bio_dirty_list = NULL; + next = *dirty_list; + *dirty_list = NULL; spin_unlock_irq(&bio_dirty_lock); while ((bio = next) != NULL) { next = bio->bi_private; - bio_release_pages(bio, BIO_RP_MARK_DIRTY); + bio_release_pages(bio, BIO_RP_MARK_DIRTY | flags); bio_put(bio); } } -void bio_check_pages_dirty(struct bio *bio) +/* + * This runs in process context + */ +static void bio_dirty_fn(struct work_struct *work) +{ + __bio_dirty_fn(work, &bio_dirty_list, BIO_RP_NORMAL); + __bio_dirty_fn(work, &bio_gup_dirty_list, BIO_RP_FROM_GUP); +} + +/** + * __bio_check_pages_dirty() - queue up pages on a workqueue to dirty them + * @bio: the bio struct containing the pages we should dirty + * @from_gup: did the pages in the bio came from GUP (get_user_pages*()) + * + * This will go over all pages in the bio, and for each non dirty page, the + * bio is added to a list of bio's that need to get their pages dirtied. + * + * We also need to know if the pages in the bio are coming from GUP or not, + * as GUPed pages need to be released via put_user_page(), instead of + * put_page(). Please see Documentation/vm/get_user_pages.rst for details + * on that. + */ +void __bio_check_pages_dirty(struct bio *bio, bool from_gup) { struct bio_vec *bvec; unsigned long flags; @@ -1699,17 +1725,27 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio, BIO_RP_NORMAL); + bio_release_pages(bio, from_gup ? BIO_RP_FROM_GUP : BIO_RP_NORMAL); bio_put(bio); return; defer: spin_lock_irqsave(&bio_dirty_lock, flags); - bio->bi_private = bio_dirty_list; - bio_dirty_list = bio; + if (from_gup) { + bio->bi_private = bio_gup_dirty_list; + bio_gup_dirty_list = bio; + } else { + bio->bi_private = bio_dirty_list; + bio_dirty_list = bio; + } spin_unlock_irqrestore(&bio_dirty_lock, flags); schedule_work(&bio_dirty_work); } +void bio_check_pages_dirty(struct bio *bio) +{ + __bio_check_pages_dirty(bio, false); +} + void update_io_ticks(struct hd_struct *part, unsigned long now) { unsigned long stamp; diff --git a/include/linux/bio.h b/include/linux/bio.h index 2715e55679c1..d68a40c2c9d4 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -444,6 +444,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); enum bio_rp_flags_t { BIO_RP_NORMAL = 0, BIO_RP_MARK_DIRTY = 1, + BIO_RP_FROM_GUP = 2, }; static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool mark_dirty) -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 05/12] block_dev: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Changes from J?r?me's original patch: * reworked to be compatible with recent bio_release_pages() changes. Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> --- block/bio.c | 13 +++++++++++++ fs/block_dev.c | 22 +++++++++++++++++----- include/linux/bio.h | 8 ++++++++ 3 files changed, 38 insertions(+), 5 deletions(-) diff --git a/block/bio.c b/block/bio.c index 74f9eba2583b..3b9f66e64bc1 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1746,6 +1746,19 @@ void bio_check_pages_dirty(struct bio *bio) __bio_check_pages_dirty(bio, false); } +enum bio_rp_flags_t bio_rp_flags(struct iov_iter *iter, bool mark_dirty) +{ + enum bio_rp_flags_t flags = BIO_RP_NORMAL; + + if (mark_dirty) + flags |= BIO_RP_MARK_DIRTY; + + if (iov_iter_get_pages_use_gup(iter)) + flags |= BIO_RP_FROM_GUP; + + return flags; +} + void update_io_ticks(struct hd_struct *part, unsigned long now) { unsigned long stamp; diff --git a/fs/block_dev.c b/fs/block_dev.c index 9fe6616f8788..d53abaf31e54 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -259,7 +259,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, } __set_current_state(TASK_RUNNING); - bio_release_pages(&bio, bio_rp_dirty_flag(should_dirty)); + bio_release_pages(&bio, bio_rp_flags(iter, should_dirty)); if (unlikely(bio.bi_status)) ret = blk_status_to_errno(bio.bi_status); @@ -295,7 +295,7 @@ static int blkdev_iopoll(struct kiocb *kiocb, bool wait) return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait); } -static void blkdev_bio_end_io(struct bio *bio) +static void _blkdev_bio_end_io(struct bio *bio, bool from_gup) { struct blkdev_dio *dio = bio->bi_private; bool should_dirty = dio->should_dirty; @@ -327,13 +327,23 @@ static void blkdev_bio_end_io(struct bio *bio) } if (should_dirty) { - bio_check_pages_dirty(bio); + __bio_check_pages_dirty(bio, from_gup); } else { - bio_release_pages(bio, BIO_RP_NORMAL); + bio_release_pages(bio, bio_rp_gup_flag(from_gup)); bio_put(bio); } } +static void blkdev_bio_end_io(struct bio *bio) +{ + _blkdev_bio_end_io(bio, false); +} + +static void blkdev_bio_from_gup_end_io(struct bio *bio) +{ + _blkdev_bio_end_io(bio, true); +} + static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) { @@ -380,7 +390,9 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) bio->bi_iter.bi_sector = pos >> 9; bio->bi_write_hint = iocb->ki_hint; bio->bi_private = dio; - bio->bi_end_io = blkdev_bio_end_io; + bio->bi_end_io = iov_iter_get_pages_use_gup(iter) ? + blkdev_bio_from_gup_end_io : + blkdev_bio_end_io; bio->bi_ioprio = iocb->ki_ioprio; ret = bio_iov_iter_get_pages(bio, iter); diff --git a/include/linux/bio.h b/include/linux/bio.h index d68a40c2c9d4..b9460d1a4679 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -452,6 +452,13 @@ static inline enum bio_rp_flags_t bio_rp_dirty_flag(bool mark_dirty) return mark_dirty ? BIO_RP_MARK_DIRTY : BIO_RP_NORMAL; } +static inline enum bio_rp_flags_t bio_rp_gup_flag(bool from_gup) +{ + return from_gup ? BIO_RP_FROM_GUP : BIO_RP_NORMAL; +} + +enum bio_rp_flags_t bio_rp_flags(struct iov_iter *iter, bool mark_dirty); + void bio_release_pages(struct bio *bio, enum bio_rp_flags_t flags); struct rq_map_data; extern struct bio *bio_map_user_iov(struct request_queue *, @@ -463,6 +470,7 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int, gfp_t, int); extern void bio_set_pages_dirty(struct bio *bio); extern void bio_check_pages_dirty(struct bio *bio); +void __bio_check_pages_dirty(struct bio *bio, bool from_gup); void generic_start_io_acct(struct request_queue *q, int op, unsigned long sectors, struct hd_struct *part); -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 06/12] fs/nfs: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: linux-nfs at vger.kernel.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: Trond Myklebust <trond.myklebust at hammerspace.com> Cc: Anna Schumaker <anna.schumaker at netapp.com> --- fs/nfs/direct.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 0cb442406168..35f30fe2900f 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -512,7 +512,10 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + if (iov_iter_get_pages_use_gup(iter)) + put_user_pages(pagevec, npages); + else + nfs_direct_release_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break; @@ -935,7 +938,10 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, pos += req_len; dreq->bytes_left -= req_len; } - nfs_direct_release_pages(pagevec, npages); + if (iov_iter_get_pages_use_gup(iter)) + put_user_pages(pagevec, npages); + else + nfs_direct_release_pages(pagevec, npages); kvfree(pagevec); if (result < 0) break; -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Changes from J?r?me's original patch: * Changed a WARN_ON to a BUG_ON. Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: virtualization at lists.linux-foundation.org Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: Miklos Szeredi <miklos at szeredi.hu> Cc: "Michael S. Tsirkin" <mst at redhat.com> Cc: Jason Wang <jasowang at redhat.com> Cc: Paolo Bonzini <pbonzini at redhat.com> Cc: Stefan Hajnoczi <stefanha at redhat.com> --- drivers/vhost/scsi.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index a9caf1bc3c3e..282565ab5e3f 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -329,11 +329,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) if (tv_cmd->tvc_sgl_count) { for (i = 0; i < tv_cmd->tvc_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_sgl[i])); + put_user_page(sg_page(&tv_cmd->tvc_sgl[i])); } if (tv_cmd->tvc_prot_sgl_count) { for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++) - put_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); + put_user_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); } vhost_scsi_put_inflight(tv_cmd->inflight); @@ -630,6 +630,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, size_t offset; unsigned int npages = 0; + /* + * Here in all cases we should have an IOVEC which use GUP. If that is + * not the case then we will wrongly call put_user_page() and the page + * refcount will go wrong (this is in vhost_scsi_release_cmd()) + */ + WARN_ON(!iov_iter_get_pages_use_gup(iter)); + bytes = iov_iter_get_pages(iter, pages, LONG_MAX, VHOST_SCSI_PREALLOC_UPAGES, &offset); /* No pages were pinned */ @@ -681,7 +688,7 @@ vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write, while (p < sg) { struct page *page = sg_page(p++); if (page) - put_page(page); + put_user_page(page); } return ret; } -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 08/12] fs/cifs: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: linux-cifs at vger.kernel.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: Steve French <sfrench at samba.org> --- fs/cifs/cifsglob.h | 3 +++ fs/cifs/file.c | 22 +++++++++++++++++----- fs/cifs/misc.c | 19 +++++++++++++++---- 3 files changed, 35 insertions(+), 9 deletions(-) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index fe610e7e3670..e95cb82bfa50 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1283,6 +1283,7 @@ struct cifs_aio_ctx { * If yes, iter is a copy of the user passed iov_iter */ bool direct_io; + bool from_gup; }; struct cifs_readdata; @@ -1317,6 +1318,7 @@ struct cifs_readdata { struct cifs_credits credits; unsigned int nr_pages; struct page **pages; + bool from_gup; }; struct cifs_writedata; @@ -1343,6 +1345,7 @@ struct cifs_writedata { struct cifs_credits credits; unsigned int nr_pages; struct page **pages; + bool from_gup; }; /* diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 97090693d182..84fa7e0a578f 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2571,8 +2571,13 @@ cifs_uncached_writedata_release(struct kref *refcount) struct cifs_writedata, refcount); kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < wdata->nr_pages; i++) - put_page(wdata->pages[i]); + if (wdata->from_gup) { + for (i = 0; i < wdata->nr_pages; i++) + put_user_page(wdata->pages[i]); + } else { + for (i = 0; i < wdata->nr_pages; i++) + put_page(wdata->pages[i]); + } cifs_writedata_release(refcount); } @@ -2781,7 +2786,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, break; } - + wdata->from_gup = iov_iter_get_pages_use_gup(from); wdata->page_offset = start; wdata->tailsz nr_pages > 1 ? @@ -2797,6 +2802,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, add_credits_and_wake_if(server, credits, 0); break; } + wdata->from_gup = false; rc = cifs_write_allocate_pages(wdata->pages, nr_pages); if (rc) { @@ -3238,8 +3244,12 @@ cifs_uncached_readdata_release(struct kref *refcount) unsigned int i; kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); - for (i = 0; i < rdata->nr_pages; i++) { - put_page(rdata->pages[i]); + if (rdata->from_gup) { + for (i = 0; i < rdata->nr_pages; i++) + put_user_page(rdata->pages[i]); + } else { + for (i = 0; i < rdata->nr_pages; i++) + put_page(rdata->pages[i]); } cifs_readdata_release(refcount); } @@ -3502,6 +3512,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, break; } + rdata->from_gup = iov_iter_get_pages_use_gup(&direct_iov); npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE; rdata->page_offset = start; rdata->tailsz = npages > 1 ? @@ -3519,6 +3530,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, rc = -ENOMEM; break; } + rdata->from_gup = false; rc = cifs_read_allocate_pages(rdata, npages); if (rc) { diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index f383877a6511..5a04c34fea05 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -822,10 +822,18 @@ cifs_aio_ctx_release(struct kref *refcount) if (ctx->bv) { unsigned i; - for (i = 0; i < ctx->npages; i++) { - if (ctx->should_dirty) - set_page_dirty(ctx->bv[i].bv_page); - put_page(ctx->bv[i].bv_page); + if (ctx->from_gup) { + for (i = 0; i < ctx->npages; i++) { + if (ctx->should_dirty) + set_page_dirty(ctx->bv[i].bv_page); + put_user_page(ctx->bv[i].bv_page); + } + } else { + for (i = 0; i < ctx->npages; i++) { + if (ctx->should_dirty) + set_page_dirty(ctx->bv[i].bv_page); + put_page(ctx->bv[i].bv_page); + } } kvfree(ctx->bv); } @@ -881,6 +889,9 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw) saved_len = count; + /* This is only use by cifs_aio_ctx_release() */ + ctx->from_gup = iov_iter_get_pages_use_gup(iter); + while (count && npages < max_pages) { rc = iov_iter_get_pages(iter, pages, count, max_pages, &start); if (rc < 0) { -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 09/12] fs/fuse: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Changes from J?r?me's original patch: * Use the enhanced put_user_pages_dirty_lock(). Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: Miklos Szeredi <miklos at szeredi.hu> --- fs/fuse/dev.c | 22 +++++++++++++++++---- fs/fuse/file.c | 53 +++++++++++++++++++++++++++++++++++++------------- 2 files changed, 57 insertions(+), 18 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ea8237513dfa..8ef65c9cd3f6 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -780,6 +780,7 @@ struct fuse_copy_state { unsigned len; unsigned offset; unsigned move_pages:1; + bool from_gup; }; static void fuse_copy_init(struct fuse_copy_state *cs, int write, @@ -800,13 +801,22 @@ static void fuse_copy_finish(struct fuse_copy_state *cs) buf->len = PAGE_SIZE - cs->len; cs->currbuf = NULL; } else if (cs->pg) { - if (cs->write) { - flush_dcache_page(cs->pg); - set_page_dirty_lock(cs->pg); + if (cs->from_gup) { + if (cs->write) { + flush_dcache_page(cs->pg); + put_user_pages_dirty_lock(&cs->pg, 1, true); + } else + put_user_page(cs->pg); + } else { + if (cs->write) { + flush_dcache_page(cs->pg); + set_page_dirty_lock(cs->pg); + } + put_page(cs->pg); } - put_page(cs->pg); } cs->pg = NULL; + cs->from_gup = false; } /* @@ -834,6 +844,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) BUG_ON(!cs->nr_segs); cs->currbuf = buf; cs->pg = buf->page; + cs->from_gup = false; cs->offset = buf->offset; cs->len = buf->len; cs->pipebufs++; @@ -851,6 +862,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) buf->len = 0; cs->currbuf = buf; + cs->from_gup = false; cs->pg = page; cs->offset = 0; cs->len = PAGE_SIZE; @@ -866,6 +878,7 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) cs->len = err; cs->offset = off; cs->pg = page; + cs->from_gup = iov_iter_get_pages_use_gup(cs->iter); iov_iter_advance(cs->iter, err); } @@ -1000,6 +1013,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) unlock_page(newpage); out_fallback: cs->pg = buf->page; + cs->from_gup = false; cs->offset = buf->offset; err = lock_request(cs->req); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 5ae2828beb00..c34c22ac5b22 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -543,12 +543,20 @@ void fuse_read_fill(struct fuse_req *req, struct file *file, loff_t pos, req->out.args[0].size = count; } -static void fuse_release_user_pages(struct fuse_req *req, bool should_dirty) +static void fuse_release_user_pages(struct fuse_req *req, bool should_dirty, + bool from_gup) { unsigned i; + if (from_gup) { + put_user_pages_dirty_lock(req->pages, req->num_pages, + should_dirty); + return; + } + for (i = 0; i < req->num_pages; i++) { struct page *page = req->pages[i]; + if (should_dirty) set_page_dirty_lock(page); put_page(page); @@ -621,12 +629,13 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos) kref_put(&io->refcnt, fuse_io_release); } -static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req) +static void _fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req, + bool from_gup) { struct fuse_io_priv *io = req->io; ssize_t pos = -1; - fuse_release_user_pages(req, io->should_dirty); + fuse_release_user_pages(req, io->should_dirty, from_gup); if (io->write) { if (req->misc.write.in.size != req->misc.write.out.size) @@ -641,8 +650,18 @@ static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req) fuse_aio_complete(io, req->out.h.error, pos); } +static void fuse_aio_from_gup_complete_req(struct fuse_conn *fc, struct fuse_req *req) +{ + _fuse_aio_complete_req(fc, req, true); +} + +static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req) +{ + _fuse_aio_complete_req(fc, req, false); +} + static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req, - size_t num_bytes, struct fuse_io_priv *io) + size_t num_bytes, struct fuse_io_priv *io, bool from_gup) { spin_lock(&io->lock); kref_get(&io->refcnt); @@ -651,7 +670,8 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req, spin_unlock(&io->lock); req->io = io; - req->end = fuse_aio_complete_req; + req->end = from_gup ? fuse_aio_from_gup_complete_req : + fuse_aio_complete_req; __fuse_get_request(req); fuse_request_send_background(fc, req); @@ -660,7 +680,8 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req, } static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io, - loff_t pos, size_t count, fl_owner_t owner) + loff_t pos, size_t count, fl_owner_t owner, + bool from_gup) { struct file *file = io->iocb->ki_filp; struct fuse_file *ff = file->private_data; @@ -675,7 +696,7 @@ static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io, } if (io->async) - return fuse_async_req_send(fc, req, count, io); + return fuse_async_req_send(fc, req, count, io, from_gup); fuse_request_send(fc, req); return req->out.args[0].size; @@ -755,7 +776,7 @@ static int fuse_do_readpage(struct file *file, struct page *page) req->page_descs[0].length = count; init_sync_kiocb(&iocb, file); io = (struct fuse_io_priv) FUSE_IO_PRIV_SYNC(&iocb); - num_read = fuse_send_read(req, &io, pos, count, NULL); + num_read = fuse_send_read(req, &io, pos, count, NULL, false); err = req->out.h.error; if (!err) { @@ -976,7 +997,8 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff, } static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io, - loff_t pos, size_t count, fl_owner_t owner) + loff_t pos, size_t count, fl_owner_t owner, + bool from_gup) { struct kiocb *iocb = io->iocb; struct file *file = iocb->ki_filp; @@ -996,7 +1018,7 @@ static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io, } if (io->async) - return fuse_async_req_send(fc, req, count, io); + return fuse_async_req_send(fc, req, count, io, from_gup); fuse_request_send(fc, req); return req->misc.write.out.size; @@ -1031,7 +1053,7 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct kiocb *iocb, for (i = 0; i < req->num_pages; i++) fuse_wait_on_page_writeback(inode, req->pages[i]->index); - res = fuse_send_write(req, &io, pos, count, NULL); + res = fuse_send_write(req, &io, pos, count, NULL, false); offset = req->page_descs[0].offset; count = res; @@ -1351,6 +1373,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, ssize_t res = 0; struct fuse_req *req; int err = 0; + bool from_gup = iov_iter_get_pages_use_gup(iter); if (io->async) req = fuse_get_req_for_background(fc, iov_iter_npages(iter, @@ -1384,13 +1407,15 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter, inarg = &req->misc.write.in; inarg->write_flags |= FUSE_WRITE_KILL_PRIV; } - nres = fuse_send_write(req, io, pos, nbytes, owner); + nres = fuse_send_write(req, io, pos, nbytes, owner, + from_gup); } else { - nres = fuse_send_read(req, io, pos, nbytes, owner); + nres = fuse_send_read(req, io, pos, nbytes, owner, + from_gup); } if (!io->async) - fuse_release_user_pages(req, io->should_dirty); + fuse_release_user_pages(req, io->should_dirty, from_gup); if (req->out.h.error) { err = req->out.h.error; break; -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 10/12] fs/ceph: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Changes from J?r?me's original patch: * Use the enhanced put_user_pages_dirty_lock(). Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: ceph-devel at vger.kernel.org Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: "Yan, Zheng" <zyan at redhat.com> Cc: Sage Weil <sage at redhat.com> Cc: Ilya Dryomov <idryomov at gmail.com> --- fs/ceph/file.c | 62 ++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 48 insertions(+), 14 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 685a03cc4b77..c628a1f96978 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -158,18 +158,26 @@ static ssize_t iter_get_bvecs_alloc(struct iov_iter *iter, size_t maxsize, return bytes; } -static void put_bvecs(struct bio_vec *bvecs, int num_bvecs, bool should_dirty) +static void put_bvecs(struct bio_vec *bv, int num_bvecs, bool should_dirty, + bool from_gup) { int i; + for (i = 0; i < num_bvecs; i++) { - if (bvecs[i].bv_page) { + if (!bv[i].bv_page) + continue; + + if (from_gup) { + put_user_pages_dirty_lock(&bv[i].bv_page, 1, + should_dirty); + } else { if (should_dirty) - set_page_dirty_lock(bvecs[i].bv_page); - put_page(bvecs[i].bv_page); + set_page_dirty_lock(bv[i].bv_page); + put_page(bv[i].bv_page); } } - kvfree(bvecs); + kvfree(bv); } /* @@ -730,6 +738,7 @@ struct ceph_aio_work { }; static void ceph_aio_retry_work(struct work_struct *work); +static void ceph_aio_from_gup_retry_work(struct work_struct *work); static void ceph_aio_complete(struct inode *inode, struct ceph_aio_request *aio_req) @@ -774,7 +783,7 @@ static void ceph_aio_complete(struct inode *inode, kfree(aio_req); } -static void ceph_aio_complete_req(struct ceph_osd_request *req) +static void _ceph_aio_complete_req(struct ceph_osd_request *req, bool from_gup) { int rc = req->r_result; struct inode *inode = req->r_inode; @@ -793,7 +802,9 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req) aio_work = kmalloc(sizeof(*aio_work), GFP_NOFS); if (aio_work) { - INIT_WORK(&aio_work->work, ceph_aio_retry_work); + INIT_WORK(&aio_work->work, from_gup ? + ceph_aio_from_gup_retry_work : + ceph_aio_retry_work); aio_work->req = req; queue_work(ceph_inode_to_client(inode)->inode_wq, &aio_work->work); @@ -830,7 +841,7 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req) } put_bvecs(osd_data->bvec_pos.bvecs, osd_data->num_bvecs, - aio_req->should_dirty); + aio_req->should_dirty, from_gup); ceph_osdc_put_request(req); if (rc < 0) @@ -840,7 +851,17 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req) return; } -static void ceph_aio_retry_work(struct work_struct *work) +static void ceph_aio_complete_req(struct ceph_osd_request *req) +{ + _ceph_aio_complete_req(req, false); +} + +static void ceph_aio_from_gup_complete_req(struct ceph_osd_request *req) +{ + _ceph_aio_complete_req(req, true); +} + +static void _ceph_aio_retry_work(struct work_struct *work, bool from_gup) { struct ceph_aio_work *aio_work container_of(work, struct ceph_aio_work, work); @@ -891,7 +912,8 @@ static void ceph_aio_retry_work(struct work_struct *work) ceph_osdc_put_request(orig_req); - req->r_callback = ceph_aio_complete_req; + req->r_callback = from_gup ? ceph_aio_from_gup_complete_req : + ceph_aio_complete_req; req->r_inode = inode; req->r_priv = aio_req; @@ -899,13 +921,23 @@ static void ceph_aio_retry_work(struct work_struct *work) out: if (ret < 0) { req->r_result = ret; - ceph_aio_complete_req(req); + _ceph_aio_complete_req(req, from_gup); } ceph_put_snap_context(snapc); kfree(aio_work); } +static void ceph_aio_retry_work(struct work_struct *work) +{ + _ceph_aio_retry_work(work, false); +} + +static void ceph_aio_from_gup_retry_work(struct work_struct *work) +{ + _ceph_aio_retry_work(work, true); +} + static ssize_t ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, struct ceph_snap_context *snapc, @@ -927,6 +959,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, loff_t pos = iocb->ki_pos; bool write = iov_iter_rw(iter) == WRITE; bool should_dirty = !write && iter_is_iovec(iter); + bool from_gup = iov_iter_get_pages_use_gup(iter); if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP) return -EROFS; @@ -1023,7 +1056,8 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, aio_req->num_reqs++; atomic_inc(&aio_req->pending_reqs); - req->r_callback = ceph_aio_complete_req; + req->r_callback = !from_gup ? ceph_aio_complete_req : + ceph_aio_from_gup_complete_req; req->r_inode = inode; req->r_priv = aio_req; list_add_tail(&req->r_private_item, &aio_req->osd_reqs); @@ -1054,7 +1088,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, len = ret; } - put_bvecs(bvecs, num_pages, should_dirty); + put_bvecs(bvecs, num_pages, should_dirty, from_gup); ceph_osdc_put_request(req); if (ret < 0) break; @@ -1093,7 +1127,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, req, false); if (ret < 0) { req->r_result = ret; - ceph_aio_complete_req(req); + _ceph_aio_complete_req(req, from_gup); } } return -EIOCBQUEUED; -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 11/12] 9p/net: convert put_page() to put_user_page*()
From: J?r?me Glisse <jglisse at redhat.com> For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). Signed-off-by: J?r?me Glisse <jglisse at redhat.com> Signed-off-by: John Hubbard <jhubbard at nvidia.com> Cc: linux-fsdevel at vger.kernel.org Cc: linux-block at vger.kernel.org Cc: linux-mm at kvack.org Cc: v9fs-developer at lists.sourceforge.net Cc: Jan Kara <jack at suse.cz> Cc: Dan Williams <dan.j.williams at intel.com> Cc: Alexander Viro <viro at zeniv.linux.org.uk> Cc: Johannes Thumshirn <jthumshirn at suse.de> Cc: Christoph Hellwig <hch at lst.de> Cc: Jens Axboe <axboe at kernel.dk> Cc: Ming Lei <ming.lei at redhat.com> Cc: Dave Chinner <david at fromorbit.com> Cc: Jason Gunthorpe <jgg at ziepe.ca> Cc: Matthew Wilcox <willy at infradead.org> Cc: Boaz Harrosh <boaz at plexistor.com> Cc: Eric Van Hensbergen <ericvh at gmail.com> Cc: Latchesar Ionkov <lucho at ionkov.net> Cc: Dominique Martinet <asmadeus at codewreck.org> --- net/9p/trans_common.c | 14 ++++++++++---- net/9p/trans_common.h | 3 ++- net/9p/trans_virtio.c | 18 +++++++++++++----- 3 files changed, 25 insertions(+), 10 deletions(-) diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c index 3dff68f05fb9..e5c359c369a6 100644 --- a/net/9p/trans_common.c +++ b/net/9p/trans_common.c @@ -19,12 +19,18 @@ /** * p9_release_pages - Release pages after the transaction. */ -void p9_release_pages(struct page **pages, int nr_pages) +void p9_release_pages(struct page **pages, int nr_pages, bool from_gup) { int i; - for (i = 0; i < nr_pages; i++) - if (pages[i]) - put_page(pages[i]); + if (from_gup) { + for (i = 0; i < nr_pages; i++) + if (pages[i]) + put_user_page(pages[i]); + } else { + for (i = 0; i < nr_pages; i++) + if (pages[i]) + put_page(pages[i]); + } } EXPORT_SYMBOL(p9_release_pages); diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h index c43babb3f635..dcf025867314 100644 --- a/net/9p/trans_common.h +++ b/net/9p/trans_common.h @@ -12,4 +12,5 @@ * */ -void p9_release_pages(struct page **, int); +void p9_release_pages(struct page **pages, int nr_pages, bool from_gup); + diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index a3cd90a74012..3714ca5ecdc2 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -306,11 +306,14 @@ static int p9_get_mapped_pages(struct virtio_chan *chan, struct iov_iter *data, int count, size_t *offs, - int *need_drop) + int *need_drop, + bool *from_gup) { int nr_pages; int err; + *from_gup = false; + if (!iov_iter_count(data)) return 0; @@ -332,6 +335,7 @@ static int p9_get_mapped_pages(struct virtio_chan *chan, *need_drop = 1; nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE); atomic_add(nr_pages, &vp_pinned); + *from_gup = iov_iter_get_pages_use_gup(data); return n; } else { /* kernel buffer, no need to pin pages */ @@ -397,13 +401,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, size_t offs; int need_drop = 0; int kicked = 0; + bool in_from_gup, out_from_gup; p9_debug(P9_DEBUG_TRANS, "virtio request\n"); if (uodata) { __le32 sz; int n = p9_get_mapped_pages(chan, &out_pages, uodata, - outlen, &offs, &need_drop); + outlen, &offs, &need_drop, + &out_from_gup); if (n < 0) { err = n; goto err_out; @@ -422,7 +428,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, memcpy(&req->tc.sdata[0], &sz, sizeof(sz)); } else if (uidata) { int n = p9_get_mapped_pages(chan, &in_pages, uidata, - inlen, &offs, &need_drop); + inlen, &offs, &need_drop, + &in_from_gup); if (n < 0) { err = n; goto err_out; @@ -504,11 +511,12 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, err_out: if (need_drop) { if (in_pages) { - p9_release_pages(in_pages, in_nr_pages); + p9_release_pages(in_pages, in_nr_pages, in_from_gup); atomic_sub(in_nr_pages, &vp_pinned); } if (out_pages) { - p9_release_pages(out_pages, out_nr_pages); + p9_release_pages(out_pages, out_nr_pages, + out_from_gup); atomic_sub(out_nr_pages, &vp_pinned); } /* wakeup anybody waiting for slots to pin pages */ -- 2.22.0
john.hubbard at gmail.com
2019-Jul-24 04:25 UTC
[PATCH 12/12] fs/ceph: fix a build warning: returning a value from void function
From: John Hubbard <jhubbard at nvidia.com> Trivial build warning fix: don't return a value from a function whose type is "void". Signed-off-by: John Hubbard <jhubbard at nvidia.com> --- fs/ceph/debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c index 2eb88ed22993..fa14c8e8761d 100644 --- a/fs/ceph/debugfs.c +++ b/fs/ceph/debugfs.c @@ -294,7 +294,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc) void ceph_fs_debugfs_init(struct ceph_fs_client *fsc) { - return 0; + return; } void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc) -- 2.22.0
John Hubbard
2019-Jul-24 04:34 UTC
[PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()
On 7/23/19 9:25 PM, john.hubbard at gmail.com wrote:> From: J?r?me Glisse <jglisse at redhat.com> > > For pages that were retained via get_user_pages*(), release those pages > via the new put_user_page*() routines, instead of via put_page(). > > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d > ("mm: introduce put_user_page*(), placeholder versions"). > > Changes from J?r?me's original patch: > > * Changed a WARN_ON to a BUG_ON. >Clearly, the above commit log has it backwards (this is quite my night for typos). Please read that as "changed a BUG_ON to a WARN_ON". I'll correct the commit description in next iteration of this patchset. ...> + /* > + * Here in all cases we should have an IOVEC which use GUP. If that is > + * not the case then we will wrongly call put_user_page() and the page > + * refcount will go wrong (this is in vhost_scsi_release_cmd()) > + */ > + WARN_ON(!iov_iter_get_pages_use_gup(iter)); > +... thanks, -- John Hubbard NVIDIA
Christoph Hellwig
2019-Jul-24 05:30 UTC
[PATCH 03/12] block: bio_release_pages: use flags arg instead of bool
On Tue, Jul 23, 2019 at 09:25:09PM -0700, john.hubbard at gmail.com wrote:> From: John Hubbard <jhubbard at nvidia.com> > > In commit d241a95f3514 ("block: optionally mark pages dirty in > bio_release_pages"), new "bool mark_dirty" argument was added to > bio_release_pages. > > In upcoming work, another bool argument (to indicate that the pages came > from get_user_pages) is going to be added. That's one bool too many, > because it's not desirable have calls of the form:All pages releases by bio_release_pages should come from get_get_user_pages, so I don't really see the point here.
Christoph Hellwig
2019-Jul-24 06:17 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubbard at gmail.com wrote:> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter. > Then, use the new iov_iter_get_pages_use_gup() to retrieve it when > it is time to release the pages. That allows choosing between put_page() > and put_user_page*(). > > * Pass in one more piece of information to bio_release_pages: a "from_gup" > parameter. Similar use as above. > > * Change the block layer, and several file systems, to use > put_user_page*().I think we can do this in a simple and better way. We have 5 ITER_* types. Of those ITER_DISCARD as the name suggests never uses pages, so we can skip handling it. ITER_PIPE is rejected ?n the direct I/O path, which leaves us with three. Out of those ITER_BVEC needs a user page reference, so we want to call put_user_page* on it. ITER_BVEC always already has page reference, which means in the block direct I/O path path we alread don't take a page reference. We should extent that handling to all other calls of iov_iter_get_pages / iov_iter_get_pages_alloc. I think we should just reject ITER_KVEC for direct I/O as well as we have no users and it is rather pointless. Alternatively if we see a use for it the callers should always have a life page reference anyway (or might be on kmalloc memory), so we really should not take a reference either. In other words: the only time we should ever have to put a page in this patch is when they are user pages. We'll need to clean up various bits of code for that, but that can be done gradually before even getting to the actual put_user_pages conversion.
Michael S. Tsirkin
2019-Jul-24 08:07 UTC
[PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()
On Tue, Jul 23, 2019 at 09:25:13PM -0700, john.hubbard at gmail.com wrote:> From: J?r?me Glisse <jglisse at redhat.com> > > For pages that were retained via get_user_pages*(), release those pages > via the new put_user_page*() routines, instead of via put_page(). > > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d > ("mm: introduce put_user_page*(), placeholder versions"). > > Changes from J?r?me's original patch: > > * Changed a WARN_ON to a BUG_ON. > > Signed-off-by: J?r?me Glisse <jglisse at redhat.com> > Signed-off-by: John Hubbard <jhubbard at nvidia.com> > Cc: virtualization at lists.linux-foundation.org > Cc: linux-fsdevel at vger.kernel.org > Cc: linux-block at vger.kernel.org > Cc: linux-mm at kvack.org > Cc: Jan Kara <jack at suse.cz> > Cc: Dan Williams <dan.j.williams at intel.com> > Cc: Alexander Viro <viro at zeniv.linux.org.uk> > Cc: Johannes Thumshirn <jthumshirn at suse.de> > Cc: Christoph Hellwig <hch at lst.de> > Cc: Jens Axboe <axboe at kernel.dk> > Cc: Ming Lei <ming.lei at redhat.com> > Cc: Dave Chinner <david at fromorbit.com> > Cc: Jason Gunthorpe <jgg at ziepe.ca> > Cc: Matthew Wilcox <willy at infradead.org> > Cc: Boaz Harrosh <boaz at plexistor.com> > Cc: Miklos Szeredi <miklos at szeredi.hu> > Cc: "Michael S. Tsirkin" <mst at redhat.com> > Cc: Jason Wang <jasowang at redhat.com> > Cc: Paolo Bonzini <pbonzini at redhat.com> > Cc: Stefan Hajnoczi <stefanha at redhat.com>Acked-by: Michael S. Tsirkin <mst at redhat.com>> --- > drivers/vhost/scsi.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > index a9caf1bc3c3e..282565ab5e3f 100644 > --- a/drivers/vhost/scsi.c > +++ b/drivers/vhost/scsi.c > @@ -329,11 +329,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) > > if (tv_cmd->tvc_sgl_count) { > for (i = 0; i < tv_cmd->tvc_sgl_count; i++) > - put_page(sg_page(&tv_cmd->tvc_sgl[i])); > + put_user_page(sg_page(&tv_cmd->tvc_sgl[i])); > } > if (tv_cmd->tvc_prot_sgl_count) { > for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++) > - put_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); > + put_user_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); > } > > vhost_scsi_put_inflight(tv_cmd->inflight); > @@ -630,6 +630,13 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd, > size_t offset; > unsigned int npages = 0; > > + /* > + * Here in all cases we should have an IOVEC which use GUP. If that is > + * not the case then we will wrongly call put_user_page() and the page > + * refcount will go wrong (this is in vhost_scsi_release_cmd()) > + */ > + WARN_ON(!iov_iter_get_pages_use_gup(iter)); > + > bytes = iov_iter_get_pages(iter, pages, LONG_MAX, > VHOST_SCSI_PREALLOC_UPAGES, &offset); > /* No pages were pinned */ > @@ -681,7 +688,7 @@ vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write, > while (p < sg) { > struct page *page = sg_page(p++); > if (page) > - put_page(page); > + put_user_page(page); > } > return ret; > } > -- > 2.22.0
John Hubbard
2019-Jul-24 23:23 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
On 7/23/19 11:17 PM, Christoph Hellwig wrote:> On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubbard at gmail.com wrote: >> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter. >> Then, use the new iov_iter_get_pages_use_gup() to retrieve it when >> it is time to release the pages. That allows choosing between put_page() >> and put_user_page*(). >> >> * Pass in one more piece of information to bio_release_pages: a "from_gup" >> parameter. Similar use as above. >> >> * Change the block layer, and several file systems, to use >> put_user_page*(). > > I think we can do this in a simple and better way. We have 5 ITER_* > types. Of those ITER_DISCARD as the name suggests never uses pages, so > we can skip handling it. ITER_PIPE is rejected ?n the direct I/O path, > which leaves us with three. > > Out of those ITER_BVEC needs a user page reference, so we want to call^ ITER_IOVEC, I hope. Otherwise I'm hopeless lost. :)> put_user_page* on it. ITER_BVEC always already has page reference, > which means in the block direct I/O path path we alread don't take > a page reference. We should extent that handling to all other calls > of iov_iter_get_pages / iov_iter_get_pages_alloc. I think we should > just reject ITER_KVEC for direct I/O as well as we have no users and > it is rather pointless. Alternatively if we see a use for it the > callers should always have a life page reference anyway (or might > be on kmalloc memory), so we really should not take a reference either. > > In other words: the only time we should ever have to put a page in > this patch is when they are user pages. We'll need to clean up > various bits of code for that, but that can be done gradually before > even getting to the actual put_user_pages conversion. >Sounds great. I'm part way into it and it doesn't look too bad. The main question is where to scatter various checks and assertions, to keep the kvecs out of direct I/0. Or at least keep the gups away from direct I/0. thanks, -- John Hubbard NVIDIA
Bob Liu
2019-Jul-25 00:41 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
On 7/24/19 12:25 PM, john.hubbard at gmail.com wrote:> From: John Hubbard <jhubbard at nvidia.com> > > Hi, > > This is mostly Jerome's work, converting the block/bio and related areas > to call put_user_page*() instead of put_page(). Because I've changed > Jerome's patches, in some cases significantly, I'd like to get his > feedback before we actually leave him listed as the author (he might > want to disown some or all of these). >Could you add some background to the commit log for people don't have the context.. Why this converting? What's the main differences? Regards, -Bob> I added a new patch, in order to make this work with Christoph Hellwig's > recent overhaul to bio_release_pages(): "block: bio_release_pages: use > flags arg instead of bool". > > I've started the series with a patch that I've posted in another > series ("mm/gup: add make_dirty arg to put_user_pages_dirty_lock()"[1]), > because I'm not sure which of these will go in first, and this allows each > to stand alone. > > Testing: not much beyond build and boot testing has been done yet. And > I'm not set up to even exercise all of it (especially the IB parts) at > run time. > > Anyway, changes here are: > > * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter. > Then, use the new iov_iter_get_pages_use_gup() to retrieve it when > it is time to release the pages. That allows choosing between put_page() > and put_user_page*(). > > * Pass in one more piece of information to bio_release_pages: a "from_gup" > parameter. Similar use as above. > > * Change the block layer, and several file systems, to use > put_user_page*(). > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_r_20190724012606.25844-2D2-2Djhubbard-40nvidia.com&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=1ktT0U2YS_I8Zz2o-MS1YcCAzWZ6hFGtyTgvVMGM7gI&m=FpFhv2rjbKCAYGmO6Hy8WJAottr1Qz_mDKDLObQ40FU&s=q-_mX3daEr22WbdZMElc_ZbD8L9oGLD7U0xLeyJ661Y&e= > And please note the correction email that I posted as a follow-up, > if you're looking closely at that patch. :) The fixed version is > included here. > > John Hubbard (3): > mm/gup: add make_dirty arg to put_user_pages_dirty_lock() > block: bio_release_pages: use flags arg instead of bool > fs/ceph: fix a build warning: returning a value from void function > > J?r?me Glisse (9): > iov_iter: add helper to test if an iter would use GUP v2 > block: bio_release_pages: convert put_page() to put_user_page*() > block_dev: convert put_page() to put_user_page*() > fs/nfs: convert put_page() to put_user_page*() > vhost-scsi: convert put_page() to put_user_page*() > fs/cifs: convert put_page() to put_user_page*() > fs/fuse: convert put_page() to put_user_page*() > fs/ceph: convert put_page() to put_user_page*() > 9p/net: convert put_page() to put_user_page*() > > block/bio.c | 81 ++++++++++++--- > drivers/infiniband/core/umem.c | 5 +- > drivers/infiniband/hw/hfi1/user_pages.c | 5 +- > drivers/infiniband/hw/qib/qib_user_pages.c | 5 +- > drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +- > drivers/infiniband/sw/siw/siw_mem.c | 8 +- > drivers/vhost/scsi.c | 13 ++- > fs/block_dev.c | 22 +++- > fs/ceph/debugfs.c | 2 +- > fs/ceph/file.c | 62 ++++++++--- > fs/cifs/cifsglob.h | 3 + > fs/cifs/file.c | 22 +++- > fs/cifs/misc.c | 19 +++- > fs/direct-io.c | 2 +- > fs/fuse/dev.c | 22 +++- > fs/fuse/file.c | 53 +++++++--- > fs/nfs/direct.c | 10 +- > include/linux/bio.h | 22 +++- > include/linux/mm.h | 5 +- > include/linux/uio.h | 11 ++ > mm/gup.c | 115 +++++++++------------ > net/9p/trans_common.c | 14 ++- > net/9p/trans_common.h | 3 +- > net/9p/trans_virtio.c | 18 +++- > 24 files changed, 357 insertions(+), 170 deletions(-) >
John Hubbard
2019-Jul-26 01:24 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
On 7/24/19 5:41 PM, Bob Liu wrote:> On 7/24/19 12:25 PM, john.hubbard at gmail.com wrote: >> From: John Hubbard <jhubbard at nvidia.com> >> >> Hi, >> >> This is mostly Jerome's work, converting the block/bio and related areas >> to call put_user_page*() instead of put_page(). Because I've changed >> Jerome's patches, in some cases significantly, I'd like to get his >> feedback before we actually leave him listed as the author (he might >> want to disown some or all of these). >> > > Could you add some background to the commit log for people don't have the context.. > Why this converting? What's the main differences? >Hi Bob, 1. Many of the patches have a blurb like this: For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page(). This is part a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). ...and if you look at that commit, you'll find several pages of information in its commit description, which should address your point. 2. This whole series has to be re-worked, as per the other feedback thread. So I'll keep your comment in mind when I post a new series. thanks, -- John Hubbard NVIDIA
John Hubbard
2019-Aug-05 22:54 UTC
[PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
On 7/23/19 11:17 PM, Christoph Hellwig wrote:> On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubbard at gmail.com wrote: >> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter. >> Then, use the new iov_iter_get_pages_use_gup() to retrieve it when >> it is time to release the pages. That allows choosing between put_page() >> and put_user_page*(). >> >> * Pass in one more piece of information to bio_release_pages: a "from_gup" >> parameter. Similar use as above. >> >> * Change the block layer, and several file systems, to use >> put_user_page*(). > > I think we can do this in a simple and better way. We have 5 ITER_* > types. Of those ITER_DISCARD as the name suggests never uses pages, so > we can skip handling it. ITER_PIPE is rejected ?n the direct I/O path, > which leaves us with three. >Hi Christoph, Are you working on anything like this? Or on the put_user_bvec() idea? Please let me know, otherwise I'll go in and implement something here. thanks, -- John Hubbard NVIDIA> Out of those ITER_BVEC needs a user page reference, so we want to call > put_user_page* on it. ITER_BVEC always already has page reference, > which means in the block direct I/O path path we alread don't take > a page reference. We should extent that handling to all other calls > of iov_iter_get_pages / iov_iter_get_pages_alloc. I think we should > just reject ITER_KVEC for direct I/O as well as we have no users and > it is rather pointless. Alternatively if we see a use for it the > callers should always have a life page reference anyway (or might > be on kmalloc memory), so we really should not take a reference either. > > In other words: the only time we should ever have to put a page in > this patch is when they are user pages. We'll need to clean up > various bits of code for that, but that can be done gradually before > even getting to the actual put_user_pages conversion. >
Apparently Analagous Threads
- [PATCH 07/12] vhost-scsi: convert put_page() to put_user_page*()
- [PATCH 3/8] vhost scsi: alloc cmds per vq instead of session
- [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
- [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
- [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()