Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 0/6] mm/hmm/nouveau: add THP migration to migrate_vma_*
This series adds support for transparent huge page migration to
migrate_vma_*() and adds nouveau SVM and HMM selftests as consumers.
Earlier versions were posted previously [1] and [2].

The patches apply cleanly to the linux-mm 5.10.0-rc2 tree. There are a
lot of other THP patches being posted. I don't think there are any
semantic conflicts but there may be some merge conflicts depending on
the order Andrew applies these.

Changes in v3:
Sent the patch ("mm/thp: fix __split_huge_pmd_locked() for migration PMD")
as a separate patch from this series.
Rebased to linux-mm 5.10.0-rc2.

Changes in v2:
Added splitting a THP midway in the migration process: i.e., in
migrate_vma_pages().

[1] https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell at nvidia.com
[2] https://lore.kernel.org/linux-mm/20200902165830.5367-1-rcampbell at nvidia.com

Ralph Campbell (6):
  mm/thp: add prep_transhuge_device_private_page()
  mm/migrate: move migrate_vma_collect_skip()
  mm: support THP migration to device private memory
  mm/thp: add THP allocation helper
  mm/hmm/test: add self tests for THP migration
  nouveau: support THP migration to private memory

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 289 +++++++++++-----
 drivers/gpu/drm/nouveau/nouveau_svm.c  |  11 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 include/linux/gfp.h                    |  10 +
 include/linux/huge_mm.h                |  12 +
 include/linux/memremap.h               |   9 +
 include/linux/migrate.h                |   2 +
 lib/test_hmm.c                         | 437 +++++++++++++++++++++----
 lib/test_hmm_uapi.h                    |   3 +
 mm/huge_memory.c                       | 147 +++++++--
 mm/memcontrol.c                        |  25 +-
 mm/memory.c                            |  10 +-
 mm/memremap.c                          |   4 +-
 mm/migrate.c                           | 429 +++++++++++++++++++-----
 mm/rmap.c                              |   2 +-
 tools/testing/selftests/vm/hmm-tests.c | 404 +++++++++++++++++++++++
 16 files changed, 1522 insertions(+), 275 deletions(-)

--
2.20.1
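For context, the driver-side flow that this series extends is the existing
three step migrate_vma API; the sketch below shows roughly how a driver would
opt in to THP migration with the new MIGRATE_VMA_SELECT_COMPOUND flag from
patch 3/6. Everything named demo_* is a hypothetical placeholder, and the
allocation, copy, and device page table steps are elided (patches 3/6 and 5/6
contain the real details).

#include <linux/migrate.h>
#include <linux/mm.h>

/*
 * Sketch: migrate one address range to device private memory using the
 * existing migrate_vma_setup()/pages()/finalize() sequence plus the new
 * compound selection flag.
 */
static int demo_migrate_range(struct vm_area_struct *vma, unsigned long start,
			      unsigned long end, void *pgmap_owner,
			      unsigned long *src, unsigned long *dst)
{
	struct migrate_vma args = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.src		= src,
		.dst		= dst,
		.pgmap_owner	= pgmap_owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM |
				  MIGRATE_VMA_SELECT_COMPOUND,
	};
	int ret;

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	/* Allocate device pages, copy data, and fill args.dst here. */

	migrate_vma_pages(&args);
	/* Update the device's page tables for the migrated pages here. */
	migrate_vma_finalize(&args);
	return 0;
}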
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 1/6] mm/thp: add prep_transhuge_device_private_page()
Add a helper function to allow device drivers to create device private
transparent huge pages. This is intended to help support device private
THP migrations.

Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
 include/linux/huge_mm.h | 5 +++++
 mm/huge_memory.c        | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0365aa97f8e7..3ec26ef27a93 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -184,6 +184,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
 		unsigned long flags);
 
 extern void prep_transhuge_page(struct page *page);
+extern void prep_transhuge_device_private_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
 bool is_transparent_hugepage(struct page *page);
@@ -377,6 +378,10 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 
 static inline void prep_transhuge_page(struct page *page) {}
 
+static inline void prep_transhuge_device_private_page(struct page *page)
+{
+}
+
 static inline bool is_transparent_hugepage(struct page *page)
 {
 	return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 08a183f6c3ab..b4141f12ff31 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -498,6 +498,15 @@ void prep_transhuge_page(struct page *page)
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
+void prep_transhuge_device_private_page(struct page *page)
+{
+	prep_compound_page(page, HPAGE_PMD_ORDER);
+	prep_transhuge_page(page);
+	/* Only the head page has a reference to the pgmap. */
+	percpu_ref_put_many(page->pgmap->ref, HPAGE_PMD_NR - 1);
+}
+EXPORT_SYMBOL_GPL(prep_transhuge_device_private_page);
+
 bool is_transparent_hugepage(struct page *page)
 {
 	if (!PageCompound(page))
--
2.20.1
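For illustration, a device driver that carves its device private region into
PMD-sized chunks could call the new helper while building a free list of huge
pages, roughly as in the sketch below (assuming the first pfn of the chunk is
HPAGE_PMD_NR aligned). The struct demo_device and its free_huge_pages list
are hypothetical stand-ins; lib/test_hmm.c in patch 5/6 does the same thing
in dmirror_allocate_chunk().

#include <linux/huge_mm.h>
#include <linux/mm.h>

/* Hypothetical per-device state; only the free list matters here. */
struct demo_device {
	struct page *free_huge_pages;
};

/* Sketch: carve a freshly remapped device private chunk into THPs. */
static void demo_init_chunk_thp(struct demo_device *ddev,
				unsigned long pfn_first,
				unsigned long pfn_last)
{
	unsigned long pfn;

	for (pfn = pfn_first; pfn + HPAGE_PMD_NR <= pfn_last;
	     pfn += HPAGE_PMD_NR) {
		struct page *head = pfn_to_page(pfn);

		/* Make HPAGE_PMD_NR struct pages into one compound page. */
		prep_transhuge_device_private_page(head);

		/* Thread the head page onto the driver's free list. */
		head->zone_device_data = ddev->free_huge_pages;
		ddev->free_huge_pages = head;
	}
}

The helper takes care of dropping the extra pgmap references held by the tail
struct pages, so only the head page keeps a reference to the pgmap.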
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 2/6] mm/migrate: move migrate_vma_collect_skip()
Move the definition of migrate_vma_collect_skip() to make it callable
by migrate_vma_collect_hole(). This helps make the next patch easier
to read.

Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
 mm/migrate.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c1585ec29827..665516319b66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2253,6 +2253,21 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 #endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_DEVICE_PRIVATE
+static int migrate_vma_collect_skip(unsigned long start,
+				    unsigned long end,
+				    struct mm_walk *walk)
+{
+	struct migrate_vma *migrate = walk->private;
+	unsigned long addr;
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		migrate->dst[migrate->npages] = 0;
+		migrate->src[migrate->npages++] = 0;
+	}
+
+	return 0;
+}
+
 static int migrate_vma_collect_hole(unsigned long start,
 				    unsigned long end,
 				    __always_unused int depth,
@@ -2281,21 +2296,6 @@ static int migrate_vma_collect_hole(unsigned long start,
 	return 0;
 }
 
-static int migrate_vma_collect_skip(unsigned long start,
-				    unsigned long end,
-				    struct mm_walk *walk)
-{
-	struct migrate_vma *migrate = walk->private;
-	unsigned long addr;
-
-	for (addr = start; addr < end; addr += PAGE_SIZE) {
-		migrate->dst[migrate->npages] = 0;
-		migrate->src[migrate->npages++] = 0;
-	}
-
-	return 0;
-}
-
 static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				   unsigned long start,
 				   unsigned long end,
--
2.20.1
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 3/6] mm: support THP migration to device private memory
Support transparent huge page migration to ZONE_DEVICE private memory. A new selection flag (MIGRATE_VMA_SELECT_COMPOUND) is added to request THP migration. Otherwise, THPs are split when filling in the source PFN array. A new flag (MIGRATE_PFN_COMPOUND) is added to the source PFN array to indicate a huge page can be migrated. If the device driver can allocate a huge page, it sets the MIGRATE_PFN_COMPOUND flag in the destination PFN array. migrate_vma_pages() will fallback to PAGE_SIZE pages if MIGRATE_PFN_COMPOUND is not set in both source and destination arrays. Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> --- include/linux/huge_mm.h | 7 + include/linux/memremap.h | 9 + include/linux/migrate.h | 2 + mm/huge_memory.c | 124 +++++++++--- mm/memcontrol.c | 25 ++- mm/memory.c | 10 +- mm/memremap.c | 4 +- mm/migrate.c | 413 ++++++++++++++++++++++++++++++++------- mm/rmap.c | 2 +- 9 files changed, 486 insertions(+), 110 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 3ec26ef27a93..1e8625cc233c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -190,6 +190,8 @@ bool is_transparent_hugepage(struct page *page); bool can_split_huge_page(struct page *page, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); +int split_migrating_huge_page(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, struct page *page); static inline int split_huge_page(struct page *page) { return split_huge_page_to_list(page, NULL); @@ -456,6 +458,11 @@ static inline bool is_huge_zero_page(struct page *page) return false; } +static inline bool is_huge_zero_pmd(pmd_t pmd) +{ + return false; +} + static inline bool is_huge_zero_pud(pud_t pud) { return false; diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 86c6c368ce9b..9b39a896af37 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -87,6 +87,15 @@ struct dev_pagemap_ops { * the page back to a CPU accessible page. */ vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf); + + /* + * Used for private (un-addressable) device memory only. + * This is called when a compound device private page is split. + * The driver uses this callback to set tail_page->pgmap and + * tail_page->zone_device_data appropriately based on the head + * page. 
+ */ + void (*page_split)(struct page *head, struct page *tail_page); }; #define PGMAP_ALTMAP_VALID (1 << 0) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 0f8d1583fa8e..92179bf360d1 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -144,6 +144,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm, #define MIGRATE_PFN_MIGRATE (1UL << 1) #define MIGRATE_PFN_LOCKED (1UL << 2) #define MIGRATE_PFN_WRITE (1UL << 3) +#define MIGRATE_PFN_COMPOUND (1UL << 4) #define MIGRATE_PFN_SHIFT 6 static inline struct page *migrate_pfn_to_page(unsigned long mpfn) @@ -161,6 +162,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn) enum migrate_vma_direction { MIGRATE_VMA_SELECT_SYSTEM = 1 << 0, MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1, + MIGRATE_VMA_SELECT_COMPOUND = 1 << 2, }; struct migrate_vma { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b4141f12ff31..a073e66d0ee2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1682,23 +1682,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, } else { struct page *page = NULL; int flush_needed = 1; + bool is_anon = false; if (pmd_present(orig_pmd)) { page = pmd_page(orig_pmd); + is_anon = PageAnon(page); page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); } else if (thp_migration_supported()) { swp_entry_t entry; - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + if (is_device_private_entry(entry)) { + page = device_private_entry_to_page(entry); + is_anon = PageAnon(page); + page_remove_rmap(page, true); + VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); + VM_BUG_ON_PAGE(!PageHead(page), page); + put_page(page); + } else { + VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); + page = pfn_to_page(swp_offset(entry)); + is_anon = PageAnon(page); + } flush_needed = 0; } else WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); - if (PageAnon(page)) { + if (is_anon) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2358,9 +2370,10 @@ static void remap_page(struct page *page, unsigned int nr) } static void __split_huge_page_tail(struct page *head, int tail, - struct lruvec *lruvec, struct list_head *list) + struct lruvec *lruvec, struct list_head *list, bool remap) { struct page *page_tail = head + tail; + int pin_count; VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail); @@ -2396,15 +2409,24 @@ static void __split_huge_page_tail(struct page *head, int tail, smp_wmb(); /* - * Clear PageTail before unfreezing page refcount. + * A successful get_page_unless_zero() might follow page_ref_unfreeze() + * so PageTail needs to be cleared before unfreezing the page refcount + * in order for compound_head() to work correctly. * - * After successful get_page_unless_zero() might follow put_page() - * which needs correct compound_head(). + * Also, ZONE_DEVICE struct pages share the compound_head field and + * need to restore the pgmap pointer before unfreezing page refcount + * in order for is_zone_device_page() to work correctly. */ - clear_compound_head(page_tail); + if (is_device_private_page(head)) { + head->pgmap->ops->page_split(head, page_tail); + pin_count = 2; + } else { + clear_compound_head(page_tail); + pin_count = 1; + } /* Finally unfreeze refcount. Additional reference from page cache. 
*/ - page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) || + page_ref_unfreeze(page_tail, pin_count + (!PageAnon(head) || PageSwapCache(head))); if (page_is_young(head)) @@ -2419,11 +2441,12 @@ static void __split_huge_page_tail(struct page *head, int tail, * pages to show after the currently processed elements - e.g. * migrate_pages */ - lru_add_page_tail(head, page_tail, lruvec, list); + if (remap) + lru_add_page_tail(head, page_tail, lruvec, list); } static void __split_huge_page(struct page *page, struct list_head *list, - pgoff_t end, unsigned long flags) + pgoff_t end, unsigned long flags, bool remap) { struct page *head = compound_head(page); pg_data_t *pgdat = page_pgdat(head); @@ -2447,7 +2470,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, } for (i = nr - 1; i >= 1; i--) { - __split_huge_page_tail(head, i, lruvec, list); + __split_huge_page_tail(head, i, lruvec, list, remap); /* Some pages can be beyond i_size: drop them from page cache */ if (head[i].index >= end) { ClearPageDirty(head + i); @@ -2474,6 +2497,9 @@ static void __split_huge_page(struct page *page, struct list_head *list, if (PageSwapCache(head)) { page_ref_add(head, 2); xa_unlock(&swap_cache->i_pages); + } else if (is_device_private_page(head)) { + percpu_ref_get_many(page->pgmap->ref, nr - 1); + page_ref_add(head, 2); } else { page_ref_inc(head); } @@ -2485,6 +2511,9 @@ static void __split_huge_page(struct page *page, struct list_head *list, spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (!remap) + return; + remap_page(head, nr); if (PageSwapCache(head)) { @@ -2602,6 +2631,7 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) extra_pins = PageSwapCache(page) ? thp_nr_pages(page) : 0; else extra_pins = thp_nr_pages(page); + extra_pins += is_device_private_page(page); if (pextra_pins) *pextra_pins = extra_pins; return total_mapcount(page) == page_count(page) - extra_pins - 1; @@ -2626,7 +2656,8 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under * us. */ -int split_huge_page_to_list(struct page *page, struct list_head *list) +static int __split_huge_page_to_list(struct page *page, struct list_head *list, + bool remap) { struct page *head = compound_head(page); struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); @@ -2653,14 +2684,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) * is taken to serialise against parallel split or collapse * operations. */ - anon_vma = page_get_anon_vma(head); - if (!anon_vma) { - ret = -EBUSY; - goto out; + if (remap) { + anon_vma = page_get_anon_vma(head); + if (!anon_vma) { + ret = -EBUSY; + goto out; + } + anon_vma_lock_write(anon_vma); } end = -1; mapping = NULL; - anon_vma_lock_write(anon_vma); } else { mapping = head->mapping; @@ -2686,13 +2719,19 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) /* * Racy check if we can split the page, before unmap_page() will * split PMDs + * If we are splitting a migrating THP, there is no check needed + * because the page is already unmapped and isolated from the LRU. 
*/ - if (!can_split_huge_page(head, &extra_pins)) { + if (!remap) + extra_pins = thp_nr_pages(page) - 1 + + is_device_private_page(head); + else if (!can_split_huge_page(head, &extra_pins)) { ret = -EBUSY; goto out_unlock; } - unmap_page(head); + if (remap) + unmap_page(head); VM_BUG_ON_PAGE(compound_mapcount(head), head); /* prevent PageLRU to go away from under us, and freeze lru stats */ @@ -2717,7 +2756,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { if (!list_empty(page_deferred_list(head))) { ds_queue->split_queue_len--; - list_del(page_deferred_list(head)); + list_del_init(page_deferred_list(head)); } spin_unlock(&ds_queue->split_queue_lock); if (mapping) { @@ -2727,7 +2766,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_lruvec_page_state(head, NR_FILE_THPS); } - __split_huge_page(page, list, end, flags); + __split_huge_page(page, list, end, flags, remap); ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { @@ -2742,7 +2781,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) fail: if (mapping) xa_unlock(&mapping->i_pages); spin_unlock_irqrestore(&pgdata->lru_lock, flags); - remap_page(head, thp_nr_pages(head)); + if (remap) + remap_page(head, thp_nr_pages(head)); ret = -EBUSY; } @@ -2758,6 +2798,36 @@ fail: if (mapping) return ret; } +int split_huge_page_to_list(struct page *page, struct list_head *list) +{ + return __split_huge_page_to_list(page, list, true); +} + +/* + * Split a migrating huge page. + * The caller should have mmap_lock_read() held, the huge page unmapped and + * isolated, and the PMD page table entry set to a migration entry for the + * given head page. + */ +int split_migrating_huge_page(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, struct page *head) +{ + spinlock_t *ptl; + + VM_BUG_ON_PAGE(is_huge_zero_page(head), head); + VM_BUG_ON_PAGE(!PageLocked(head), head); + VM_BUG_ON_PAGE(!PageHead(head), head); + VM_BUG_ON_PAGE(PageWriteback(head), head); + VM_BUG_ON_PAGE(PageLRU(head), head); + VM_BUG_ON_PAGE(compound_mapcount(head), head); + + ptl = pmd_lock(vma->vm_mm, pmd); + __split_huge_pmd_locked(vma, pmd, address, false); + spin_unlock(ptl); + + return __split_huge_page_to_list(head, NULL, false); +} + void free_transhuge_page(struct page *page) { struct deferred_split *ds_queue = get_deferred_split_queue(page); @@ -2766,9 +2836,11 @@ void free_transhuge_page(struct page *page) spin_lock_irqsave(&ds_queue->split_queue_lock, flags); if (!list_empty(page_deferred_list(page))) { ds_queue->split_queue_len--; - list_del(page_deferred_list(page)); + list_del_init(page_deferred_list(page)); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + if (is_device_private_page(page)) + return; free_compound_page(page); } @@ -2986,6 +3058,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) pmde = pmd_mksoft_dirty(pmde); if (is_write_migration_entry(entry)) pmde = maybe_pmd_mkwrite(pmde, vma); + if (unlikely(is_device_private_page(new))) { + entry = make_device_private_entry(new, pmd_write(pmde)); + pmde = swp_entry_to_pmd(entry); + } flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE); if (PageAnon(new)) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3a12df292712..12d3d79c4e32 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5792,12 +5792,22 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, struct page *page = 
NULL; enum mc_target_type ret = MC_TARGET_NONE; - if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); + if (!(mc.flags & MOVE_ANON)) return ret; + if (unlikely(is_swap_pmd(pmd))) { + swp_entry_t entry = pmd_to_swp_entry(pmd); + + if (!is_device_private_entry(entry)) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(pmd)); + return ret; + } + page = device_private_entry_to_page(entry); + ret = MC_TARGET_DEVICE; + } else { + page = pmd_page(pmd); + ret = MC_TARGET_PAGE; } - page = pmd_page(pmd); VM_BUG_ON_PAGE(!page || !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) return ret; @@ -5828,12 +5838,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - /* - * Note their can not be MC_TARGET_DEVICE for now as we do not - * support transparent huge page with MEMORY_DEVICE_PRIVATE but - * this might change. - */ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) + if (get_mctgt_type_thp(vma, addr, *pmd, NULL)) mc.precharge += HPAGE_PMD_NR; spin_unlock(ptl); return 0; diff --git a/mm/memory.c b/mm/memory.c index f8d66f0e8da7..963c168a93dc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4485,9 +4485,15 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, barrier(); if (unlikely(is_swap_pmd(orig_pmd))) { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (is_device_private_entry(entry)) { + vmf.page = device_private_entry_to_page(entry); + return vmf.page->pgmap->ops->migrate_to_ram(&vmf); + } VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - if (is_pmd_migration_entry(orig_pmd)) + !is_migration_entry(entry)); + if (is_migration_entry(entry)) pmd_migration_entry_wait(mm, vmf.pmd); return 0; } diff --git a/mm/memremap.c b/mm/memremap.c index d72ce30da94e..8b4e6f12e58f 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -92,7 +92,7 @@ static unsigned long pfn_next(unsigned long pfn) { if (pfn % 1024 == 0) cond_resched(); - return pfn + 1; + return pfn + thp_nr_pages(pfn_to_page(pfn)); } /* @@ -509,6 +509,8 @@ void free_devmap_managed_page(struct page *page) __ClearPageWaiters(page); mem_cgroup_uncharge(page); + if (PageHead(page)) + free_transhuge_page(page); /* * When a device_private page is freed, the page->mapping field diff --git a/mm/migrate.c b/mm/migrate.c index 665516319b66..7b69a5f91d0a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -51,6 +51,7 @@ #include <linux/oom.h> #include <asm/tlbflush.h> +#include <asm/pgalloc.h> #define CREATE_TRACE_POINTS #include <trace/events/migrate.h> @@ -2275,19 +2276,28 @@ static int migrate_vma_collect_hole(unsigned long start, { struct migrate_vma *migrate = walk->private; unsigned long addr; + unsigned long mpfn; /* Only allow populating anonymous memory. 
*/ - if (!vma_is_anonymous(walk->vma)) { - for (addr = start; addr < end; addr += PAGE_SIZE) { - migrate->src[migrate->npages] = 0; - migrate->dst[migrate->npages] = 0; - migrate->npages++; - } - return 0; + if (!vma_is_anonymous(walk->vma) || + !((migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))) + return migrate_vma_collect_skip(start, end, walk); + + if (thp_migration_supported() && + (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && + (start & ~PMD_MASK) == 0 && (end & ~PMD_MASK) == 0) { + migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE | + MIGRATE_PFN_COMPOUND; + migrate->dst[migrate->npages] = 0; + migrate->npages++; + migrate->cpages++; + return migrate_vma_collect_skip(start + PAGE_SIZE, end, walk); } + mpfn = (migrate->vma->vm_flags & VM_WRITE) ? + (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_WRITE) : MIGRATE_PFN_MIGRATE; for (addr = start; addr < end; addr += PAGE_SIZE) { - migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE; + migrate->src[migrate->npages] = mpfn; migrate->dst[migrate->npages] = 0; migrate->npages++; migrate->cpages++; @@ -2296,59 +2306,133 @@ static int migrate_vma_collect_hole(unsigned long start, return 0; } -static int migrate_vma_collect_pmd(pmd_t *pmdp, - unsigned long start, - unsigned long end, - struct mm_walk *walk) +static int migrate_vma_handle_pmd(pmd_t *pmdp, unsigned long start, + unsigned long end, struct mm_walk *walk) { struct migrate_vma *migrate = walk->private; struct vm_area_struct *vma = walk->vma; struct mm_struct *mm = vma->vm_mm; - unsigned long addr = start, unmapped = 0; spinlock_t *ptl; - pte_t *ptep; + struct page *page; + unsigned long write = 0; + int ret; -again: - if (pmd_none(*pmdp)) + ptl = pmd_lock(mm, pmdp); + if (pmd_none(*pmdp)) { + spin_unlock(ptl); return migrate_vma_collect_hole(start, end, -1, walk); - + } if (pmd_trans_huge(*pmdp)) { - struct page *page; - - ptl = pmd_lock(mm, pmdp); - if (unlikely(!pmd_trans_huge(*pmdp))) { + if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) { spin_unlock(ptl); - goto again; + return migrate_vma_collect_skip(start, end, walk); } - page = pmd_page(*pmdp); if (is_huge_zero_page(page)) { spin_unlock(ptl); - split_huge_pmd(vma, pmdp, addr); - if (pmd_trans_unstable(pmdp)) - return migrate_vma_collect_skip(start, end, - walk); - } else { - int ret; + return migrate_vma_collect_hole(start, end, -1, walk); + } + if (pmd_write(*pmdp)) + write = MIGRATE_PFN_WRITE; + } else if (!pmd_present(*pmdp)) { + swp_entry_t entry = pmd_to_swp_entry(*pmdp); + + if (is_migration_entry(entry)) { + bool wait; - get_page(page); + page = migration_entry_to_page(entry); + wait = get_page_unless_zero(page); spin_unlock(ptl); - if (unlikely(!trylock_page(page))) - return migrate_vma_collect_skip(start, end, - walk); - ret = split_huge_page(page); - unlock_page(page); - put_page(page); - if (ret) - return migrate_vma_collect_skip(start, end, - walk); - if (pmd_none(*pmdp)) - return migrate_vma_collect_hole(start, end, -1, - walk); + if (wait) + put_and_wait_on_page_locked(page); + return -EAGAIN; + } + if (!is_device_private_entry(entry)) { + spin_unlock(ptl); + return migrate_vma_collect_skip(start, end, walk); + } + page = device_private_entry_to_page(entry); + if (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) || + page->pgmap->owner != migrate->pgmap_owner) { + spin_unlock(ptl); + return migrate_vma_collect_skip(start, end, walk); } + if (is_write_device_private_entry(entry)) + write = MIGRATE_PFN_WRITE; + } else { + spin_unlock(ptl); + return -EAGAIN; + } + + get_page(page); + if 
(unlikely(!trylock_page(page))) { + spin_unlock(ptl); + put_page(page); + return migrate_vma_collect_skip(start, end, walk); + } + if (thp_migration_supported() && + (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && + (start & ~PMD_MASK) == 0 && (start + PMD_SIZE) == end) { + struct page_vma_mapped_walk vmw = { + .vma = vma, + .address = start, + .pmd = pmdp, + .ptl = ptl, + }; + + migrate->src[migrate->npages] = write | + migrate_pfn(page_to_pfn(page)) | + MIGRATE_PFN_MIGRATE | MIGRATE_PFN_LOCKED | + MIGRATE_PFN_COMPOUND; + migrate->dst[migrate->npages] = 0; + migrate->npages++; + migrate->cpages++; + migrate_vma_collect_skip(start + PAGE_SIZE, end, walk); + + /* Note this also removes the page from the rmap. */ + set_pmd_migration_entry(&vmw, page); + spin_unlock(ptl); + + return 0; + } + spin_unlock(ptl); + + ret = split_huge_page(page); + unlock_page(page); + put_page(page); + + if (ret) + return migrate_vma_collect_skip(start, end, walk); + if (pmd_none(*pmdp)) + return migrate_vma_collect_hole(start, end, -1, walk); + + /* This just causes migrate_vma_collect_pmd() to handle PTEs. */ + return -ENOENT; +} + +static int migrate_vma_collect_pmd(pmd_t *pmdp, + unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct migrate_vma *migrate = walk->private; + struct vm_area_struct *vma = walk->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long addr = start, unmapped = 0; + spinlock_t *ptl; + pte_t *ptep; + +again: + if (pmd_trans_huge(*pmdp) || !pmd_present(*pmdp)) { + int ret = migrate_vma_handle_pmd(pmdp, start, end, walk); + + if (!ret) + return 0; + if (ret == -EAGAIN) + goto again; } - if (unlikely(pmd_bad(*pmdp))) + if (unlikely(pmd_bad(*pmdp) || pmd_devmap(*pmdp))) return migrate_vma_collect_skip(start, end, walk); ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); @@ -2404,8 +2488,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; } - /* FIXME support THP */ - if (!page || !page->mapping || PageTransCompound(page)) { + if (!page || !page->mapping) { mpfn = 0; goto next; } @@ -2527,14 +2610,6 @@ static bool migrate_vma_check_page(struct page *page) */ int extra = 1; - /* - * FIXME support THP (transparent huge page), it is bit more complex to - * check them than regular pages, because they can be mapped with a pmd - * or with a pte (split pte mapping). - */ - if (PageCompound(page)) - return false; - /* Page from ZONE_DEVICE have one extra reference */ if (is_zone_device_page(page)) { /* @@ -2833,13 +2908,191 @@ int migrate_vma_setup(struct migrate_vma *args) } EXPORT_SYMBOL(migrate_vma_setup); +static pmd_t *find_pmd(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgdp; + p4d_t *p4dp; + pud_t *pudp; + + pgdp = pgd_offset(mm, addr); + p4dp = p4d_alloc(mm, pgdp, addr); + if (!p4dp) + return NULL; + pudp = pud_alloc(mm, p4dp, addr); + if (!pudp) + return NULL; + return pmd_alloc(mm, pudp, addr); +} + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +/* + * This code closely follows: + * do_huge_pmd_anonymous_page() + * __do_huge_pmd_anonymous_page() + * except that the page being inserted is likely to be a device private page + * instead of an allocated or zero page. 
+ */ +static int insert_huge_pmd_anonymous_page(struct vm_area_struct *vma, + unsigned long haddr, + struct page *page, + unsigned long *src, + pmd_t *pmdp) +{ + struct mm_struct *mm = vma->vm_mm; + unsigned int i; + spinlock_t *ptl; + bool flush = false; + pgtable_t pgtable; + gfp_t gfp; + pmd_t entry; + + if (WARN_ON_ONCE(compound_order(page) != HPAGE_PMD_ORDER)) + goto abort; + + if (unlikely(anon_vma_prepare(vma))) + goto abort; + + prep_transhuge_page(page); + + gfp = GFP_TRANSHUGE_LIGHT; + if (mem_cgroup_charge(page, mm, gfp)) + goto abort; + + pgtable = pte_alloc_one(mm); + if (unlikely(!pgtable)) + goto abort; + + __SetPageUptodate(page); + + if (is_zone_device_page(page)) { + if (!is_device_private_page(page)) + goto pgtable_abort; + entry = swp_entry_to_pmd(make_device_private_entry(page, + vma->vm_flags & VM_WRITE)); + } else { + entry = mk_huge_pmd(page, vma->vm_page_prot); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + } + + ptl = pmd_lock(mm, pmdp); + + if (check_stable_address_space(mm)) + goto unlock_abort; + + /* + * Check for userfaultfd but do not deliver the fault. Instead, + * just back off. + */ + if (userfaultfd_missing(vma)) + goto unlock_abort; + + if (pmd_present(*pmdp)) { + if (!is_huge_zero_pmd(*pmdp)) + goto unlock_abort; + flush = true; + } else if (!pmd_none(*pmdp)) + goto unlock_abort; + + get_page(page); + page_add_new_anon_rmap(page, vma, haddr, true); + if (!is_zone_device_page(page)) + lru_cache_add_inactive_or_unevictable(page, vma); + if (flush) { + pte_free(mm, pgtable); + flush_cache_range(vma, haddr, haddr + HPAGE_PMD_SIZE); + pmdp_invalidate(vma, haddr, pmdp); + } else { + pgtable_trans_huge_deposit(mm, pmdp, pgtable); + mm_inc_nr_ptes(mm); + } + set_pmd_at(mm, haddr, pmdp, entry); + update_mmu_cache_pmd(vma, haddr, pmdp); + add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR); + spin_unlock(ptl); + count_vm_event(THP_FAULT_ALLOC); + count_memcg_event_mm(mm, THP_FAULT_ALLOC); + + return 0; + +unlock_abort: + spin_unlock(ptl); +pgtable_abort: + pte_free(mm, pgtable); +abort: + for (i = 0; i < HPAGE_PMD_NR; i++) + src[i] &= ~MIGRATE_PFN_MIGRATE; + return -EINVAL; +} + +static void migrate_vma_split(struct migrate_vma *migrate, unsigned long i, + unsigned long addr) +{ + const unsigned long npages = i + HPAGE_PMD_NR; + unsigned long mpfn; + unsigned long j; + bool migrating = false; + struct page *page; + + migrate->src[i] &= ~MIGRATE_PFN_COMPOUND; + + /* If no part of the THP is migrating, we can skip splitting. 
*/ + for (j = i; j < npages; j++) { + if (migrate->dst[j] & MIGRATE_PFN_VALID) { + migrating = true; + break; + } + } + if (!migrating) + return; + + mpfn = migrate->src[i]; + page = migrate_pfn_to_page(mpfn); + if (page) { + pmd_t *pmdp; + int ret; + + pmdp = find_pmd(migrate->vma->vm_mm, addr); + if (!pmdp) { + migrate->src[i] = mpfn & ~MIGRATE_PFN_MIGRATE; + return; + } + ret = split_migrating_huge_page(migrate->vma, pmdp, addr, page); + if (ret) { + migrate->src[i] = mpfn & ~MIGRATE_PFN_MIGRATE; + return; + } + while (++i < npages) { + mpfn += 1UL << MIGRATE_PFN_SHIFT; + migrate->src[i] = mpfn; + } + } else { + while (++i < npages) + migrate->src[i] = mpfn; + } +} +#else +static int insert_huge_pmd_anonymous_page(struct vm_area_struct *vma, + unsigned long haddr, + struct page *page, + unsigned long *src, + pmd_t *pmdp) +{ + return 0; +} + +static void migrate_vma_split(struct migrate_vma *migrate, unsigned long i, + unsigned long addr) +{ +} +#endif + /* * This code closely matches the code in: * __handle_mm_fault() * handle_pte_fault() * do_anonymous_page() - * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE - * private page. + * to map in an anonymous zero page except the struct page is already allocated + * and will likely be a ZONE_DEVICE private page. */ static void migrate_vma_insert_page(struct migrate_vma *migrate, unsigned long addr, @@ -2852,9 +3105,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, bool flush = false; spinlock_t *ptl; pte_t entry; - pgd_t *pgdp; - p4d_t *p4dp; - pud_t *pudp; pmd_t *pmdp; pte_t *ptep; @@ -2862,19 +3112,25 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, if (!vma_is_anonymous(vma)) goto abort; - pgdp = pgd_offset(mm, addr); - p4dp = p4d_alloc(mm, pgdp, addr); - if (!p4dp) - goto abort; - pudp = pud_alloc(mm, p4dp, addr); - if (!pudp) - goto abort; - pmdp = pmd_alloc(mm, pudp, addr); + pmdp = find_pmd(mm, addr); if (!pmdp) goto abort; - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) - goto abort; + if (thp_migration_supported() && *dst & MIGRATE_PFN_COMPOUND) { + int ret = insert_huge_pmd_anonymous_page(vma, addr, page, src, + pmdp); + if (ret) + goto abort; + return; + } + if (!pmd_none(*pmdp)) { + if (pmd_trans_huge(*pmdp)) { + if (!is_huge_zero_pmd(*pmdp)) + goto abort; + __split_huge_pmd(vma, pmdp, addr, false, NULL); + } else if (pmd_leaf(*pmdp)) + goto abort; + } /* * Use pte_alloc() instead of pte_alloc_map(). 
We can't run @@ -2909,9 +3165,11 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, if (is_device_private_page(page)) { swp_entry_t swp_entry; - swp_entry = make_device_private_entry(page, vma->vm_flags & VM_WRITE); + swp_entry = make_device_private_entry(page, + vma->vm_flags & VM_WRITE); entry = swp_entry_to_pte(swp_entry); - } + } else + goto abort; } else { entry = mk_pte(page, vma->vm_page_prot); if (vma->vm_flags & VM_WRITE) @@ -2940,10 +3198,10 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, goto unlock_abort; inc_mm_counter(mm, MM_ANONPAGES); + get_page(page); page_add_new_anon_rmap(page, vma, addr, false); if (!is_zone_device_page(page)) lru_cache_add_inactive_or_unevictable(page, vma); - get_page(page); if (flush) { flush_cache_page(vma, addr, pte_pfn(*ptep)); @@ -2957,7 +3215,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } pte_unmap_unlock(ptep, ptl); - *src = MIGRATE_PFN_MIGRATE; return; unlock_abort: @@ -2988,11 +3245,23 @@ void migrate_vma_pages(struct migrate_vma *migrate) struct address_space *mapping; int r; + /* + * If the caller didn't allocate a THP, split the PMD and + * fix up the src array. + */ + if (thp_migration_supported() && + (migrate->src[i] & MIGRATE_PFN_MIGRATE) && + (migrate->src[i] & MIGRATE_PFN_COMPOUND) && + !(migrate->dst[i] & MIGRATE_PFN_COMPOUND)) + migrate_vma_split(migrate, i, addr); + + newpage = migrate_pfn_to_page(migrate->dst[i]); if (!newpage) { migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; continue; } + page = migrate_pfn_to_page(migrate->src[i]); if (!page) { if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE)) continue; diff --git a/mm/rmap.c b/mm/rmap.c index 1b84945d655c..13eb0247d8b7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1497,7 +1497,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } if (IS_ENABLED(CONFIG_MIGRATION) && - (flags & TTU_MIGRATION) && + (flags & (TTU_MIGRATION | TTU_SPLIT_FREEZE)) && is_zone_device_page(page)) { swp_entry_t entry; pte_t swp_pte; -- 2.20.1
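To show how a driver is expected to consume the new flags, here is a rough
sketch of the allocate-and-copy step that runs between migrate_vma_setup()
and migrate_vma_pages(). The demo_alloc_* helpers are hypothetical stubs
standing in for a driver's device memory allocator; lib/test_hmm.c in patch
5/6 implements the complete version, including filling every dst[] slot with
small pages when it falls back.

#include <linux/huge_mm.h>
#include <linux/migrate.h>
#include <linux/mm.h>

/* Hypothetical device memory allocators; lib/test_hmm.c has real ones. */
static struct page *demo_alloc_device_thp(void)
{
	return NULL;	/* stand-in: pop a compound page off a free list */
}

static struct page *demo_alloc_device_page(void)
{
	return NULL;	/* stand-in: pop a small page off a free list */
}

/* Sketch of a driver's allocate-and-copy step after migrate_vma_setup(). */
static int demo_alloc_and_copy(struct migrate_vma *args)
{
	unsigned long i;

	for (i = 0; i < args->npages; i++) {
		struct page *dpage;

		if (!(args->src[i] & MIGRATE_PFN_MIGRATE))
			continue;

		if (args->src[i] & MIGRATE_PFN_COMPOUND) {
			/* The whole PMD range was collected as one THP. */
			dpage = demo_alloc_device_thp();
			if (dpage) {
				args->dst[i] = migrate_pfn(page_to_pfn(dpage)) |
					       MIGRATE_PFN_LOCKED |
					       MIGRATE_PFN_COMPOUND;
				/* ... copy HPAGE_PMD_NR subpages here ... */
				i += HPAGE_PMD_NR - 1;
				continue;
			}
			/*
			 * No device huge page available: fall through and
			 * supply PAGE_SIZE pages. Because the destination is
			 * not marked MIGRATE_PFN_COMPOUND,
			 * migrate_vma_pages() splits the source THP and
			 * fixes up the src[] array. A real driver would also
			 * fill the remaining HPAGE_PMD_NR - 1 dst[] slots,
			 * as lib/test_hmm.c does.
			 */
		}

		dpage = demo_alloc_device_page();
		if (!dpage)
			return -ENOMEM;
		args->dst[i] = migrate_pfn(page_to_pfn(dpage)) |
			       MIGRATE_PFN_LOCKED;
		/* ... copy or clear one PAGE_SIZE page here ... */
	}
	return 0;
}

A driver that hands out compound device private pages also needs to implement
the new dev_pagemap_ops page_split() callback so that tail pages get a valid
pgmap and zone_device_data when the THP is split; patch 5/6 adds
dmirror_devmem_split() as an example.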
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 4/6] mm/thp: add THP allocation helper
Transparent huge page allocation policy is controlled by several sysfs
variables. Rather than expose these to each device driver that needs to
allocate THPs, provide a helper function.

Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
 include/linux/gfp.h | 10 ++++++++++
 mm/huge_memory.c    | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..242398c4b556 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -564,6 +564,16 @@ static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr)			\
 	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern struct page *alloc_transhugepage(struct vm_area_struct *vma,
+					unsigned long addr);
+#else
+static inline struct page *alloc_transhugepage(struct vm_area_struct *vma,
+					       unsigned long addr)
+{
+	return NULL;
+}
+#endif
 
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a073e66d0ee2..c2c1d3e7c35f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -765,6 +765,20 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 	return __do_huge_pmd_anonymous_page(vmf, page, gfp);
 }
 
+struct page *alloc_transhugepage(struct vm_area_struct *vma,
+				 unsigned long haddr)
+{
+	gfp_t gfp;
+	struct page *page;
+
+	gfp = alloc_hugepage_direct_gfpmask(vma);
+	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
+	if (page)
+		prep_transhuge_page(page);
+	return page;
+}
+EXPORT_SYMBOL_GPL(alloc_transhugepage);
+
 static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
 		pgtable_t pgtable)
--
2.20.1
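As a usage sketch, a driver's fault or migrate-to-ram path could try a huge
destination page first and fall back to a PAGE_SIZE allocation;
alloc_transhugepage() hides the sysfs policy and gfp mask details. The
surrounding demo_* function is hypothetical.

#include <linux/gfp.h>
#include <linux/huge_mm.h>
#include <linux/mm.h>

/*
 * Sketch: pick a destination page when migrating a device private page
 * back to system memory. 'want_thp' would be true when the source is a
 * compound device private page.
 */
static struct page *demo_alloc_dst_page(struct vm_area_struct *vma,
					unsigned long addr, bool want_thp)
{
	struct page *page = NULL;

	if (want_thp)
		page = alloc_transhugepage(vma, addr & HPAGE_PMD_MASK);
	if (!page)
		page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, addr);
	return page;
}

Since the helper already calls prep_transhuge_page(), the caller can hand the
returned compound page straight to the migration code without further
preparation.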
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 5/6] mm/hmm/test: add self tests for THP migration
Add some basic stand alone self tests for migrating system memory to device private memory and back. Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> --- lib/test_hmm.c | 437 +++++++++++++++++++++---- lib/test_hmm_uapi.h | 3 + tools/testing/selftests/vm/hmm-tests.c | 404 +++++++++++++++++++++++ 3 files changed, 775 insertions(+), 69 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 80a78877bd93..456f1a90bcc3 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -66,6 +66,7 @@ struct dmirror { struct xarray pt; struct mmu_interval_notifier notifier; struct mutex mutex; + __u64 flags; }; /* @@ -91,6 +92,7 @@ struct dmirror_device { unsigned long calloc; unsigned long cfree; struct page *free_pages; + struct page *free_huge_pages; spinlock_t lock; /* protects the above */ }; @@ -450,6 +452,7 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd) } static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, + bool is_huge, struct page **ppage) { struct dmirror_chunk *devmem; @@ -503,28 +506,51 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, mutex_unlock(&mdevice->devmem_lock); - pr_info("added new %u MB chunk (total %u chunks, %u MB) PFNs [0x%lx 0x%lx)\n", + pr_info("dev %u added %u MB (total %u chunks, %u MB) PFNs [0x%lx 0x%lx)\n", + MINOR(mdevice->cdevice.dev), DEVMEM_CHUNK_SIZE / (1024 * 1024), mdevice->devmem_count, mdevice->devmem_count * (DEVMEM_CHUNK_SIZE / (1024 * 1024)), pfn_first, pfn_last); spin_lock(&mdevice->lock); - for (pfn = pfn_first; pfn < pfn_last; pfn++) { + for (pfn = pfn_first; pfn < pfn_last; ) { struct page *page = pfn_to_page(pfn); + if (is_huge && (pfn & (HPAGE_PMD_NR - 1)) == 0 && + pfn + HPAGE_PMD_NR <= pfn_last) { + prep_transhuge_device_private_page(page); + page->zone_device_data = mdevice->free_huge_pages; + mdevice->free_huge_pages = page; + pfn += HPAGE_PMD_NR; + continue; + } page->zone_device_data = mdevice->free_pages; mdevice->free_pages = page; + pfn++; } if (ppage) { - *ppage = mdevice->free_pages; - mdevice->free_pages = (*ppage)->zone_device_data; - mdevice->calloc++; + if (is_huge) { + if (!mdevice->free_huge_pages) + goto err_unlock; + *ppage = mdevice->free_huge_pages; + mdevice->free_huge_pages = (*ppage)->zone_device_data; + mdevice->calloc += thp_nr_pages(*ppage); + } else if (mdevice->free_pages) { + *ppage = mdevice->free_pages; + mdevice->free_pages = (*ppage)->zone_device_data; + mdevice->calloc++; + } else + goto err_unlock; } spin_unlock(&mdevice->lock); return true; +err_unlock: + spin_unlock(&mdevice->lock); + return false; + err_release: mutex_unlock(&mdevice->devmem_lock); release_mem_region(devmem->pagemap.range.start, range_len(&devmem->pagemap.range)); @@ -534,7 +560,8 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice, return false; } -static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) +static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice, + bool is_huge) { struct page *dpage = NULL; struct page *rpage; @@ -549,17 +576,40 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) spin_lock(&mdevice->lock); - if (mdevice->free_pages) { + if (is_huge && mdevice->free_huge_pages) { + dpage = mdevice->free_huge_pages; + mdevice->free_huge_pages = dpage->zone_device_data; + mdevice->calloc += thp_nr_pages(dpage); + spin_unlock(&mdevice->lock); + } else if (!is_huge && mdevice->free_pages) { dpage = mdevice->free_pages; mdevice->free_pages = dpage->zone_device_data; 
mdevice->calloc++; spin_unlock(&mdevice->lock); } else { spin_unlock(&mdevice->lock); - if (!dmirror_allocate_chunk(mdevice, &dpage)) + if (!dmirror_allocate_chunk(mdevice, is_huge, &dpage)) goto error; } + if (is_huge) { + unsigned int nr_pages = thp_nr_pages(dpage); + unsigned int i; + struct page **tpage; + + tpage = kmap(rpage); + for (i = 0; i < nr_pages; i++, tpage++) { + *tpage = alloc_page(GFP_HIGHUSER); + if (!*tpage) { + while (i--) + __free_page(*--tpage); + kunmap(rpage); + goto error; + } + } + kunmap(rpage); + } + dpage->zone_device_data = rpage; get_page(dpage); lock_page(dpage); @@ -570,22 +620,26 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) return NULL; } -static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, - struct dmirror *dmirror) +static int dmirror_migrate_alloc_and_copy(struct migrate_vma *args, + struct dmirror *dmirror) { struct dmirror_device *mdevice = dmirror->mdevice; const unsigned long *src = args->src; unsigned long *dst = args->dst; - unsigned long addr; + unsigned long end_pfn = args->end >> PAGE_SHIFT; + unsigned long pfn; - for (addr = args->start; addr < args->end; addr += PAGE_SIZE, - src++, dst++) { + for (pfn = args->start >> PAGE_SHIFT; pfn < end_pfn; ) { struct page *spage; struct page *dpage; struct page *rpage; + bool is_huge; + unsigned long write; + struct page **tpage; + unsigned long endp; if (!(*src & MIGRATE_PFN_MIGRATE)) - continue; + goto next; /* * Note that spage might be NULL which is OK since it is an @@ -593,15 +647,39 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, */ spage = migrate_pfn_to_page(*src); - dpage = dmirror_devmem_alloc_page(mdevice); - if (!dpage) + /* This flag is only set if a whole huge page is migrated. */ + is_huge = *src & MIGRATE_PFN_COMPOUND; + write = (*src & MIGRATE_PFN_WRITE) ? MIGRATE_PFN_WRITE : 0; + + if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) { + dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC; + dpage = NULL; + } else + dpage = dmirror_devmem_alloc_page(mdevice, is_huge); + if (!dpage) { + if (!is_huge) + return -ENOMEM; + /* Try falling back to PAGE_SIZE pages. */ + endp = pfn + HPAGE_PMD_NR; + while (pfn < endp) { + dpage = dmirror_devmem_alloc_page(mdevice, + false); + if (!dpage) + return -ENOMEM; + rpage = dpage->zone_device_data; + rpage->zone_device_data = dmirror; + *dst = migrate_pfn(page_to_pfn(dpage)) | + MIGRATE_PFN_LOCKED | write; + if (spage) + copy_highpage(rpage, spage++); + else + clear_highpage(rpage); + pfn++; + src++; + dst++; + } continue; - - rpage = dpage->zone_device_data; - if (spage) - copy_highpage(rpage, spage); - else - clear_highpage(rpage); + } /* * Normally, a device would use the page->zone_device_data to @@ -609,14 +687,40 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, * the simulated device memory and that page holds the pointer * to the mirror. 
*/ + rpage = dpage->zone_device_data; rpage->zone_device_data = dmirror; - *dst = migrate_pfn(page_to_pfn(dpage)) | - MIGRATE_PFN_LOCKED; - if ((*src & MIGRATE_PFN_WRITE) || - (!spage && args->vma->vm_flags & VM_WRITE)) - *dst |= MIGRATE_PFN_WRITE; + *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED | + write; + + if (is_huge) { + endp = pfn + thp_nr_pages(dpage); + *dst |= MIGRATE_PFN_COMPOUND; + tpage = kmap(rpage); + while (pfn < endp) { + if (spage) + copy_highpage(*tpage, spage++); + else + clear_highpage(*tpage); + tpage++; + pfn++; + src++; + dst++; + } + kunmap(rpage); + continue; + } + + if (spage) + copy_highpage(rpage, spage); + else + clear_highpage(rpage); +next: + pfn++; + src++; + dst++; } + return 0; } static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, @@ -627,38 +731,75 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, const unsigned long *src = args->src; const unsigned long *dst = args->dst; unsigned long pfn; + int ret = 0; /* Map the migrated pages into the device's page tables. */ mutex_lock(&dmirror->mutex); - for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, - src++, dst++) { + for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); ) { + unsigned long mpfn; struct page *dpage; + struct page *rpage; void *entry; if (!(*src & MIGRATE_PFN_MIGRATE)) - continue; + goto next; - dpage = migrate_pfn_to_page(*dst); + mpfn = *dst; + dpage = migrate_pfn_to_page(mpfn); if (!dpage) - continue; + goto next; /* * Store the page that holds the data so the page table * doesn't have to deal with ZONE_DEVICE private pages. */ - entry = dpage->zone_device_data; - if (*dst & MIGRATE_PFN_WRITE) + rpage = dpage->zone_device_data; + if (mpfn & MIGRATE_PFN_COMPOUND) { + struct page **tpage; + unsigned long end_pfn = pfn + thp_nr_pages(dpage); + + ret = 0; + tpage = kmap(rpage); + while (pfn < end_pfn) { + entry = *tpage; + if (mpfn & MIGRATE_PFN_WRITE) + entry = xa_tag_pointer(entry, + DPT_XA_TAG_WRITE); + entry = xa_store(&dmirror->pt, pfn, entry, + GFP_KERNEL); + if (xa_is_err(entry)) { + ret = xa_err(entry); + break; + } + tpage++; + pfn++; + src++; + dst++; + } + kunmap(rpage); + if (ret) + goto err; + continue; + } + + entry = rpage; + if (mpfn & MIGRATE_PFN_WRITE) entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE); entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC); if (xa_is_err(entry)) { - mutex_unlock(&dmirror->mutex); - return xa_err(entry); + ret = xa_err(entry); + goto err; } +next: + pfn++; + src++; + dst++; } +err: mutex_unlock(&dmirror->mutex); - return 0; + return ret; } static int dmirror_migrate(struct dmirror *dmirror, @@ -668,8 +809,8 @@ static int dmirror_migrate(struct dmirror *dmirror, unsigned long size = cmd->npages << PAGE_SHIFT; struct mm_struct *mm = dmirror->notifier.mm; struct vm_area_struct *vma; - unsigned long src_pfns[64]; - unsigned long dst_pfns[64]; + unsigned long *src_pfns; + unsigned long *dst_pfns; struct dmirror_bounce bounce; struct migrate_vma args; unsigned long next; @@ -684,6 +825,17 @@ static int dmirror_migrate(struct dmirror *dmirror, if (!mmget_not_zero(mm)) return -EINVAL; + src_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*src_pfns), GFP_KERNEL); + if (!src_pfns) { + ret = -ENOMEM; + goto out_put; + } + dst_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*dst_pfns), GFP_KERNEL); + if (!dst_pfns) { + ret = -ENOMEM; + goto out_free_src; + } + mmap_read_lock(mm); for (addr = start; addr < end; addr = next) { vma = find_vma(mm, addr); @@ -692,7 +844,7 @@ static int 
dmirror_migrate(struct dmirror *dmirror, ret = -EINVAL; goto out; } - next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT)); + next = pmd_addr_end(addr, end); if (next > vma->vm_end) next = vma->vm_end; @@ -702,17 +854,24 @@ static int dmirror_migrate(struct dmirror *dmirror, args.start = addr; args.end = next; args.pgmap_owner = dmirror->mdevice; - args.flags = MIGRATE_VMA_SELECT_SYSTEM; + args.flags = MIGRATE_VMA_SELECT_SYSTEM | + MIGRATE_VMA_SELECT_COMPOUND; ret = migrate_vma_setup(&args); if (ret) goto out; - dmirror_migrate_alloc_and_copy(&args, dmirror); - migrate_vma_pages(&args); - dmirror_migrate_finalize_and_map(&args, dmirror); + ret = dmirror_migrate_alloc_and_copy(&args, dmirror); + if (!ret) { + migrate_vma_pages(&args); + dmirror_migrate_finalize_and_map(&args, dmirror); + } migrate_vma_finalize(&args); + if (ret) + goto out; } mmap_read_unlock(mm); + kfree(dst_pfns); + kfree(src_pfns); mmput(mm); /* Return the migrated data for verification. */ @@ -733,6 +892,10 @@ static int dmirror_migrate(struct dmirror *dmirror, out: mmap_read_unlock(mm); + kfree(dst_pfns); +out_free_src: + kfree(src_pfns); +out_put: mmput(mm); return ret; } @@ -953,6 +1116,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp, ret = dmirror_snapshot(dmirror, &cmd); break; + case HMM_DMIRROR_FLAGS: + dmirror->flags = cmd.npages; + ret = 0; + break; + default: return -EINVAL; } @@ -976,22 +1144,70 @@ static const struct file_operations dmirror_fops = { static void dmirror_devmem_free(struct page *page) { struct page *rpage = page->zone_device_data; + unsigned int order = thp_order(page); + unsigned int nr_pages = 1U << order; struct dmirror_device *mdevice; - if (rpage) + VM_BUG_ON_PAGE(PageTail(page), page); + + if (rpage) { + if (order) { + unsigned int i; + struct page **tpage; + void *kaddr; + + kaddr = kmap_atomic(rpage); + tpage = kaddr; + for (i = 0; i < nr_pages; i++, tpage++) + __free_page(*tpage); + kunmap_atomic(kaddr); + } __free_page(rpage); + } mdevice = dmirror_page_to_device(page); spin_lock(&mdevice->lock); - mdevice->cfree++; - page->zone_device_data = mdevice->free_pages; - mdevice->free_pages = page; + if (order) { + page->zone_device_data = mdevice->free_huge_pages; + mdevice->free_huge_pages = page; + } else { + page->zone_device_data = mdevice->free_pages; + mdevice->free_pages = page; + } + mdevice->cfree += nr_pages; spin_unlock(&mdevice->lock); } +static void dmirror_devmem_split(struct page *head, struct page *page) +{ + struct page *rpage = head->zone_device_data; + unsigned long i; + struct page **tpage; + void *kaddr; + + page->pgmap = head->pgmap; + + if (!rpage) { + page->zone_device_data = NULL; + return; + } + + kaddr = kmap_atomic(rpage); + tpage = kaddr; + i = page - head; + page->zone_device_data = tpage[i]; + if (i == 1) { + head->zone_device_data = tpage[0]; + kunmap_atomic(kaddr); + __free_page(rpage); + } else + kunmap_atomic(kaddr); +} + static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args, - struct dmirror *dmirror) + struct dmirror *dmirror, + unsigned long fault_addr) { const unsigned long *src = args->src; unsigned long *dst = args->dst; @@ -999,25 +1215,71 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args, unsigned long end = args->end; unsigned long addr; - for (addr = start; addr < end; addr += PAGE_SIZE, - src++, dst++) { - struct page *dpage, *spage; + for (addr = start; addr < end; ) { + struct page *spage, *dpage; + unsigned int order = 0; + unsigned int nr_pages = 1; + struct 
page **tpage; + unsigned int i; spage = migrate_pfn_to_page(*src); if (!spage || !(*src & MIGRATE_PFN_MIGRATE)) - continue; + goto next; + order = thp_order(spage); + nr_pages = 1U << order; + /* The source page is the ZONE_DEVICE private page. */ spage = spage->zone_device_data; - dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); - if (!dpage) - continue; + if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) { + dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC; + dpage = NULL; + } else if (order) + dpage = alloc_transhugepage(args->vma, addr); + else + dpage = alloc_pages_vma(GFP_HIGHUSER_MOVABLE, 0, + args->vma, addr, + numa_node_id(), false); + if (!dpage) { + if (!order) + return VM_FAULT_OOM; + /* Try falling back to PAGE_SIZE pages. */ + dpage = alloc_pages_vma(GFP_HIGHUSER_MOVABLE, 0, + args->vma, addr, + numa_node_id(), false); + if (!dpage) + return VM_FAULT_OOM; + lock_page(dpage); + xa_erase(&dmirror->pt, fault_addr >> PAGE_SHIFT); + i = (fault_addr - start) >> PAGE_SHIFT; + dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + if (*src & MIGRATE_PFN_WRITE) + dst[i] |= MIGRATE_PFN_WRITE; + tpage = kmap(spage); + copy_highpage(dpage, tpage[i]); + kunmap(spage); + goto next; + } lock_page(dpage); xa_erase(&dmirror->pt, addr >> PAGE_SHIFT); - copy_highpage(dpage, spage); *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; if (*src & MIGRATE_PFN_WRITE) *dst |= MIGRATE_PFN_WRITE; + if (order) { + *dst |= MIGRATE_PFN_COMPOUND; + tpage = kmap(spage); + for (i = 0; i < nr_pages; i++) { + copy_highpage(dpage, *tpage); + tpage++; + dpage++; + } + kunmap(spage); + } else + copy_highpage(dpage, spage); +next: + addr += PAGE_SIZE << order; + src += nr_pages; + dst += nr_pages; } return 0; } @@ -1027,33 +1289,55 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) struct migrate_vma args; unsigned long src_pfns; unsigned long dst_pfns; + struct page *page; struct page *rpage; + unsigned int order; struct dmirror *dmirror; vm_fault_t ret; + page = thp_head(vmf->page); + order = thp_order(page); + /* * Normally, a device would use the page->zone_device_data to point to * the mirror but here we use it to hold the page for the simulated * device memory and that page holds the pointer to the mirror. 
*/ - rpage = vmf->page->zone_device_data; + rpage = page->zone_device_data; dmirror = rpage->zone_device_data; - /* FIXME demonstrate how we can adjust migrate range */ + if (order) { + args.start = vmf->address & (PAGE_MASK << order); + args.end = args.start + (PAGE_SIZE << order); + args.src = kcalloc(PTRS_PER_PTE, sizeof(*args.src), + GFP_KERNEL); + if (!args.src) + return VM_FAULT_OOM; + args.dst = kcalloc(PTRS_PER_PTE, sizeof(*args.dst), + GFP_KERNEL); + if (!args.dst) { + ret = VM_FAULT_OOM; + goto error_src; + } + } else { + args.start = vmf->address; + args.end = args.start + PAGE_SIZE; + args.src = &src_pfns; + args.dst = &dst_pfns; + } args.vma = vmf->vma; - args.start = vmf->address; - args.end = args.start + PAGE_SIZE; - args.src = &src_pfns; - args.dst = &dst_pfns; args.pgmap_owner = dmirror->mdevice; - args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE; + args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | + MIGRATE_VMA_SELECT_COMPOUND; - if (migrate_vma_setup(&args)) - return VM_FAULT_SIGBUS; + if (migrate_vma_setup(&args)) { + ret = VM_FAULT_SIGBUS; + goto error_dst; + } - ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror); + ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror, vmf->address); if (ret) - return ret; + goto error_fin; migrate_vma_pages(&args); /* * No device finalize step is needed since @@ -1061,12 +1345,27 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) * invalidated the device page table. */ migrate_vma_finalize(&args); + if (order) { + kfree(args.dst); + kfree(args.src); + } return 0; + +error_fin: + migrate_vma_finalize(&args); +error_dst: + if (args.dst != &dst_pfns) + kfree(args.dst); +error_src: + if (args.src != &src_pfns) + kfree(args.src); + return ret; } static const struct dev_pagemap_ops dmirror_devmem_ops = { .page_free = dmirror_devmem_free, .migrate_to_ram = dmirror_devmem_fault, + .page_split = dmirror_devmem_split, }; static int dmirror_device_init(struct dmirror_device *mdevice, int id) @@ -1085,7 +1384,7 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id) return ret; /* Build a list of free ZONE_DEVICE private struct pages */ - dmirror_allocate_chunk(mdevice, NULL); + dmirror_allocate_chunk(mdevice, false, NULL); return 0; } diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h index 670b4ef2a5b6..39e6ef3b67b9 100644 --- a/lib/test_hmm_uapi.h +++ b/lib/test_hmm_uapi.h @@ -33,6 +33,9 @@ struct hmm_dmirror_cmd { #define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd) #define HMM_DMIRROR_MIGRATE _IOWR('H', 0x02, struct hmm_dmirror_cmd) #define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x03, struct hmm_dmirror_cmd) +#define HMM_DMIRROR_FLAGS _IOWR('H', 0x04, struct hmm_dmirror_cmd) + +#define HMM_DMIRROR_FLAG_FAIL_ALLOC (1ULL << 0) /* * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT. diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c index 5d1ac691b9f4..069c3cc3c89b 100644 --- a/tools/testing/selftests/vm/hmm-tests.c +++ b/tools/testing/selftests/vm/hmm-tests.c @@ -1485,4 +1485,408 @@ TEST_F(hmm2, double_map) hmm_buffer_free(buffer); } +/* + * Migrate private anonymous huge empty page. 
+ */ +TEST_F(hmm, migrate_anon_huge_empty) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, size); + + buffer->ptr = mmap(NULL, 2 * size, + PROT_READ, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)buffer->ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + old_ptr = buffer->ptr; + buffer->ptr = map; + + /* Migrate memory to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], 0); + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + +/* + * Migrate private anonymous huge zero page. + */ +TEST_F(hmm, migrate_anon_huge_zero) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + int val; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, size); + + buffer->ptr = mmap(NULL, 2 * size, + PROT_READ, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)buffer->ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + old_ptr = buffer->ptr; + buffer->ptr = map; + + /* Initialize a read-only zero huge page. */ + val = *(int *)buffer->ptr; + ASSERT_EQ(val, 0); + + /* Migrate memory to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], 0); + + /* Fault pages back to system memory and check them. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) { + ASSERT_EQ(ptr[i], 0); + /* If it asserts once, it probably will 500,000 times */ + if (ptr[i] != 0) + break; + } + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + +/* + * Migrate private anonymous huge page and free. + */ +TEST_F(hmm, migrate_anon_huge_free) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, size); + + buffer->ptr = mmap(NULL, 2 * size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)buffer->ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + old_ptr = buffer->ptr; + buffer->ptr = map; + + /* Initialize buffer in system memory. 
*/ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] = i; + + /* Migrate memory to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + /* Try freeing it. */ + ret = madvise(map, size, MADV_FREE); + ASSERT_EQ(ret, 0); + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + +/* + * Migrate private anonymous huge page and fault back to sysmem. + */ +TEST_F(hmm, migrate_anon_huge_fault) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, size); + + buffer->ptr = mmap(NULL, 2 * size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, + buffer->fd, 0); + ASSERT_NE(buffer->ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)buffer->ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + old_ptr = buffer->ptr; + buffer->ptr = map; + + /* Initialize buffer in system memory. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] = i; + + /* Migrate memory to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + /* Fault pages back to system memory and check them. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + +/* + * Migrate private anonymous huge page with allocation errors. + */ +TEST_F(hmm, migrate_anon_huge_err) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(2 * size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, 2 * size); + + old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0); + ASSERT_NE(old_ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)old_ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + buffer->ptr = map; + + /* Initialize buffer in system memory. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] = i; + + /* Migrate memory to device but force a THP allocation error. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer, + HMM_DMIRROR_FLAG_FAIL_ALLOC); + ASSERT_EQ(ret, 0); + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], i); + + /* Try faulting back a single (PAGE_SIZE) page. */ + ptr = buffer->ptr; + ASSERT_EQ(ptr[2048], 2048); + + /* unmap and remap the region to reset things. 
*/ + ret = munmap(old_ptr, 2 * size); + ASSERT_EQ(ret, 0); + old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0); + ASSERT_NE(old_ptr, MAP_FAILED); + map = (void *)ALIGN((uintptr_t)old_ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + buffer->ptr = map; + + /* Initialize buffer in system memory. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ptr[i] = i; + + /* Migrate THP to device. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* + * Force an allocation error when faulting back a THP resident in the + * device. + */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer, + HMM_DMIRROR_FLAG_FAIL_ALLOC); + ASSERT_EQ(ret, 0); + ptr = buffer->ptr; + ASSERT_EQ(ptr[2048], 2048); + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + +/* + * Migrate private anonymous huge zero page with allocation errors. + */ +TEST_F(hmm, migrate_anon_huge_zero_err) +{ + struct hmm_buffer *buffer; + unsigned long npages; + unsigned long size; + unsigned long i; + void *old_ptr; + void *map; + int *ptr; + int ret; + + size = TWOMEG; + + buffer = malloc(sizeof(*buffer)); + ASSERT_NE(buffer, NULL); + + buffer->fd = -1; + buffer->size = 2 * size; + buffer->mirror = malloc(2 * size); + ASSERT_NE(buffer->mirror, NULL); + memset(buffer->mirror, 0xFF, 2 * size); + + old_ptr = mmap(NULL, 2 * size, PROT_READ, + MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0); + ASSERT_NE(old_ptr, MAP_FAILED); + + npages = size >> self->page_shift; + map = (void *)ALIGN((uintptr_t)old_ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + buffer->ptr = map; + + /* Migrate memory to device but force a THP allocation error. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer, + HMM_DMIRROR_FLAG_FAIL_ALLOC); + ASSERT_EQ(ret, 0); + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Check what the device read. */ + for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], 0); + + /* Try faulting back a single (PAGE_SIZE) page. */ + ptr = buffer->ptr; + ASSERT_EQ(ptr[2048], 0); + + /* unmap and remap the region to reset things. */ + ret = munmap(old_ptr, 2 * size); + ASSERT_EQ(ret, 0); + old_ptr = mmap(NULL, 2 * size, PROT_READ, + MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0); + ASSERT_NE(old_ptr, MAP_FAILED); + map = (void *)ALIGN((uintptr_t)old_ptr, size); + ret = madvise(map, size, MADV_HUGEPAGE); + ASSERT_EQ(ret, 0); + buffer->ptr = map; + + /* Initialize buffer in system memory (zero THP page). */ + ret = ptr[0]; + ASSERT_EQ(ret, 0); + + /* Migrate memory to device but force a THP allocation error. */ + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer, + HMM_DMIRROR_FLAG_FAIL_ALLOC); + ASSERT_EQ(ret, 0); + ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages); + ASSERT_EQ(ret, 0); + ASSERT_EQ(buffer->cpages, npages); + + /* Fault the device memory back and check it. */ + for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) + ASSERT_EQ(ptr[i], 0); + + buffer->ptr = old_ptr; + hmm_buffer_free(buffer); +} + TEST_HARNESS_MAIN -- 2.20.1
Ralph Campbell
2020-Nov-06 00:51 UTC
[Nouveau] [PATCH v3 6/6] nouveau: support THP migration to private memory
Add support for migrating transparent huge pages to and from device private memory. Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 289 ++++++++++++++++++------- drivers/gpu/drm/nouveau/nouveau_svm.c | 11 +- drivers/gpu/drm/nouveau/nouveau_svm.h | 3 +- 3 files changed, 215 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 92987daa5e17..93eea8e9d987 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -82,6 +82,7 @@ struct nouveau_dmem { struct list_head chunks; struct mutex mutex; struct page *free_pages; + struct page *free_huge_pages; spinlock_t lock; }; @@ -112,8 +113,13 @@ static void nouveau_dmem_page_free(struct page *page) struct nouveau_dmem *dmem = chunk->drm->dmem; spin_lock(&dmem->lock); - page->zone_device_data = dmem->free_pages; - dmem->free_pages = page; + if (PageHead(page)) { + page->zone_device_data = dmem->free_huge_pages; + dmem->free_huge_pages = page; + } else { + page->zone_device_data = dmem->free_pages; + dmem->free_pages = page; + } WARN_ON(!chunk->callocated); chunk->callocated--; @@ -139,51 +145,100 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence) static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm, struct vm_fault *vmf, struct migrate_vma *args, - dma_addr_t *dma_addr) + struct page *spage, bool is_huge, dma_addr_t *dma_addr) { + struct nouveau_svmm *svmm = spage->zone_device_data; struct device *dev = drm->dev->dev; - struct page *dpage, *spage; - struct nouveau_svmm *svmm; - - spage = migrate_pfn_to_page(args->src[0]); - if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE)) - return 0; + struct page *dpage; + unsigned int i; - dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address); + if (is_huge) + dpage = alloc_transhugepage(vmf->vma, args->start); + else + dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address); if (!dpage) - return VM_FAULT_SIGBUS; - lock_page(dpage); + return VM_FAULT_OOM; + WARN_ON_ONCE(thp_order(spage) != thp_order(dpage)); - *dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); + *dma_addr = dma_map_page(dev, dpage, 0, page_size(dpage), + DMA_BIDIRECTIONAL); if (dma_mapping_error(dev, *dma_addr)) goto error_free_page; - svmm = spage->zone_device_data; + lock_page(dpage); + i = (vmf->address - args->start) >> PAGE_SHIFT; + spage += i; mutex_lock(&svmm->mutex); nouveau_svmm_invalidate(svmm, args->start, args->end); - if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr, - NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage))) + if (drm->dmem->migrate.copy_func(drm, thp_nr_pages(dpage), + NOUVEAU_APER_HOST, *dma_addr, NOUVEAU_APER_VRAM, + nouveau_dmem_page_addr(spage))) goto error_dma_unmap; mutex_unlock(&svmm->mutex); - args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + args->dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + if (is_huge) + args->dst[i] |= MIGRATE_PFN_COMPOUND; return 0; error_dma_unmap: mutex_unlock(&svmm->mutex); - dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + unlock_page(dpage); + dma_unmap_page(dev, *dma_addr, page_size(dpage), DMA_BIDIRECTIONAL); error_free_page: __free_page(dpage); return VM_FAULT_SIGBUS; } +static vm_fault_t nouveau_dmem_fault_chunk(struct nouveau_drm *drm, + struct vm_fault *vmf, struct migrate_vma *args) +{ + struct device *dev = drm->dev->dev; + struct nouveau_fence *fence; + struct page 
*spage; + unsigned long src = args->src[0]; + bool is_huge = (src & (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND)) =+ (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND); + unsigned long dma_page_size; + dma_addr_t dma_addr; + vm_fault_t ret = 0; + + spage = migrate_pfn_to_page(src); + if (!spage) { + ret = VM_FAULT_SIGBUS; + goto out; + } + if (is_huge) { + dma_page_size = PMD_SIZE; + ret = nouveau_dmem_fault_copy_one(drm, vmf, args, spage, true, + &dma_addr); + if (!ret) + goto fence; + /* + * If we couldn't allocate a huge page, fallback to migrating + * a single page. + */ + } + dma_page_size = PAGE_SIZE; + ret = nouveau_dmem_fault_copy_one(drm, vmf, args, spage, false, + &dma_addr); + if (ret) + goto out; +fence: + nouveau_fence_new(drm->dmem->migrate.chan, false, &fence); + migrate_vma_pages(args); + nouveau_dmem_fence_done(&fence); + dma_unmap_page(dev, dma_addr, dma_page_size, DMA_BIDIRECTIONAL); +out: + migrate_vma_finalize(args); + return ret; +} + static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) { struct nouveau_drm *drm = page_to_drm(vmf->page); - struct nouveau_dmem *dmem = drm->dmem; - struct nouveau_fence *fence; unsigned long src = 0, dst = 0; - dma_addr_t dma_addr = 0; + struct page *page; vm_fault_t ret; struct migrate_vma args = { .vma = vmf->vma, @@ -192,39 +247,64 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) .src = &src, .dst = &dst, .pgmap_owner = drm->dev, - .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE, + .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | + MIGRATE_VMA_SELECT_COMPOUND, }; + /* + * If the page was migrated to the GPU as a huge page, try to + * migrate it back the same way. + */ + page = thp_head(vmf->page); + if (PageHead(page)) { + unsigned int order = thp_order(page); + unsigned int nr_pages = 1U << order; + + args.start &= PAGE_MASK << order; + args.end = args.start + (PAGE_SIZE << order); + args.src = kmalloc_array(nr_pages, sizeof(*args.src), + GFP_KERNEL); + if (!args.src) + return VM_FAULT_OOM; + args.dst = kmalloc_array(nr_pages, sizeof(*args.dst), + GFP_KERNEL); + if (!args.dst) { + ret = VM_FAULT_OOM; + goto error_src; + } + } + /* * FIXME what we really want is to find some heuristic to migrate more * than just one page on CPU fault. When such fault happens it is very * likely that more surrounding page will CPU fault too. 
*/ - if (migrate_vma_setup(&args) < 0) - return VM_FAULT_SIGBUS; - if (!args.cpages) - return 0; - - ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr); - if (ret || dst == 0) - goto done; - - nouveau_fence_new(dmem->migrate.chan, false, &fence); - migrate_vma_pages(&args); - nouveau_dmem_fence_done(&fence); - dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); -done: - migrate_vma_finalize(&args); + if (migrate_vma_setup(&args)) + ret = VM_FAULT_SIGBUS; + else + ret = nouveau_dmem_fault_chunk(drm, vmf, &args); + if (args.dst != &dst) + kfree(args.dst); +error_src: + if (args.src != &src) + kfree(args.src); return ret; } +static void nouveau_page_split(struct page *head, struct page *page) +{ + page->pgmap = head->pgmap; + page->zone_device_data = head->zone_device_data; +} + static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = { .page_free = nouveau_dmem_page_free, .migrate_to_ram = nouveau_dmem_migrate_to_ram, + .page_split = nouveau_page_split, }; -static int -nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) +static int nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, bool is_huge, + struct page **ppage) { struct nouveau_dmem_chunk *chunk; struct resource *res; @@ -278,16 +358,20 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) pfn_first = chunk->pagemap.range.start >> PAGE_SHIFT; page = pfn_to_page(pfn_first); spin_lock(&drm->dmem->lock); - for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) { - page->zone_device_data = drm->dmem->free_pages; - drm->dmem->free_pages = page; - } + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && is_huge) + prep_transhuge_device_private_page(page); + else + for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) { + page->zone_device_data = drm->dmem->free_pages; + drm->dmem->free_pages = page; + } *ppage = page; chunk->callocated++; spin_unlock(&drm->dmem->lock); - NV_INFO(drm, "DMEM: registered %ldMB of device memory\n", - DMEM_CHUNK_SIZE >> 20); + NV_INFO(drm, "DMEM: registered %ldMB of %sdevice memory %lx %lx\n", + DMEM_CHUNK_SIZE >> 20, is_huge ? 
"huge " : "", pfn_first, + nouveau_dmem_page_addr(page)); return 0; @@ -304,14 +388,20 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage) } static struct page * -nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) +nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_huge) { struct nouveau_dmem_chunk *chunk; struct page *page = NULL; int ret; spin_lock(&drm->dmem->lock); - if (drm->dmem->free_pages) { + if (is_huge && drm->dmem->free_huge_pages) { + page = drm->dmem->free_huge_pages; + drm->dmem->free_huge_pages = page->zone_device_data; + chunk = nouveau_page_to_chunk(page); + chunk->callocated++; + spin_unlock(&drm->dmem->lock); + } else if (!is_huge && drm->dmem->free_pages) { page = drm->dmem->free_pages; drm->dmem->free_pages = page->zone_device_data; chunk = nouveau_page_to_chunk(page); @@ -319,7 +409,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) spin_unlock(&drm->dmem->lock); } else { spin_unlock(&drm->dmem->lock); - ret = nouveau_dmem_chunk_alloc(drm, &page); + ret = nouveau_dmem_chunk_alloc(drm, is_huge, &page); if (ret) return NULL; } @@ -567,31 +657,22 @@ nouveau_dmem_init(struct nouveau_drm *drm) static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm, struct nouveau_svmm *svmm, unsigned long src, - dma_addr_t *dma_addr, u64 *pfn) + struct page *spage, bool is_huge, dma_addr_t dma_addr, u64 *pfn) { - struct device *dev = drm->dev->dev; - struct page *dpage, *spage; + struct page *dpage; unsigned long paddr; + unsigned long dst; - spage = migrate_pfn_to_page(src); - if (!(src & MIGRATE_PFN_MIGRATE)) - goto out; - - dpage = nouveau_dmem_page_alloc_locked(drm); + dpage = nouveau_dmem_page_alloc_locked(drm, is_huge); if (!dpage) goto out; paddr = nouveau_dmem_page_addr(dpage); if (spage) { - *dma_addr = dma_map_page(dev, spage, 0, page_size(spage), - DMA_BIDIRECTIONAL); - if (dma_mapping_error(dev, *dma_addr)) + if (drm->dmem->migrate.copy_func(drm, thp_nr_pages(dpage), + NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, dma_addr)) goto out_free_page; - if (drm->dmem->migrate.copy_func(drm, 1, - NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, *dma_addr)) - goto out_dma_unmap; } else { - *dma_addr = DMA_MAPPING_ERROR; if (drm->dmem->migrate.clear_func(drm, page_size(dpage), NOUVEAU_APER_VRAM, paddr)) goto out_free_page; @@ -602,10 +683,11 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm, ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT); if (src & MIGRATE_PFN_WRITE) *pfn |= NVIF_VMM_PFNMAP_V0_W; - return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + if (PageHead(dpage)) + dst |= MIGRATE_PFN_COMPOUND; + return dst; -out_dma_unmap: - dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); out_free_page: nouveau_dmem_page_free_locked(drm, dpage); out: @@ -617,26 +699,64 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm, struct nouveau_svmm *svmm, struct migrate_vma *args, dma_addr_t *dma_addrs, u64 *pfns) { + struct device *dev = drm->dev->dev; struct nouveau_fence *fence; unsigned long addr = args->start, nr_dma = 0, i; + unsigned int page_shift = PAGE_SHIFT; + struct page *spage; + unsigned long src = args->src[0]; + bool is_huge = (src & (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND)) =+ (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND); + unsigned long dma_page_size = is_huge ? 
PMD_SIZE : PAGE_SIZE; + + if (is_huge) { + spage = migrate_pfn_to_page(src); + if (spage) { + dma_addrs[nr_dma] = dma_map_page(dev, spage, 0, + page_size(spage), + DMA_BIDIRECTIONAL); + if (dma_mapping_error(dev, dma_addrs[nr_dma])) + goto out; + nr_dma++; + } + args->dst[0] = nouveau_dmem_migrate_copy_one(drm, svmm, src, + spage, true, *dma_addrs, pfns); + if (args->dst[0] & MIGRATE_PFN_COMPOUND) { + page_shift = PMD_SHIFT; + i = 1; + goto fence; + } + } - for (i = 0; addr < args->end; i++) { - args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm, - args->src[i], dma_addrs + nr_dma, pfns + i); - if (!dma_mapping_error(drm->dev->dev, dma_addrs[nr_dma])) + for (i = 0; addr < args->end; i++, addr += PAGE_SIZE) { + src = args->src[i]; + if (!(src & MIGRATE_PFN_MIGRATE)) + continue; + spage = migrate_pfn_to_page(src); + if (spage && !is_huge) { + dma_addrs[i] = dma_map_page(dev, spage, 0, + page_size(spage), + DMA_BIDIRECTIONAL); + if (dma_mapping_error(dev, dma_addrs[i])) + break; nr_dma++; - addr += PAGE_SIZE; + } else if (spage && is_huge && i != 0) + dma_addrs[i] = dma_addrs[i - 1] + PAGE_SIZE; + args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm, src, + spage, false, dma_addrs[i], pfns + i); } +fence: nouveau_fence_new(drm->dmem->migrate.chan, false, &fence); migrate_vma_pages(args); nouveau_dmem_fence_done(&fence); - nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i); + nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i, + page_shift); - while (nr_dma--) { - dma_unmap_page(drm->dev->dev, dma_addrs[nr_dma], PAGE_SIZE, - DMA_BIDIRECTIONAL); - } + while (nr_dma) + dma_unmap_page(drm->dev->dev, dma_addrs[--nr_dma], + dma_page_size, DMA_BIDIRECTIONAL); +out: migrate_vma_finalize(args); } @@ -648,25 +768,25 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, unsigned long end) { unsigned long npages = (end - start) >> PAGE_SHIFT; - unsigned long max = min(SG_MAX_SINGLE_ALLOC, npages); + unsigned long max = min(1UL << (PMD_SHIFT - PAGE_SHIFT), npages); dma_addr_t *dma_addrs; struct migrate_vma args = { .vma = vma, .start = start, .pgmap_owner = drm->dev, - .flags = MIGRATE_VMA_SELECT_SYSTEM, + .flags = MIGRATE_VMA_SELECT_SYSTEM | + MIGRATE_VMA_SELECT_COMPOUND, }; - unsigned long i; u64 *pfns; int ret = -ENOMEM; if (drm->dmem == NULL) return -ENODEV; - args.src = kcalloc(max, sizeof(*args.src), GFP_KERNEL); + args.src = kmalloc_array(max, sizeof(*args.src), GFP_KERNEL); if (!args.src) goto out; - args.dst = kcalloc(max, sizeof(*args.dst), GFP_KERNEL); + args.dst = kmalloc_array(max, sizeof(*args.dst), GFP_KERNEL); if (!args.dst) goto out_free_src; @@ -678,8 +798,10 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, if (!pfns) goto out_free_dma; - for (i = 0; i < npages; i += max) { - args.end = start + (max << PAGE_SHIFT); + for (; args.start < end; args.start = args.end) { + args.end = min(end, ALIGN(args.start, PMD_SIZE)); + if (args.start == args.end) + args.end = min(end, args.start + PMD_SIZE); ret = migrate_vma_setup(&args); if (ret) goto out_free_pfns; @@ -687,7 +809,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, if (args.cpages) nouveau_dmem_migrate_chunk(drm, svmm, &args, dma_addrs, pfns); - args.start = args.end; } ret = 0; diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 4f69e4c3dafd..3db0997f21b5 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -681,7 +681,6 @@ nouveau_svm_fault(struct nvif_notify *notify) nouveau_svm_fault_cancel_fault(svm, 
buffer->fault[fi]); continue; } - SVMM_DBG(svmm, "addr %016llx", buffer->fault[fi]->addr); /* We try and group handling of faults within a small * window into a single update. @@ -733,6 +732,10 @@ nouveau_svm_fault(struct nvif_notify *notify) } mmput(mm); + SVMM_DBG(svmm, "addr %llx %s %c", buffer->fault[fi]->addr, + args.phys[0] & NVIF_VMM_PFNMAP_V0_VRAM ? + "vram" : "sysmem", + args.i.p.size > PAGE_SIZE ? 'H' : 'N'); limit = args.i.p.addr + args.i.p.size; for (fn = fi; ++fn < buffer->fault_nr; ) { /* It's okay to skip over duplicate addresses from the @@ -804,13 +807,15 @@ nouveau_pfns_free(u64 *pfns) void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm, - unsigned long addr, u64 *pfns, unsigned long npages) + unsigned long addr, u64 *pfns, unsigned long npages, + unsigned int page_shift) { struct nouveau_pfnmap_args *args = nouveau_pfns_to_args(pfns); int ret; args->p.addr = addr; - args->p.size = npages << PAGE_SHIFT; + args->p.page = page_shift; + args->p.size = npages << args->p.page; mutex_lock(&svmm->mutex); diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.h b/drivers/gpu/drm/nouveau/nouveau_svm.h index e7d63d7f0c2d..3fd78662f17e 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.h +++ b/drivers/gpu/drm/nouveau/nouveau_svm.h @@ -33,7 +33,8 @@ void nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit); u64 *nouveau_pfns_alloc(unsigned long npages); void nouveau_pfns_free(u64 *pfns); void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm, - unsigned long addr, u64 *pfns, unsigned long npages); + unsigned long addr, u64 *pfns, unsigned long npages, + unsigned int page_shift); #else /* IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) */ static inline void nouveau_svm_init(struct nouveau_drm *drm) {} static inline void nouveau_svm_fini(struct nouveau_drm *drm) {} -- 2.20.1
Christoph Hellwig
2020-Nov-06 07:55 UTC
[Nouveau] [PATCH v3 1/6] mm/thp: add prep_transhuge_device_private_page()
On Thu, Nov 05, 2020 at 04:51:42PM -0800, Ralph Campbell wrote:
> +extern void prep_transhuge_device_private_page(struct page *page);

No need for the extern.

> +static inline void prep_transhuge_device_private_page(struct page *page)
> +{
> +}

Is the code to call this even reachable if THP support is configured out? If not, just declaring it unconditionally and letting dead code elimination do its job might be a tad cleaner.

> +void prep_transhuge_device_private_page(struct page *page)

I think a kerneldoc comment explaining what this function is useful for would be helpful.
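Not part of the posted series, but for illustration the two suggestions above could end up looking roughly like the sketch below; the kerneldoc wording is only a guess at the intended semantics, and the comment would normally sit above the definition in mm/huge_memory.c:

/**
 * prep_transhuge_device_private_page - prepare a device private page as a THP
 * @page: head page of a HPAGE_PMD_ORDER range of device private pages
 *
 * Device drivers call this on the first struct page of a PMD-sized block of
 * device private memory so the block can be migrated as a transparent huge
 * page.
 */
void prep_transhuge_device_private_page(struct page *page);

If the only callers are themselves compiled out when CONFIG_TRANSPARENT_HUGEPAGE is disabled, the empty inline stub can indeed be dropped and the declaration left unconditional; dead code elimination then discards the call sites.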
Christoph Hellwig
2020-Nov-06 07:56 UTC
[Nouveau] [PATCH v3 2/6] mm/migrate: move migrate_vma_collect_skip()
On Thu, Nov 05, 2020 at 04:51:43PM -0800, Ralph Campbell wrote:
> Move the definition of migrate_vma_collect_skip() to make it callable
> by migrate_vma_collect_hole(). This helps make the next patch easier
> to read.
>
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>
Christoph Hellwig
2020-Nov-06 07:57 UTC
[Nouveau] [PATCH v3 2/6] mm/migrate: move migrate_vma_collect_skip()
Looks good:

Reviewed-by: Christoph Hellwig <hch at lst.de>
Christoph Hellwig
2020-Nov-06 08:01 UTC
[Nouveau] [PATCH v3 4/6] mm/thp: add THP allocation helper
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +extern struct page *alloc_transhugepage(struct vm_area_struct *vma,
> +		unsigned long addr);

No need for the extern. And also here: do we actually need the stub, or can the caller make sure (using IS_ENABLED and similar) that the compiler knows the code is dead?

> +struct page *alloc_transhugepage(struct vm_area_struct *vma,
> +		unsigned long haddr)
> +{
> +	gfp_t gfp;
> +	struct page *page;
> +
> +	gfp = alloc_hugepage_direct_gfpmask(vma);
> +	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
> +	if (page)
> +		prep_transhuge_page(page);
> +	return page;

I think do_huge_pmd_anonymous_page should be switched to use this helper as well.
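A minimal sketch of the IS_ENABLED() approach mentioned above, using the nouveau fault path from patch 6/6 as a hypothetical caller (this is not in the posted patches):

	struct page *dpage;

	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && is_huge)
		dpage = alloc_transhugepage(vmf->vma, args->start);
	else
		dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);

When THP is configured out, the compiler constant-folds the condition to false and eliminates the alloc_transhugepage() call, so an unconditional declaration still links and the empty !CONFIG_TRANSPARENT_HUGEPAGE stub becomes unnecessary.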
Christoph Hellwig
2020-Nov-06 08:03 UTC
[Nouveau] [PATCH v3 3/6] mm: support THP migration to device private memory
I hate the extra pin count magic here. IMHO we really need to finish off the series to get rid of the extra references on the ZONE_DEVICE pages first.
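For context (not part of this series), the "extra references" refer to the 1-based refcounting that ZONE_DEVICE pages still had at this point; the sketch below paraphrases what put_devmap_managed_page() in mm/memremap.c did in this era, and is only an illustration of the lifetime rule being objected to:

static void put_devmap_managed_page_sketch(struct page *page)
{
	int count = page_ref_dec_return(page);

	/*
	 * Device page refcounts are 1-based: the page counts as free when
	 * the refcount drops to 1 (not 0) and is then handed back to the
	 * driver via pgmap->ops->page_free().
	 */
	if (count == 1)
		free_devmap_managed_page(page);
	else if (!count)
		__put_page(page);
}

Until that extra reference is removed, any code that inspects or transfers refcounts for device private THPs has to account for the off-by-one, which is the "pin count magic" at issue.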
Matthew Wilcox
2020-Nov-06 12:14 UTC
[Nouveau] [PATCH v3 1/6] mm/thp: add prep_transhuge_device_private_page()
On Thu, Nov 05, 2020 at 04:51:42PM -0800, Ralph Campbell wrote:
> Add a helper function to allow device drivers to create device private
> transparent huge pages. This is intended to help support device private
> THP migrations.

I think you'd be better off with these calling conventions:

-void prep_transhuge_page(struct page *page)
+struct page *thp_prep(struct page *page)
 {
+	if (!page || compound_order(page) == 0)
+		return page;
 	/*
-	 * we use page->mapping and page->indexlru in second tail page
+	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
 	 */
+	BUG_ON(compound_order(page) == 1);
 	INIT_LIST_HEAD(page_deferred_list(page));
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+
+	return page;
 }

It simplifies the users.

> +void prep_transhuge_device_private_page(struct page *page)
> +{
> +	prep_compound_page(page, HPAGE_PMD_ORDER);
> +	prep_transhuge_page(page);
> +	/* Only the head page has a reference to the pgmap. */
> +	percpu_ref_put_many(page->pgmap->ref, HPAGE_PMD_NR - 1);
> +}
> +EXPORT_SYMBOL_GPL(prep_transhuge_device_private_page);

Something else that may interest you from my patch series is support for page sizes other than PMD_SIZE. I don't know what page sizes your hardware supports. There's no support for page sizes other than PMD for anonymous memory, so this might not be too useful for you yet.
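To make "it simplifies the users" concrete: thp_prep() is only the proposed name above, not an existing kernel function, but under that convention the alloc_transhugepage() helper from patch 4/6 (quoted earlier in the thread) could collapse to a single expression, along these lines:

struct page *alloc_transhugepage(struct vm_area_struct *vma,
				 unsigned long haddr)
{
	gfp_t gfp = alloc_hugepage_direct_gfpmask(vma);

	/* thp_prep() passes NULL and order-0 pages through, so no branch. */
	return thp_prep(alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER));
}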