Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 0/7] mm/hmm/nouveau: add THP migration to migrate_vma_*
This series adds support for transparent huge page migration to
migrate_vma_*() and adds nouveau SVM and HMM selftests as consumers.
An earlier version was posted previously [1]. This version now
supports splitting a THP midway in the migration process which
led to a number of changes.

The patches apply cleanly to the current linux-mm tree. Since there
are a couple of patches in linux-mm from Dan Williams that modify
lib/test_hmm.c and drivers/gpu/drm/nouveau/nouveau_dmem.c, it might be
easiest if Andrew could take these through the linux-mm tree assuming
that's OK with other maintainers like Ben Skeggs.

[1] https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell at nvidia.com

Ralph Campbell (7):
  mm/thp: fix __split_huge_pmd_locked() for migration PMD
  mm/migrate: move migrate_vma_collect_skip()
  mm: support THP migration to device private memory
  mm/thp: add prep_transhuge_device_private_page()
  mm/thp: add THP allocation helper
  mm/hmm/test: add self tests for THP migration
  nouveau: support THP migration to private memory

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 289 +++++++++++-----
 drivers/gpu/drm/nouveau/nouveau_svm.c  |  11 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 include/linux/gfp.h                    |  10 +
 include/linux/huge_mm.h                |  12 +
 include/linux/memremap.h               |   9 +
 include/linux/migrate.h                |   2 +
 lib/test_hmm.c                         | 439 +++++++++++++++++++++----
 lib/test_hmm_uapi.h                    |   3 +
 mm/huge_memory.c                       | 177 +++++++---
 mm/memory.c                            |  10 +-
 mm/migrate.c                           | 429 +++++++++++++++++++-----
 mm/rmap.c                              |   2 +-
 tools/testing/selftests/vm/hmm-tests.c | 404 +++++++++++++++++++++++
 14 files changed, 1519 insertions(+), 281 deletions(-)

--
2.20.1
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 1/7] mm/thp: fix __split_huge_pmd_locked() for migration PMD
A migrating transparent huge page has to already be unmapped. Otherwise,
the page could be modified while it is being copied to a new page and
data could be lost. The function __split_huge_pmd() checks for a PMD
migration entry before calling __split_huge_pmd_locked() leading one to
think that __split_huge_pmd_locked() can handle splitting a migrating PMD.
However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.
Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).
Fix these problems by checking for a PMD migration entry.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
mm/huge_memory.c | 42 +++++++++++++++++++++++-------------------
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2a468a4acb0a..606d712d9505 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2023,7 +2023,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
put_page(page);
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
- } else if (is_huge_zero_pmd(*pmd)) {
+ } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
/*
* FIXME: Do we want to invalidate secondary mmu by calling
* mmu_notifier_invalidate_range() see comments below inside
@@ -2117,30 +2117,34 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte = pte_offset_map(&_pmd, addr);
BUG_ON(!pte_none(*pte));
set_pte_at(mm, addr, pte, entry);
- atomic_inc(&page[i]._mapcount);
- pte_unmap(pte);
- }
-
- /*
- * Set PG_double_map before dropping compound_mapcount to avoid
- * false-negative page_mapped().
- */
- if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
- for (i = 0; i < HPAGE_PMD_NR; i++)
+ if (!pmd_migration)
atomic_inc(&page[i]._mapcount);
+ pte_unmap(pte);
}
- lock_page_memcg(page);
- if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
- /* Last compound_mapcount is gone. */
- __dec_lruvec_page_state(page, NR_ANON_THPS);
- if (TestClearPageDoubleMap(page)) {
- /* No need in mapcount reference anymore */
+ if (!pmd_migration) {
+ /*
+ * Set PG_double_map before dropping compound_mapcount to avoid
+ * false-negative page_mapped().
+ */
+ if (compound_mapcount(page) > 1 &&
+ !TestSetPageDoubleMap(page)) {
for (i = 0; i < HPAGE_PMD_NR; i++)
- atomic_dec(&page[i]._mapcount);
+ atomic_inc(&page[i]._mapcount);
+ }
+
+ lock_page_memcg(page);
+ if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
+ /* Last compound_mapcount is gone. */
+ __dec_lruvec_page_state(page, NR_ANON_THPS);
+ if (TestClearPageDoubleMap(page)) {
+ /* No need in mapcount reference anymore */
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ atomic_dec(&page[i]._mapcount);
+ }
}
+ unlock_page_memcg(page);
}
- unlock_page_memcg(page);
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
--
2.20.1
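
As a reading aid for the commit message above, the accounting change can be modelled in user space. This is only a minimal sketch under stated assumptions: `struct toy_page`, `toy_split_pmd()` and `TOY_HPAGE_NR` are hypothetical names that do not exist in the kernel, and the real function also handles memcg, compound_mapcount and PG_double_map, which are left out here.

```c
#include <stdbool.h>

/* Hypothetical, tiny stand-in for HPAGE_PMD_NR so the model is easy to follow. */
#define TOY_HPAGE_NR 4

/* Hypothetical stand-in for struct page; -1 means "not mapped", as with _mapcount. */
struct toy_page {
	int mapcount;
};

/*
 * Toy model of the per-subpage loop in __split_huge_pmd_locked().
 * When the PMD being split is a migration entry, the huge page is
 * already unmapped, so the subpage mapcounts must stay untouched.
 */
static void toy_split_pmd(struct toy_page *page, bool pmd_migration)
{
	int i;

	for (i = 0; i < TOY_HPAGE_NR; i++) {
		/* The real code installs the PTE for subpage i here. */
		if (!pmd_migration)
			page[i].mapcount++;
	}
}
```

Calling `toy_split_pmd(pages, true)` on pages whose mapcount is -1 leaves every count at -1, while `toy_split_pmd(pages, false)` raises each to 0, which mirrors the distinction the patch introduces with its `pmd_migration` checks.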
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 2/7] mm/migrate: move migrate_vma_collect_skip()
Move the definition of migrate_vma_collect_skip() to make it callable
by migrate_vma_collect_hole(). This helps make the next patch easier
to read.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
mm/migrate.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 4f89360d9e77..ce16ed3deab6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2254,6 +2254,21 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
#endif /* CONFIG_NUMA */
#ifdef CONFIG_DEVICE_PRIVATE
+static int migrate_vma_collect_skip(unsigned long start,
+ unsigned long end,
+ struct mm_walk *walk)
+{
+ struct migrate_vma *migrate = walk->private;
+ unsigned long addr;
+
+ for (addr = start; addr < end; addr += PAGE_SIZE) {
+ migrate->dst[migrate->npages] = 0;
+ migrate->src[migrate->npages++] = 0;
+ }
+
+ return 0;
+}
+
static int migrate_vma_collect_hole(unsigned long start,
unsigned long end,
__always_unused int depth,
@@ -2282,21 +2297,6 @@ static int migrate_vma_collect_hole(unsigned long start,
return 0;
}
-static int migrate_vma_collect_skip(unsigned long start,
- unsigned long end,
- struct mm_walk *walk)
-{
- struct migrate_vma *migrate = walk->private;
- unsigned long addr;
-
- for (addr = start; addr < end; addr += PAGE_SIZE) {
- migrate->dst[migrate->npages] = 0;
- migrate->src[migrate->npages++] = 0;
- }
-
- return 0;
-}
-
static int migrate_vma_collect_pmd(pmd_t *pmdp,
unsigned long start,
unsigned long end,
--
2.20.1
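
For readers unfamiliar with the src/dst arrays, the effect of the moved helper can be sketched outside the kernel. This is an illustrative model only: `struct toy_migrate`, `toy_collect_skip()` and `TOY_PAGE_SIZE` are hypothetical names, and the real helper obtains its state from `walk->private` rather than from a direct argument.

```c
#include <stddef.h>

/* Hypothetical page size for the model; the kernel uses PAGE_SIZE. */
#define TOY_PAGE_SIZE 4096UL

/* Hypothetical cut-down version of struct migrate_vma. */
struct toy_migrate {
	unsigned long *src;
	unsigned long *dst;
	size_t npages;
};

/*
 * Toy model of migrate_vma_collect_skip(): every page in [start, end)
 * gets a zero source and destination entry, meaning "do not migrate",
 * and npages advances by one per page.
 */
static int toy_collect_skip(unsigned long start, unsigned long end,
			    struct toy_migrate *migrate)
{
	unsigned long addr;

	for (addr = start; addr < end; addr += TOY_PAGE_SIZE) {
		migrate->dst[migrate->npages] = 0;
		migrate->src[migrate->npages++] = 0;
	}

	return 0;
}
```

Skipping a range of three pages advances `npages` by three and leaves later array slots untouched, which is why the hole-collection path can delegate whole ranges to it.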
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 3/7] mm: support THP migration to device private memory
Support transparent huge page migration to ZONE_DEVICE private memory.
A new selection flag (MIGRATE_VMA_SELECT_COMPOUND) is added to request
THP migration. Otherwise, THPs are split when filling in the source PFN
array. A new flag (MIGRATE_PFN_COMPOUND) is added to the source PFN array
to indicate a huge page can be migrated. If the device driver can allocate
a huge page, it sets the MIGRATE_PFN_COMPOUND flag in the destination PFN
array. migrate_vma_pages() will fall back to PAGE_SIZE pages if
MIGRATE_PFN_COMPOUND is not set in both source and destination arrays.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
include/linux/huge_mm.h | 7 +
include/linux/memremap.h | 9 +
include/linux/migrate.h | 2 +
mm/huge_memory.c | 113 ++++++++---
mm/memory.c | 10 +-
mm/migrate.c | 413 ++++++++++++++++++++++++++++++++-------
mm/rmap.c | 2 +-
7 files changed, 459 insertions(+), 97 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8a8bc46a2432..87b42c81dedc 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -192,6 +192,8 @@ bool is_transparent_hugepage(struct page *page);
bool can_split_huge_page(struct page *page, int *pextra_pins);
int split_huge_page_to_list(struct page *page, struct list_head *list);
+int split_migrating_huge_page(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, struct page *page);
static inline int split_huge_page(struct page *page)
{
return split_huge_page_to_list(page, NULL);
@@ -454,6 +456,11 @@ static inline bool is_huge_zero_page(struct page *page)
return false;
}
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+ return false;
+}
+
static inline bool is_huge_zero_pud(pud_t pud)
{
return false;
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 4e9c738f4b31..5175b1eaea01 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -88,6 +88,15 @@ struct dev_pagemap_ops {
* the page back to a CPU accessible page.
*/
vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
+
+ /*
+ * Used for private (un-addressable) device memory only.
+ * This is called when a compound device private page is split.
+ * The driver uses this callback to set tail_page->pgmap and
+ * tail_page->zone_device_data appropriately based on the head
+ * page.
+ */
+ void (*page_split)(struct page *head, struct page *tail_page);
};
#define PGMAP_ALTMAP_VALID (1 << 0)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 0f8d1583fa8e..92179bf360d1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -144,6 +144,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
#define MIGRATE_PFN_MIGRATE (1UL << 1)
#define MIGRATE_PFN_LOCKED (1UL << 2)
#define MIGRATE_PFN_WRITE (1UL << 3)
+#define MIGRATE_PFN_COMPOUND (1UL << 4)
#define MIGRATE_PFN_SHIFT 6
static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
@@ -161,6 +162,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
enum migrate_vma_direction {
MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
+ MIGRATE_VMA_SELECT_COMPOUND = 1 << 2,
};
struct migrate_vma {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 606d712d9505..a8d48994481a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1644,23 +1644,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
} else {
struct page *page = NULL;
int flush_needed = 1;
+ bool is_anon = false;
if (pmd_present(orig_pmd)) {
page = pmd_page(orig_pmd);
+ is_anon = PageAnon(page);
page_remove_rmap(page, true);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (thp_migration_supported()) {
swp_entry_t entry;
- VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
entry = pmd_to_swp_entry(orig_pmd);
- page = pfn_to_page(swp_offset(entry));
+ if (is_device_private_entry(entry)) {
+ page = device_private_entry_to_page(entry);
+ is_anon = PageAnon(page);
+ page_remove_rmap(page, true);
+ VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
+ VM_BUG_ON_PAGE(!PageHead(page), page);
+ put_page(page);
+ } else {
+ VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
+ page = pfn_to_page(swp_offset(entry));
+ is_anon = PageAnon(page);
+ }
flush_needed = 0;
} else
WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
- if (PageAnon(page)) {
+ if (is_anon) {
zap_deposited_table(tlb->mm, pmd);
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
} else {
@@ -2320,9 +2332,10 @@ static void remap_page(struct page *page)
}
static void __split_huge_page_tail(struct page *head, int tail,
- struct lruvec *lruvec, struct list_head *list)
+ struct lruvec *lruvec, struct list_head *list, bool remap)
{
struct page *page_tail = head + tail;
+ int pin_count;
VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
@@ -2360,10 +2373,17 @@ static void __split_huge_page_tail(struct page *head, int tail,
* After successful get_page_unless_zero() might follow put_page()
* which needs correct compound_head().
*/
- clear_compound_head(page_tail);
+ if (is_device_private_page(head)) {
+ /* Restore page_tail->pgmap and page_tail->zone_device_data. */
+ head->pgmap->ops->page_split(head, page_tail);
+ pin_count = 2;
+ } else {
+ clear_compound_head(page_tail);
+ pin_count = 1;
+ }
/* Finally unfreeze refcount. Additional reference from page cache. */
- page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+ page_ref_unfreeze(page_tail, pin_count + (!PageAnon(head) ||
PageSwapCache(head)));
if (page_is_young(head))
@@ -2378,11 +2398,12 @@ static void __split_huge_page_tail(struct page *head, int tail,
* pages to show after the currently processed elements - e.g.
* migrate_pages
*/
- lru_add_page_tail(head, page_tail, lruvec, list);
+ if (remap)
+ lru_add_page_tail(head, page_tail, lruvec, list);
}
static void __split_huge_page(struct page *page, struct list_head *list,
- pgoff_t end, unsigned long flags)
+ pgoff_t end, unsigned long flags, bool remap)
{
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
@@ -2405,7 +2426,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
}
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
- __split_huge_page_tail(head, i, lruvec, list);
+ __split_huge_page_tail(head, i, lruvec, list, remap);
/* Some pages can be beyond i_size: drop them from page cache */
if (head[i].index >= end) {
ClearPageDirty(head + i);
@@ -2432,6 +2453,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
if (PageSwapCache(head)) {
page_ref_add(head, 2);
xa_unlock(&swap_cache->i_pages);
+ } else if (is_device_private_page(head)) {
+ page_ref_add(head, 2);
} else {
page_ref_inc(head);
}
@@ -2443,6 +2466,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+ if (!remap)
+ return;
+
remap_page(head);
for (i = 0; i < HPAGE_PMD_NR; i++) {
@@ -2577,7 +2603,8 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
* Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
* us.
*/
-int split_huge_page_to_list(struct page *page, struct list_head *list)
+static int __split_huge_page_to_list(struct page *page, struct list_head *list,
+ bool remap)
{
struct page *head = compound_head(page);
struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
@@ -2590,7 +2617,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
VM_BUG_ON_PAGE(!PageLocked(head), head);
- VM_BUG_ON_PAGE(!PageCompound(head), head);
+ VM_BUG_ON_PAGE(!PageHead(head), head);
if (PageWriteback(head))
return -EBUSY;
@@ -2604,14 +2631,16 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
* is taken to serialise against parallel split or collapse
* operations.
*/
- anon_vma = page_get_anon_vma(head);
- if (!anon_vma) {
- ret = -EBUSY;
- goto out;
+ if (remap) {
+ anon_vma = page_get_anon_vma(head);
+ if (!anon_vma) {
+ ret = -EBUSY;
+ goto out;
+ }
+ anon_vma_lock_write(anon_vma);
}
end = -1;
mapping = NULL;
- anon_vma_lock_write(anon_vma);
} else {
mapping = head->mapping;
@@ -2637,13 +2666,18 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
/*
* Racy check if we can split the page, before unmap_page() will
* split PMDs
+ * If we are splitting a migrating THP, there is no check needed
+ * because the page is already unmapped and isolated from the LRU.
*/
- if (!can_split_huge_page(head, &extra_pins)) {
+ if (!remap)
+ extra_pins = HPAGE_PMD_NR - 1 + is_device_private_page(head);
+ else if (!can_split_huge_page(head, &extra_pins)) {
ret = -EBUSY;
goto out_unlock;
}
- unmap_page(head);
+ if (remap)
+ unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* prevent PageLRU to go away from under us, and freeze lru stats */
@@ -2668,7 +2702,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
if (!list_empty(page_deferred_list(head))) {
ds_queue->split_queue_len--;
- list_del(page_deferred_list(head));
+ list_del_init(page_deferred_list(head));
}
spin_unlock(&ds_queue->split_queue_lock);
if (mapping) {
@@ -2678,7 +2712,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
__dec_node_page_state(head, NR_FILE_THPS);
}
- __split_huge_page(page, list, end, flags);
+ __split_huge_page(page, list, end, flags, remap);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
@@ -2698,7 +2732,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
fail: if (mapping)
xa_unlock(&mapping->i_pages);
spin_unlock_irqrestore(&pgdata->lru_lock, flags);
- remap_page(head);
+ if (remap)
+ remap_page(head);
ret = -EBUSY;
}
@@ -2714,6 +2749,36 @@ fail: if (mapping)
return ret;
}
+int split_huge_page_to_list(struct page *page, struct list_head *list)
+{
+ return __split_huge_page_to_list(page, list, true);
+}
+
+/*
+ * Split a migrating huge page.
+ * The caller should have mmap_lock_read() held, the huge page unmapped and
+ * isolated, and the PMD page table entry set to a migration entry for the
+ * given head page.
+ */
+int split_migrating_huge_page(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long address, struct page *head)
+{
+ spinlock_t *ptl;
+
+ VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
+ VM_BUG_ON_PAGE(!PageLocked(head), head);
+ VM_BUG_ON_PAGE(!PageHead(head), head);
+ VM_BUG_ON_PAGE(PageWriteback(head), head);
+ VM_BUG_ON_PAGE(PageLRU(head), head);
+ VM_BUG_ON_PAGE(compound_mapcount(head), head);
+
+ ptl = pmd_lock(vma->vm_mm, pmd);
+ __split_huge_pmd_locked(vma, pmd, address, false);
+ spin_unlock(ptl);
+
+ return __split_huge_page_to_list(head, NULL, false);
+}
+
void free_transhuge_page(struct page *page)
{
struct deferred_split *ds_queue = get_deferred_split_queue(page);
@@ -2722,7 +2787,7 @@ void free_transhuge_page(struct page *page)
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (!list_empty(page_deferred_list(page))) {
ds_queue->split_queue_len--;
- list_del(page_deferred_list(page));
+ list_del_init(page_deferred_list(page));
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
free_compound_page(page);
@@ -2942,6 +3007,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
pmde = pmd_mksoft_dirty(pmde);
if (is_write_migration_entry(entry))
pmde = maybe_pmd_mkwrite(pmde, vma);
+ if (unlikely(is_device_private_page(new))) {
+ entry = make_device_private_entry(new, pmd_write(pmde));
+ pmde = swp_entry_to_pmd(entry);
+ }
flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE);
if (PageAnon(new))
diff --git a/mm/memory.c b/mm/memory.c
index fb5463153351..81fcf5101fc6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4346,9 +4346,15 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
barrier();
if (unlikely(is_swap_pmd(orig_pmd))) {
+ swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+ if (is_device_private_entry(entry)) {
+ vmf.page = device_private_entry_to_page(entry);
+ return vmf.page->pgmap->ops->migrate_to_ram(&vmf);
+ }
VM_BUG_ON(thp_migration_supported() &&
- !is_pmd_migration_entry(orig_pmd));
- if (is_pmd_migration_entry(orig_pmd))
+ !is_migration_entry(entry));
+ if (is_migration_entry(entry))
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index ce16ed3deab6..139844534dd8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -51,6 +51,7 @@
#include <linux/oom.h>
#include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
#define CREATE_TRACE_POINTS
#include <trace/events/migrate.h>
@@ -2276,19 +2277,28 @@ static int migrate_vma_collect_hole(unsigned long start,
{
struct migrate_vma *migrate = walk->private;
unsigned long addr;
+ unsigned long mpfn;
/* Only allow populating anonymous memory. */
- if (!vma_is_anonymous(walk->vma)) {
- for (addr = start; addr < end; addr += PAGE_SIZE) {
- migrate->src[migrate->npages] = 0;
- migrate->dst[migrate->npages] = 0;
- migrate->npages++;
- }
- return 0;
+ if (!vma_is_anonymous(walk->vma) ||
+ !((migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)))
+ return migrate_vma_collect_skip(start, end, walk);
+
+ if (thp_migration_supported() &&
+ (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) &&
+ (start & ~PMD_MASK) == 0 && (end & ~PMD_MASK) == 0) {
+ migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE |
+ MIGRATE_PFN_COMPOUND;
+ migrate->dst[migrate->npages] = 0;
+ migrate->npages++;
+ migrate->cpages++;
+ return migrate_vma_collect_skip(start + PAGE_SIZE, end, walk);
}
+ mpfn = (migrate->vma->vm_flags & VM_WRITE) ?
+ (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_WRITE) : MIGRATE_PFN_MIGRATE;
for (addr = start; addr < end; addr += PAGE_SIZE) {
- migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE;
+ migrate->src[migrate->npages] = mpfn;
migrate->dst[migrate->npages] = 0;
migrate->npages++;
migrate->cpages++;
@@ -2297,59 +2307,133 @@ static int migrate_vma_collect_hole(unsigned long start,
return 0;
}
-static int migrate_vma_collect_pmd(pmd_t *pmdp,
- unsigned long start,
- unsigned long end,
- struct mm_walk *walk)
+static int migrate_vma_handle_pmd(pmd_t *pmdp, unsigned long start,
+ unsigned long end, struct mm_walk *walk)
{
struct migrate_vma *migrate = walk->private;
struct vm_area_struct *vma = walk->vma;
struct mm_struct *mm = vma->vm_mm;
- unsigned long addr = start, unmapped = 0;
spinlock_t *ptl;
- pte_t *ptep;
+ struct page *page;
+ unsigned long write = 0;
+ int ret;
-again:
- if (pmd_none(*pmdp))
+ ptl = pmd_lock(mm, pmdp);
+ if (pmd_none(*pmdp)) {
+ spin_unlock(ptl);
return migrate_vma_collect_hole(start, end, -1, walk);
-
+ }
if (pmd_trans_huge(*pmdp)) {
- struct page *page;
-
- ptl = pmd_lock(mm, pmdp);
- if (unlikely(!pmd_trans_huge(*pmdp))) {
+ if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) {
spin_unlock(ptl);
- goto again;
+ return migrate_vma_collect_skip(start, end, walk);
}
-
page = pmd_page(*pmdp);
if (is_huge_zero_page(page)) {
spin_unlock(ptl);
- split_huge_pmd(vma, pmdp, addr);
- if (pmd_trans_unstable(pmdp))
- return migrate_vma_collect_skip(start, end,
- walk);
- } else {
- int ret;
+ return migrate_vma_collect_hole(start, end, -1, walk);
+ }
+ if (pmd_write(*pmdp))
+ write = MIGRATE_PFN_WRITE;
+ } else if (!pmd_present(*pmdp)) {
+ swp_entry_t entry = pmd_to_swp_entry(*pmdp);
+
+ if (is_migration_entry(entry)) {
+ bool wait;
- get_page(page);
+ page = migration_entry_to_page(entry);
+ wait = get_page_unless_zero(page);
spin_unlock(ptl);
- if (unlikely(!trylock_page(page)))
- return migrate_vma_collect_skip(start, end,
- walk);
- ret = split_huge_page(page);
- unlock_page(page);
- put_page(page);
- if (ret)
- return migrate_vma_collect_skip(start, end,
- walk);
- if (pmd_none(*pmdp))
- return migrate_vma_collect_hole(start, end, -1,
- walk);
+ if (wait)
+ put_and_wait_on_page_locked(page);
+ return -EAGAIN;
+ }
+ if (!is_device_private_entry(entry)) {
+ spin_unlock(ptl);
+ return migrate_vma_collect_skip(start, end, walk);
+ }
+ page = device_private_entry_to_page(entry);
+ if (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) ||
+ page->pgmap->owner != migrate->pgmap_owner) {
+ spin_unlock(ptl);
+ return migrate_vma_collect_skip(start, end, walk);
}
+ if (is_write_device_private_entry(entry))
+ write = MIGRATE_PFN_WRITE;
+ } else {
+ spin_unlock(ptl);
+ return -EAGAIN;
+ }
+
+ get_page(page);
+ if (unlikely(!trylock_page(page))) {
+ spin_unlock(ptl);
+ put_page(page);
+ return migrate_vma_collect_skip(start, end, walk);
+ }
+ if (thp_migration_supported() &&
+ (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) &&
+ (start & ~PMD_MASK) == 0 && (start + PMD_SIZE) == end) {
+ struct page_vma_mapped_walk vmw = {
+ .vma = vma,
+ .address = start,
+ .pmd = pmdp,
+ .ptl = ptl,
+ };
+
+ migrate->src[migrate->npages] = write |
+ migrate_pfn(page_to_pfn(page)) |
+ MIGRATE_PFN_MIGRATE | MIGRATE_PFN_LOCKED |
+ MIGRATE_PFN_COMPOUND;
+ migrate->dst[migrate->npages] = 0;
+ migrate->npages++;
+ migrate->cpages++;
+ migrate_vma_collect_skip(start + PAGE_SIZE, end, walk);
+
+ /* Note this also removes the page from the rmap. */
+ set_pmd_migration_entry(&vmw, page);
+ spin_unlock(ptl);
+
+ return 0;
+ }
+ spin_unlock(ptl);
+
+ ret = split_huge_page(page);
+ unlock_page(page);
+ put_page(page);
+
+ if (ret)
+ return migrate_vma_collect_skip(start, end, walk);
+ if (pmd_none(*pmdp))
+ return migrate_vma_collect_hole(start, end, -1, walk);
+
+ /* This just causes migrate_vma_collect_pmd() to handle PTEs. */
+ return -ENOENT;
+}
+
+static int migrate_vma_collect_pmd(pmd_t *pmdp,
+ unsigned long start,
+ unsigned long end,
+ struct mm_walk *walk)
+{
+ struct migrate_vma *migrate = walk->private;
+ struct vm_area_struct *vma = walk->vma;
+ struct mm_struct *mm = vma->vm_mm;
+ unsigned long addr = start, unmapped = 0;
+ spinlock_t *ptl;
+ pte_t *ptep;
+
+again:
+ if (pmd_trans_huge(*pmdp) || !pmd_present(*pmdp)) {
+ int ret = migrate_vma_handle_pmd(pmdp, start, end, walk);
+
+ if (!ret)
+ return 0;
+ if (ret == -EAGAIN)
+ goto again;
}
- if (unlikely(pmd_bad(*pmdp)))
+ if (unlikely(pmd_bad(*pmdp) || pmd_devmap(*pmdp)))
return migrate_vma_collect_skip(start, end, walk);
ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
@@ -2405,8 +2489,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
}
- /* FIXME support THP */
- if (!page || !page->mapping || PageTransCompound(page)) {
+ if (!page || !page->mapping) {
mpfn = 0;
goto next;
}
@@ -2528,14 +2611,6 @@ static bool migrate_vma_check_page(struct page *page)
*/
int extra = 1;
- /*
- * FIXME support THP (transparent huge page), it is bit more complex to
- * check them than regular pages, because they can be mapped with a pmd
- * or with a pte (split pte mapping).
- */
- if (PageCompound(page))
- return false;
-
/* Page from ZONE_DEVICE have one extra reference */
if (is_zone_device_page(page)) {
/*
@@ -2834,13 +2909,191 @@ int migrate_vma_setup(struct migrate_vma *args)
}
EXPORT_SYMBOL(migrate_vma_setup);
+static pmd_t *find_pmd(struct mm_struct *mm, unsigned long addr)
+{
+ pgd_t *pgdp;
+ p4d_t *p4dp;
+ pud_t *pudp;
+
+ pgdp = pgd_offset(mm, addr);
+ p4dp = p4d_alloc(mm, pgdp, addr);
+ if (!p4dp)
+ return NULL;
+ pudp = pud_alloc(mm, p4dp, addr);
+ if (!pudp)
+ return NULL;
+ return pmd_alloc(mm, pudp, addr);
+}
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+/*
+ * This code closely follows:
+ * do_huge_pmd_anonymous_page()
+ * __do_huge_pmd_anonymous_page()
+ * except that the page being inserted is likely to be a device private page
+ * instead of an allocated or zero page.
+ */
+static int insert_huge_pmd_anonymous_page(struct vm_area_struct *vma,
+ unsigned long haddr,
+ struct page *page,
+ unsigned long *src,
+ pmd_t *pmdp)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ unsigned int i;
+ spinlock_t *ptl;
+ bool flush = false;
+ pgtable_t pgtable;
+ gfp_t gfp;
+ pmd_t entry;
+
+ if (WARN_ON_ONCE(compound_order(page) != HPAGE_PMD_ORDER))
+ goto abort;
+
+ if (unlikely(anon_vma_prepare(vma)))
+ goto abort;
+
+ prep_transhuge_page(page);
+
+ gfp = GFP_TRANSHUGE_LIGHT;
+ if (mem_cgroup_charge(page, mm, gfp))
+ goto abort;
+
+ pgtable = pte_alloc_one(mm);
+ if (unlikely(!pgtable))
+ goto abort;
+
+ __SetPageUptodate(page);
+
+ if (is_zone_device_page(page)) {
+ if (!is_device_private_page(page))
+ goto pgtable_abort;
+ entry = swp_entry_to_pmd(make_device_private_entry(page,
+ vma->vm_flags & VM_WRITE));
+ } else {
+ entry = mk_huge_pmd(page, vma->vm_page_prot);
+ entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ }
+
+ ptl = pmd_lock(mm, pmdp);
+
+ if (check_stable_address_space(mm))
+ goto unlock_abort;
+
+ /*
+ * Check for userfaultfd but do not deliver the fault. Instead,
+ * just back off.
+ */
+ if (userfaultfd_missing(vma))
+ goto unlock_abort;
+
+ if (pmd_present(*pmdp)) {
+ if (!is_huge_zero_pmd(*pmdp))
+ goto unlock_abort;
+ flush = true;
+ } else if (!pmd_none(*pmdp))
+ goto unlock_abort;
+
+ get_page(page);
+ page_add_new_anon_rmap(page, vma, haddr, true);
+ if (!is_zone_device_page(page))
+ lru_cache_add_inactive_or_unevictable(page, vma);
+ if (flush) {
+ pte_free(mm, pgtable);
+ flush_cache_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
+ pmdp_invalidate(vma, haddr, pmdp);
+ } else {
+ pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+ mm_inc_nr_ptes(mm);
+ }
+ set_pmd_at(mm, haddr, pmdp, entry);
+ update_mmu_cache_pmd(vma, haddr, pmdp);
+ add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
+ spin_unlock(ptl);
+ count_vm_event(THP_FAULT_ALLOC);
+ count_memcg_event_mm(mm, THP_FAULT_ALLOC);
+
+ return 0;
+
+unlock_abort:
+ spin_unlock(ptl);
+pgtable_abort:
+ pte_free(mm, pgtable);
+abort:
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ src[i] &= ~MIGRATE_PFN_MIGRATE;
+ return -EINVAL;
+}
+
+static void migrate_vma_split(struct migrate_vma *migrate, unsigned long i,
+ unsigned long addr)
+{
+ const unsigned long npages = i + HPAGE_PMD_NR;
+ unsigned long mpfn;
+ unsigned long j;
+ bool migrating = false;
+ struct page *page;
+
+ migrate->src[i] &= ~MIGRATE_PFN_COMPOUND;
+
+ /* If no part of the THP is migrating, we can skip splitting. */
+ for (j = i; j < npages; j++) {
+ if (migrate->dst[j] & MIGRATE_PFN_VALID) {
+ migrating = true;
+ break;
+ }
+ }
+ if (!migrating)
+ return;
+
+ mpfn = migrate->src[i];
+ page = migrate_pfn_to_page(mpfn);
+ if (page) {
+ pmd_t *pmdp;
+ int ret;
+
+ pmdp = find_pmd(migrate->vma->vm_mm, addr);
+ if (!pmdp) {
+ migrate->src[i] = mpfn & ~MIGRATE_PFN_MIGRATE;
+ return;
+ }
+ ret = split_migrating_huge_page(migrate->vma, pmdp, addr, page);
+ if (ret) {
+ migrate->src[i] = mpfn & ~MIGRATE_PFN_MIGRATE;
+ return;
+ }
+ while (++i < npages) {
+ mpfn += 1UL << MIGRATE_PFN_SHIFT;
+ migrate->src[i] = mpfn;
+ }
+ } else {
+ while (++i < npages)
+ migrate->src[i] = mpfn;
+ }
+}
+#else
+static int insert_huge_pmd_anonymous_page(struct vm_area_struct *vma,
+ unsigned long haddr,
+ struct page *page,
+ unsigned long *src,
+ pmd_t *pmdp)
+{
+ return 0;
+}
+
+static void migrate_vma_split(struct migrate_vma *migrate, unsigned long i,
+ unsigned long addr)
+{
+}
+#endif
+
/*
* This code closely matches the code in:
* __handle_mm_fault()
* handle_pte_fault()
* do_anonymous_page()
- * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
- * private page.
+ * to map in an anonymous zero page except the struct page is already allocated
+ * and will likely be a ZONE_DEVICE private page.
*/
static void migrate_vma_insert_page(struct migrate_vma *migrate,
unsigned long addr,
@@ -2853,9 +3106,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
bool flush = false;
spinlock_t *ptl;
pte_t entry;
- pgd_t *pgdp;
- p4d_t *p4dp;
- pud_t *pudp;
pmd_t *pmdp;
pte_t *ptep;
@@ -2863,19 +3113,25 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (!vma_is_anonymous(vma))
goto abort;
- pgdp = pgd_offset(mm, addr);
- p4dp = p4d_alloc(mm, pgdp, addr);
- if (!p4dp)
- goto abort;
- pudp = pud_alloc(mm, p4dp, addr);
- if (!pudp)
- goto abort;
- pmdp = pmd_alloc(mm, pudp, addr);
+ pmdp = find_pmd(migrate->vma->vm_mm, addr);
if (!pmdp)
goto abort;
- if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
- goto abort;
+ if (thp_migration_supported() && *dst & MIGRATE_PFN_COMPOUND) {
+ int ret = insert_huge_pmd_anonymous_page(vma, addr, page, src,
+ pmdp);
+ if (ret)
+ goto abort;
+ return;
+ }
+ if (!pmd_none(*pmdp)) {
+ if (pmd_trans_huge(*pmdp)) {
+ if (!is_huge_zero_pmd(*pmdp))
+ goto abort;
+ __split_huge_pmd(vma, pmdp, addr, false, NULL);
+ } else if (pmd_leaf(*pmdp))
+ goto abort;
+ }
/*
* Use pte_alloc() instead of pte_alloc_map(). We can't run
@@ -2910,9 +3166,11 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (is_device_private_page(page)) {
swp_entry_t swp_entry;
- swp_entry = make_device_private_entry(page, vma->vm_flags & VM_WRITE);
+ swp_entry = make_device_private_entry(page,
+ vma->vm_flags & VM_WRITE);
entry = swp_entry_to_pte(swp_entry);
- }
+ } else
+ goto abort;
} else {
entry = mk_pte(page, vma->vm_page_prot);
if (vma->vm_flags & VM_WRITE)
@@ -2941,10 +3199,10 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
goto unlock_abort;
inc_mm_counter(mm, MM_ANONPAGES);
+ get_page(page);
page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
lru_cache_add_inactive_or_unevictable(page, vma);
- get_page(page);
if (flush) {
flush_cache_page(vma, addr, pte_pfn(*ptep));
@@ -2958,7 +3216,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
}
pte_unmap_unlock(ptep, ptl);
- *src = MIGRATE_PFN_MIGRATE;
return;
unlock_abort:
@@ -2989,11 +3246,23 @@ void migrate_vma_pages(struct migrate_vma *migrate)
struct address_space *mapping;
int r;
+ /*
+ * If the caller didn't allocate a THP, split the PMD and
+ * fix up the src array.
+ */
+ if (thp_migration_supported() &&
+ (migrate->src[i] & MIGRATE_PFN_MIGRATE) &&
+ (migrate->src[i] & MIGRATE_PFN_COMPOUND) &&
+ !(migrate->dst[i] & MIGRATE_PFN_COMPOUND))
+ migrate_vma_split(migrate, i, addr);
+
+ newpage = migrate_pfn_to_page(migrate->dst[i]);
if (!newpage) {
migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
continue;
}
+ page = migrate_pfn_to_page(migrate->src[i]);
if (!page) {
if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE))
continue;
diff --git a/mm/rmap.c b/mm/rmap.c
index 9425260774a1..4a4d24ea5008 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1497,7 +1497,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
}
if (IS_ENABLED(CONFIG_MIGRATION) &&
- (flags & TTU_MIGRATION) &&
+ (flags & (TTU_MIGRATION | TTU_SPLIT_FREEZE)) &&
is_zone_device_page(page)) {
swp_entry_t entry;
pte_t swp_pte;
--
2.20.1
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 4/7] mm/thp: add prep_transhuge_device_private_page()
Add a helper function to allow device drivers to create device private
transparent huge pages. This is intended to help support device private
THP migrations.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
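Note: the helper expects a PMD-aligned run of HPAGE_PMD_NR contiguous
device private pages. The sketch below is a userspace model, not kernel
code, of how a driver might decide which PFN ranges qualify, modelled on
the dmirror_allocate_chunk() loop in lib/test_hmm.c from patch 6. The
names carve_chunk, struct carve_result and MODEL_HPAGE_PMD_NR are
hypothetical, and the value 512 assumes 4 KiB base pages with 2 MiB
huge pages.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages and 2 MiB huge pages, so 512 PFNs per THP. */
#define MODEL_HPAGE_PMD_NR 512UL

/* Hypothetical result type: how many pages of each kind were carved. */
struct carve_result {
	unsigned long nr_huge;	/* PMD-aligned, full-sized runs */
	unsigned long nr_small;	/* leftover PAGE_SIZE pages */
};

/*
 * Walk [pfn_first, pfn_last) and count the runs that could be handed to
 * a THP prep helper. A run qualifies only if it starts on a huge page
 * boundary and fits entirely inside the chunk; everything else is
 * counted as a single small page.
 */
static struct carve_result carve_chunk(unsigned long pfn_first,
				       unsigned long pfn_last, int want_huge)
{
	struct carve_result res = { 0, 0 };
	unsigned long pfn = pfn_first;

	assert(pfn_first <= pfn_last);

	while (pfn < pfn_last) {
		if (want_huge && (pfn & (MODEL_HPAGE_PMD_NR - 1)) == 0 &&
		    pfn + MODEL_HPAGE_PMD_NR <= pfn_last) {
			res.nr_huge++;
			pfn += MODEL_HPAGE_PMD_NR;
			continue;
		}
		res.nr_small++;
		pfn++;
	}
	return res;
}
```

In this model an unaligned chunk still yields huge pages from its aligned
interior, with the unaligned head and tail falling back to small pages.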
include/linux/huge_mm.h | 5 +++++
mm/huge_memory.c | 8 ++++++++
2 files changed, 13 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 87b42c81dedc..126e54da4fee 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -187,6 +187,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
unsigned long flags);
extern void prep_transhuge_page(struct page *page);
+extern void prep_transhuge_device_private_page(struct page *page);
extern void free_transhuge_page(struct page *page);
bool is_transparent_hugepage(struct page *page);
@@ -382,6 +383,10 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
static inline void prep_transhuge_page(struct page *page) {}
+static inline void prep_transhuge_device_private_page(struct page *page)
+{
+}
+
static inline bool is_transparent_hugepage(struct page *page)
{
return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a8d48994481a..1e848cc0c3dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -498,6 +498,14 @@ void prep_transhuge_page(struct page *page)
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
}
+void prep_transhuge_device_private_page(struct page *page)
+{
+ prep_compound_page(page, HPAGE_PMD_ORDER);
+ prep_transhuge_page(page);
+ percpu_ref_put_many(page->pgmap->ref, HPAGE_PMD_NR - 1);
+}
+EXPORT_SYMBOL_GPL(prep_transhuge_device_private_page);
+
bool is_transparent_hugepage(struct page *page)
{
if (!PageCompound(page))
--
2.20.1
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 5/7] mm/thp: add THP allocation helper
Transparent huge page allocation policy is controlled by several sysfs
variables. Rather than expose these to each device driver that needs to
allocate THPs, provide a helper function.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
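Note: the sysfs variables referred to above live under
/sys/kernel/mm/transparent_hugepage/ (for example "enabled" and
"defrag"), and each reports its active setting in square brackets. The
sketch below is a userspace illustration of reading such a setting, to
show the kind of policy a driver would otherwise have to interpret
itself. It is not part of this patch, and thp_selected_policy is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the bracketed (active) token from a THP sysfs line such as
 * "always [madvise] never". Returns 0 and fills 'out' on success, or -1
 * if no bracketed token is found or it does not fit in 'out'.
 */
static int thp_selected_policy(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line && out && outlen > 0);

	open = strchr(line, '[');
	if (!open)
		return -1;
	close = strchr(open, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read one line from the sysfs file and pass it in. With
alloc_transhugepage(), a driver does not need any of this because the
helper applies the policy when it picks the GFP mask.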
include/linux/gfp.h | 10 ++++++++++
mm/huge_memory.c | 14 ++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b..6faf4ea5501b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -562,6 +562,16 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
#define alloc_page_vma_node(gfp_mask, vma, addr, node) \
alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern struct page *alloc_transhugepage(struct vm_area_struct *vma,
+ unsigned long addr);
+#else
+static inline struct page *alloc_transhugepage(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return NULL;
+}
+#endif
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1e848cc0c3dc..e4e1fe199dc1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -764,6 +764,20 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
return __do_huge_pmd_anonymous_page(vmf, page, gfp);
}
+struct page *alloc_transhugepage(struct vm_area_struct *vma,
+ unsigned long haddr)
+{
+ gfp_t gfp;
+ struct page *page;
+
+ gfp = alloc_hugepage_direct_gfpmask(vma);
+ page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
+ if (page)
+ prep_transhuge_page(page);
+ return page;
+}
+EXPORT_SYMBOL_GPL(alloc_transhugepage);
+
static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
pgtable_t pgtable)
--
2.20.1
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 6/7] mm/hmm/test: add self tests for THP migration
Add basic standalone self tests for migrating transparent huge pages from
system memory to device private memory and back.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
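Note: each new test maps twice the huge page size and then rounds the
start address up to a huge page boundary, so that a fully aligned 2 MiB
window is available for MADV_HUGEPAGE. The sketch below models only that
address arithmetic in userspace. align_up, aligned_window_fits and
MODEL_TWOMEG are hypothetical names standing in for the ALIGN() and
TWOMEG macros used by the tests.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages, matching the TWOMEG size in the tests. */
#define MODEL_TWOMEG ((uintptr_t)1 << 21)

/* Round addr up to the next multiple of size (size must be a power of 2). */
static uintptr_t align_up(uintptr_t addr, uintptr_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr + size - 1) & ~(size - 1);
}

/*
 * Check that the aligned window [aligned, aligned + size) lies inside a
 * mapping of 2 * size bytes that starts at base.
 */
static int aligned_window_fits(uintptr_t base, uintptr_t size)
{
	uintptr_t aligned = align_up(base, size);

	return aligned >= base && aligned + size <= base + 2 * size;
}
```

Because rounding up moves the start by less than one huge page, the
aligned window always fits in the doubled mapping. This is why the tests
keep the original pointer in old_ptr and restore it before freeing.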
lib/test_hmm.c | 439 +++++++++++++++++++++----
lib/test_hmm_uapi.h | 3 +
tools/testing/selftests/vm/hmm-tests.c | 404 +++++++++++++++++++++++
3 files changed, 777 insertions(+), 69 deletions(-)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e3065d6123f0..41c005c55bcf 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -67,6 +67,7 @@ struct dmirror {
struct xarray pt;
struct mmu_interval_notifier notifier;
struct mutex mutex;
+ __u64 flags;
};
/*
@@ -92,6 +93,7 @@ struct dmirror_device {
unsigned long calloc;
unsigned long cfree;
struct page *free_pages;
+ struct page *free_huge_pages;
spinlock_t lock; /* protects the above */
};
@@ -451,6 +453,7 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
}
static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
+ bool is_huge,
struct page **ppage)
{
struct dmirror_chunk *devmem;
@@ -504,28 +507,51 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
mutex_unlock(&mdevice->devmem_lock);
- pr_info("added new %u MB chunk (total %u chunks, %u MB) PFNs [0x%lx 0x%lx)\n",
+ pr_info("dev %u added %u MB (total %u chunks, %u MB) PFNs [0x%lx 0x%lx)\n",
+ MINOR(mdevice->cdevice.dev),
DEVMEM_CHUNK_SIZE / (1024 * 1024),
mdevice->devmem_count,
mdevice->devmem_count * (DEVMEM_CHUNK_SIZE / (1024 * 1024)),
pfn_first, pfn_last);
spin_lock(&mdevice->lock);
- for (pfn = pfn_first; pfn < pfn_last; pfn++) {
+ for (pfn = pfn_first; pfn < pfn_last; ) {
struct page *page = pfn_to_page(pfn);
+ if (is_huge && (pfn & (HPAGE_PMD_NR - 1)) == 0 &&
+ pfn + HPAGE_PMD_NR <= pfn_last) {
+ prep_transhuge_device_private_page(page);
+ page->zone_device_data = mdevice->free_huge_pages;
+ mdevice->free_huge_pages = page;
+ pfn += HPAGE_PMD_NR;
+ continue;
+ }
page->zone_device_data = mdevice->free_pages;
mdevice->free_pages = page;
+ pfn++;
}
if (ppage) {
- *ppage = mdevice->free_pages;
- mdevice->free_pages = (*ppage)->zone_device_data;
- mdevice->calloc++;
+ if (is_huge) {
+ if (!mdevice->free_huge_pages)
+ goto err_unlock;
+ *ppage = mdevice->free_huge_pages;
+ mdevice->free_huge_pages = (*ppage)->zone_device_data;
+ mdevice->calloc += compound_nr(*ppage);
+ } else if (mdevice->free_pages) {
+ *ppage = mdevice->free_pages;
+ mdevice->free_pages = (*ppage)->zone_device_data;
+ mdevice->calloc++;
+ } else
+ goto err_unlock;
}
spin_unlock(&mdevice->lock);
return true;
+err_unlock:
+ spin_unlock(&mdevice->lock);
+ return false;
+
err_free:
kfree(devmem);
err_release:
@@ -535,7 +561,8 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
return false;
}
-static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
+static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice,
+ bool is_huge)
{
struct page *dpage = NULL;
struct page *rpage;
@@ -550,17 +577,40 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
spin_lock(&mdevice->lock);
- if (mdevice->free_pages) {
+ if (is_huge && mdevice->free_huge_pages) {
+ dpage = mdevice->free_huge_pages;
+ mdevice->free_huge_pages = dpage->zone_device_data;
+ mdevice->calloc += compound_nr(dpage);
+ spin_unlock(&mdevice->lock);
+ } else if (!is_huge && mdevice->free_pages) {
dpage = mdevice->free_pages;
mdevice->free_pages = dpage->zone_device_data;
mdevice->calloc++;
spin_unlock(&mdevice->lock);
} else {
spin_unlock(&mdevice->lock);
- if (!dmirror_allocate_chunk(mdevice, &dpage))
+ if (!dmirror_allocate_chunk(mdevice, is_huge, &dpage))
goto error;
}
+ if (is_huge) {
+ unsigned int nr_pages = compound_nr(dpage);
+ unsigned int i;
+ struct page **tpage;
+
+ tpage = kmap(rpage);
+ for (i = 0; i < nr_pages; i++, tpage++) {
+ *tpage = alloc_page(GFP_HIGHUSER);
+ if (!*tpage) {
+ while (i--)
+ __free_page(*--tpage);
+ kunmap(rpage);
+ goto error;
+ }
+ }
+ kunmap(rpage);
+ }
+
dpage->zone_device_data = rpage;
get_page(dpage);
lock_page(dpage);
@@ -571,22 +621,26 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
return NULL;
}
-static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
- struct dmirror *dmirror)
+static int dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
+ struct dmirror *dmirror)
{
struct dmirror_device *mdevice = dmirror->mdevice;
const unsigned long *src = args->src;
unsigned long *dst = args->dst;
- unsigned long addr;
+ unsigned long end_pfn = args->end >> PAGE_SHIFT;
+ unsigned long pfn;
- for (addr = args->start; addr < args->end; addr += PAGE_SIZE,
- src++, dst++) {
+ for (pfn = args->start >> PAGE_SHIFT; pfn < end_pfn; ) {
struct page *spage;
struct page *dpage;
struct page *rpage;
+ bool is_huge;
+ unsigned long write;
+ struct page **tpage;
+ unsigned long endp;
if (!(*src & MIGRATE_PFN_MIGRATE))
- continue;
+ goto next;
/*
* Note that spage might be NULL which is OK since it is an
@@ -594,15 +648,39 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
*/
spage = migrate_pfn_to_page(*src);
- dpage = dmirror_devmem_alloc_page(mdevice);
- if (!dpage)
+ /* This flag is only set if a whole huge page is migrated. */
+ is_huge = *src & MIGRATE_PFN_COMPOUND;
+ write = (*src & MIGRATE_PFN_WRITE) ? MIGRATE_PFN_WRITE : 0;
+
+ if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) {
+ dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC;
+ dpage = NULL;
+ } else
+ dpage = dmirror_devmem_alloc_page(mdevice, is_huge);
+ if (!dpage) {
+ if (!is_huge)
+ return -ENOMEM;
+ /* Try falling back to PAGE_SIZE pages. */
+ endp = pfn + 512; // XXX
+ while (pfn < endp) {
+ dpage = dmirror_devmem_alloc_page(mdevice,
+ false);
+ if (!dpage)
+ return -ENOMEM;
+ rpage = dpage->zone_device_data;
+ rpage->zone_device_data = dmirror;
+ *dst = migrate_pfn(page_to_pfn(dpage)) |
+ MIGRATE_PFN_LOCKED | write;
+ if (spage)
+ copy_highpage(rpage, spage++);
+ else
+ clear_highpage(rpage);
+ pfn++;
+ src++;
+ dst++;
+ }
continue;
-
- rpage = dpage->zone_device_data;
- if (spage)
- copy_highpage(rpage, spage);
- else
- clear_highpage(rpage);
+ }
/*
* Normally, a device would use the page->zone_device_data to
@@ -610,14 +688,40 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
* the simulated device memory and that page holds the pointer
* to the mirror.
*/
+ rpage = dpage->zone_device_data;
rpage->zone_device_data = dmirror;
- *dst = migrate_pfn(page_to_pfn(dpage)) |
- MIGRATE_PFN_LOCKED;
- if ((*src & MIGRATE_PFN_WRITE) ||
- (!spage && args->vma->vm_flags & VM_WRITE))
- *dst |= MIGRATE_PFN_WRITE;
+ *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED |
+ write;
+
+ if (is_huge) {
+ endp = pfn + compound_nr(dpage);
+ *dst |= MIGRATE_PFN_COMPOUND;
+ tpage = kmap(rpage);
+ while (pfn < endp) {
+ if (spage)
+ copy_highpage(*tpage, spage++);
+ else
+ clear_highpage(*tpage);
+ tpage++;
+ pfn++;
+ src++;
+ dst++;
+ }
+ kunmap(rpage);
+ continue;
+ }
+
+ if (spage)
+ copy_highpage(rpage, spage);
+ else
+ clear_highpage(rpage);
+next:
+ pfn++;
+ src++;
+ dst++;
}
+ return 0;
}
static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
@@ -628,38 +732,75 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
const unsigned long *src = args->src;
const unsigned long *dst = args->dst;
unsigned long pfn;
+ int ret = 0;
/* Map the migrated pages into the device's page tables. */
mutex_lock(&dmirror->mutex);
- for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++,
- src++, dst++) {
+ for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); ) {
+ unsigned long mpfn;
struct page *dpage;
+ struct page *rpage;
void *entry;
if (!(*src & MIGRATE_PFN_MIGRATE))
- continue;
+ goto next;
- dpage = migrate_pfn_to_page(*dst);
+ mpfn = *dst;
+ dpage = migrate_pfn_to_page(mpfn);
if (!dpage)
- continue;
+ goto next;
/*
* Store the page that holds the data so the page table
* doesn't have to deal with ZONE_DEVICE private pages.
*/
- entry = dpage->zone_device_data;
- if (*dst & MIGRATE_PFN_WRITE)
+ rpage = dpage->zone_device_data;
+ if (mpfn & MIGRATE_PFN_COMPOUND) {
+ struct page **tpage;
+ unsigned long end_pfn = pfn + compound_nr(dpage);
+
+ ret = 0;
+ tpage = kmap(rpage);
+ while (pfn < end_pfn) {
+ entry = *tpage;
+ if (mpfn & MIGRATE_PFN_WRITE)
+ entry = xa_tag_pointer(entry,
+ DPT_XA_TAG_WRITE);
+ entry = xa_store(&dmirror->pt, pfn, entry,
+ GFP_KERNEL);
+ if (xa_is_err(entry)) {
+ ret = xa_err(entry);
+ break;
+ }
+ tpage++;
+ pfn++;
+ src++;
+ dst++;
+ }
+ kunmap(rpage);
+ if (ret)
+ goto err;
+ continue;
+ }
+
+ entry = rpage;
+ if (mpfn & MIGRATE_PFN_WRITE)
entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE);
entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
if (xa_is_err(entry)) {
- mutex_unlock(&dmirror->mutex);
- return xa_err(entry);
+ ret = xa_err(entry);
+ goto err;
}
+next:
+ pfn++;
+ src++;
+ dst++;
}
+err:
mutex_unlock(&dmirror->mutex);
- return 0;
+ return ret;
}
static int dmirror_migrate(struct dmirror *dmirror,
@@ -669,8 +810,8 @@ static int dmirror_migrate(struct dmirror *dmirror,
unsigned long size = cmd->npages << PAGE_SHIFT;
struct mm_struct *mm = dmirror->notifier.mm;
struct vm_area_struct *vma;
- unsigned long src_pfns[64];
- unsigned long dst_pfns[64];
+ unsigned long *src_pfns;
+ unsigned long *dst_pfns;
struct dmirror_bounce bounce;
struct migrate_vma args;
unsigned long next;
@@ -685,6 +826,17 @@ static int dmirror_migrate(struct dmirror *dmirror,
if (!mmget_not_zero(mm))
return -EINVAL;
+ src_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*src_pfns), GFP_KERNEL);
+ if (!src_pfns) {
+ ret = -ENOMEM;
+ goto out_put;
+ }
+ dst_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*dst_pfns), GFP_KERNEL);
+ if (!dst_pfns) {
+ ret = -ENOMEM;
+ goto out_free_src;
+ }
+
mmap_read_lock(mm);
for (addr = start; addr < end; addr = next) {
vma = find_vma(mm, addr);
@@ -693,7 +845,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
ret = -EINVAL;
goto out;
}
- next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT));
+ next = min(end, addr + (PTRS_PER_PTE << PAGE_SHIFT));
if (next > vma->vm_end)
next = vma->vm_end;
@@ -703,17 +855,24 @@ static int dmirror_migrate(struct dmirror *dmirror,
args.start = addr;
args.end = next;
args.pgmap_owner = dmirror->mdevice;
- args.flags = MIGRATE_VMA_SELECT_SYSTEM;
+ args.flags = MIGRATE_VMA_SELECT_SYSTEM |
+ MIGRATE_VMA_SELECT_COMPOUND;
ret = migrate_vma_setup(&args);
if (ret)
goto out;
- dmirror_migrate_alloc_and_copy(&args, dmirror);
- migrate_vma_pages(&args);
- dmirror_migrate_finalize_and_map(&args, dmirror);
+ ret = dmirror_migrate_alloc_and_copy(&args, dmirror);
+ if (!ret) {
+ migrate_vma_pages(&args);
+ dmirror_migrate_finalize_and_map(&args, dmirror);
+ }
migrate_vma_finalize(&args);
+ if (ret)
+ goto out;
}
mmap_read_unlock(mm);
+ kfree(dst_pfns);
+ kfree(src_pfns);
mmput(mm);
/* Return the migrated data for verification. */
@@ -734,6 +893,10 @@ static int dmirror_migrate(struct dmirror *dmirror,
out:
mmap_read_unlock(mm);
+ kfree(dst_pfns);
+out_free_src:
+ kfree(src_pfns);
+out_put:
mmput(mm);
return ret;
}
@@ -954,6 +1117,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
ret = dmirror_snapshot(dmirror, &cmd);
break;
+ case HMM_DMIRROR_FLAGS:
+ dmirror->flags = cmd.npages;
+ ret = 0;
+ break;
+
default:
return -EINVAL;
}
@@ -977,22 +1145,72 @@ static const struct file_operations dmirror_fops = {
static void dmirror_devmem_free(struct page *page)
{
struct page *rpage = page->zone_device_data;
+ unsigned int order = compound_order(page);
+ unsigned int nr_pages = 1U << order;
struct dmirror_device *mdevice;
- if (rpage)
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ if (rpage) {
+ if (order) {
+ unsigned int i;
+ struct page **tpage;
+ void *kaddr;
+
+ kaddr = kmap_atomic(rpage);
+ tpage = kaddr;
+ for (i = 0; i < nr_pages; i++, tpage++)
+ __free_page(*tpage);
+ kunmap_atomic(kaddr);
+ }
__free_page(rpage);
+ }
mdevice = dmirror_page_to_device(page);
spin_lock(&mdevice->lock);
- mdevice->cfree++;
- page->zone_device_data = mdevice->free_pages;
- mdevice->free_pages = page;
+ if (order) {
+ page->zone_device_data = mdevice->free_huge_pages;
+ mdevice->free_huge_pages = page;
+ } else {
+ page->zone_device_data = mdevice->free_pages;
+ mdevice->free_pages = page;
+ }
+ mdevice->cfree += nr_pages;
spin_unlock(&mdevice->lock);
}
+static void dmirror_devmem_split(struct page *head, struct page *page)
+{
+ struct page *rpage = head->zone_device_data;
+ unsigned long i;
+ struct page **tpage;
+ void *kaddr;
+
+ page->pgmap = head->pgmap;
+ i = page - head;
+ if (i == 1)
+ percpu_ref_get_many(page->pgmap->ref, HPAGE_PMD_NR - 1);
+
+ if (!rpage) {
+ page->zone_device_data = NULL;
+ return;
+ }
+
+ kaddr = kmap_atomic(rpage);
+ tpage = kaddr;
+ page->zone_device_data = tpage[i];
+ if (i == 1) {
+ head->zone_device_data = tpage[0];
+ kunmap_atomic(kaddr);
+ __free_page(rpage);
+ } else
+ kunmap_atomic(kaddr);
+}
+
static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
- struct dmirror *dmirror)
+ struct dmirror *dmirror,
+ unsigned long fault_addr)
{
const unsigned long *src = args->src;
unsigned long *dst = args->dst;
@@ -1000,25 +1218,71 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
unsigned long end = args->end;
unsigned long addr;
- for (addr = start; addr < end; addr += PAGE_SIZE,
- src++, dst++) {
- struct page *dpage, *spage;
+ for (addr = start; addr < end; ) {
+ struct page *spage, *dpage;
+ unsigned int order = 0;
+ unsigned int nr_pages = 1;
+ struct page **tpage;
+ unsigned int i;
spage = migrate_pfn_to_page(*src);
if (!spage || !(*src & MIGRATE_PFN_MIGRATE))
- continue;
+ goto next;
+ order = compound_order(spage);
+ nr_pages = 1U << order;
+ /* The source page is the ZONE_DEVICE private page. */
spage = spage->zone_device_data;
- dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr);
- if (!dpage)
- continue;
+ if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) {
+ dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC;
+ dpage = NULL;
+ } else if (order)
+ dpage = alloc_transhugepage(args->vma, addr);
+ else
+ dpage = alloc_pages_vma(GFP_HIGHUSER_MOVABLE, 0,
+ args->vma, addr,
+ numa_node_id(), false);
+ if (!dpage) {
+ if (!order)
+ return VM_FAULT_OOM;
+ /* Try falling back to PAGE_SIZE pages. */
+ dpage = alloc_pages_vma(GFP_HIGHUSER_MOVABLE, 0,
+ args->vma, addr,
+ numa_node_id(), false);
+ if (!dpage)
+ return VM_FAULT_OOM;
+ lock_page(dpage);
+ xa_erase(&dmirror->pt, fault_addr >> PAGE_SHIFT);
+ i = (fault_addr - start) >> PAGE_SHIFT;
+ dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ if (*src & MIGRATE_PFN_WRITE)
+ dst[i] |= MIGRATE_PFN_WRITE;
+ tpage = kmap(spage);
+ copy_highpage(dpage, tpage[i]);
+ kunmap(spage);
+ goto next;
+ }
lock_page(dpage);
xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
- copy_highpage(dpage, spage);
*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
if (*src & MIGRATE_PFN_WRITE)
*dst |= MIGRATE_PFN_WRITE;
+ if (order) {
+ *dst |= MIGRATE_PFN_COMPOUND;
+ tpage = kmap(spage);
+ for (i = 0; i < nr_pages; i++) {
+ copy_highpage(dpage, *tpage);
+ tpage++;
+ dpage++;
+ }
+ kunmap(spage);
+ } else
+ copy_highpage(dpage, spage);
+next:
+ addr += PAGE_SIZE << order;
+ src += nr_pages;
+ dst += nr_pages;
}
return 0;
}
@@ -1028,33 +1292,55 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
struct migrate_vma args;
unsigned long src_pfns;
unsigned long dst_pfns;
+ struct page *page;
struct page *rpage;
+ unsigned int order;
struct dmirror *dmirror;
vm_fault_t ret;
+ page = compound_head(vmf->page);
+ order = compound_order(page);
+
/*
* Normally, a device would use the page->zone_device_data to point to
* the mirror but here we use it to hold the page for the simulated
* device memory and that page holds the pointer to the mirror.
*/
- rpage = vmf->page->zone_device_data;
+ rpage = page->zone_device_data;
dmirror = rpage->zone_device_data;
- /* FIXME demonstrate how we can adjust migrate range */
+ if (order) {
+ args.start = vmf->address & (PAGE_MASK << order);
+ args.end = args.start + (PAGE_SIZE << order);
+ args.src = kcalloc(PTRS_PER_PTE, sizeof(*args.src),
+ GFP_KERNEL);
+ if (!args.src)
+ return VM_FAULT_OOM;
+ args.dst = kcalloc(PTRS_PER_PTE, sizeof(*args.dst),
+ GFP_KERNEL);
+ if (!args.dst) {
+ ret = VM_FAULT_OOM;
+ goto error_src;
+ }
+ } else {
+ args.start = vmf->address;
+ args.end = args.start + PAGE_SIZE;
+ args.src = &src_pfns;
+ args.dst = &dst_pfns;
+ }
args.vma = vmf->vma;
- args.start = vmf->address;
- args.end = args.start + PAGE_SIZE;
- args.src = &src_pfns;
- args.dst = &dst_pfns;
args.pgmap_owner = dmirror->mdevice;
- args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+ args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
+ MIGRATE_VMA_SELECT_COMPOUND;
- if (migrate_vma_setup(&args))
- return VM_FAULT_SIGBUS;
+ if (migrate_vma_setup(&args)) {
+ ret = VM_FAULT_SIGBUS;
+ goto error_dst;
+ }
- ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror);
+ ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror, vmf->address);
if (ret)
- return ret;
+ goto error_fin;
migrate_vma_pages(&args);
/*
* No device finalize step is needed since
@@ -1062,12 +1348,27 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
* invalidated the device page table.
*/
migrate_vma_finalize(&args);
+ if (order) {
+ kfree(args.dst);
+ kfree(args.src);
+ }
return 0;
+
+error_fin:
+ migrate_vma_finalize(&args);
+error_dst:
+ if (args.dst != &dst_pfns)
+ kfree(args.dst);
+error_src:
+ if (args.src != &src_pfns)
+ kfree(args.src);
+ return ret;
}
static const struct dev_pagemap_ops dmirror_devmem_ops = {
.page_free = dmirror_devmem_free,
.migrate_to_ram = dmirror_devmem_fault,
+ .page_split = dmirror_devmem_split,
};
static int dmirror_device_init(struct dmirror_device *mdevice, int id)
@@ -1086,7 +1387,7 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id)
return ret;
/* Build a list of free ZONE_DEVICE private struct pages */
- dmirror_allocate_chunk(mdevice, NULL);
+ dmirror_allocate_chunk(mdevice, false, NULL);
return 0;
}
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 670b4ef2a5b6..39e6ef3b67b9 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -33,6 +33,9 @@ struct hmm_dmirror_cmd {
#define HMM_DMIRROR_WRITE _IOWR('H', 0x01, struct hmm_dmirror_cmd)
#define HMM_DMIRROR_MIGRATE _IOWR('H', 0x02, struct hmm_dmirror_cmd)
#define HMM_DMIRROR_SNAPSHOT _IOWR('H', 0x03, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_FLAGS _IOWR('H', 0x04, struct hmm_dmirror_cmd)
+
+#define HMM_DMIRROR_FLAG_FAIL_ALLOC (1ULL << 0)
/*
* Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 0a28a6a29581..bff6c8df7403 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1477,4 +1477,408 @@ TEST_F(hmm2, double_map)
hmm_buffer_free(buffer);
}
+/*
+ * Migrate private anonymous huge empty page.
+ */
+TEST_F(hmm, migrate_anon_huge_empty)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, size);
+
+ buffer->ptr = mmap(NULL, 2 * size,
+ PROT_READ,
+ MAP_PRIVATE | MAP_ANONYMOUS,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ old_ptr = buffer->ptr;
+ buffer->ptr = map;
+
+ /* Migrate memory to device. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], 0);
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge zero page.
+ */
+TEST_F(hmm, migrate_anon_huge_zero)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+ int val;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, size);
+
+ buffer->ptr = mmap(NULL, 2 * size,
+ PROT_READ,
+ MAP_PRIVATE | MAP_ANONYMOUS,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ old_ptr = buffer->ptr;
+ buffer->ptr = map;
+
+ /* Initialize a read-only zero huge page. */
+ val = *(int *)buffer->ptr;
+ ASSERT_EQ(val, 0);
+
+ /* Migrate memory to device. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], 0);
+
+ /* Fault pages back to system memory and check them. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) {
+ ASSERT_EQ(ptr[i], 0);
+ /* If it asserts once, it probably will 500,000 times */
+ if (ptr[i] != 0)
+ break;
+ }
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page and free.
+ */
+TEST_F(hmm, migrate_anon_huge_free)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, size);
+
+ buffer->ptr = mmap(NULL, 2 * size,
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ old_ptr = buffer->ptr;
+ buffer->ptr = map;
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /* Migrate memory to device. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Try freeing it. */
+ ret = madvise(map, size, MADV_FREE);
+ ASSERT_EQ(ret, 0);
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page and fault back to sysmem.
+ */
+TEST_F(hmm, migrate_anon_huge_fault)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, size);
+
+ buffer->ptr = mmap(NULL, 2 * size,
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ old_ptr = buffer->ptr;
+ buffer->ptr = map;
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /* Migrate memory to device. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Fault pages back to system memory and check them. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page with allocation errors.
+ */
+TEST_F(hmm, migrate_anon_huge_err)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(2 * size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, 2 * size);
+
+ old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+ ASSERT_NE(old_ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)old_ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ buffer->ptr = map;
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /* Migrate memory to device but force a THP allocation error. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+ HMM_DMIRROR_FLAG_FAIL_ALLOC);
+ ASSERT_EQ(ret, 0);
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Try faulting back a single (PAGE_SIZE) page. */
+ ptr = buffer->ptr;
+ ASSERT_EQ(ptr[2048], 2048);
+
+ /* unmap and remap the region to reset things. */
+ ret = munmap(old_ptr, 2 * size);
+ ASSERT_EQ(ret, 0);
+ old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+ ASSERT_NE(old_ptr, MAP_FAILED);
+ map = (void *)ALIGN((uintptr_t)old_ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ buffer->ptr = map;
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /* Migrate THP to device. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /*
+ * Force an allocation error when faulting back a THP resident in the
+ * device.
+ */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+ HMM_DMIRROR_FLAG_FAIL_ALLOC);
+ ASSERT_EQ(ret, 0);
+ ptr = buffer->ptr;
+ ASSERT_EQ(ptr[2048], 2048);
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge zero page with allocation errors.
+ */
+TEST_F(hmm, migrate_anon_huge_zero_err)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ void *old_ptr;
+ void *map;
+ int *ptr;
+ int ret;
+
+ size = TWOMEG;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = -1;
+ buffer->size = 2 * size;
+ buffer->mirror = malloc(2 * size);
+ ASSERT_NE(buffer->mirror, NULL);
+ memset(buffer->mirror, 0xFF, 2 * size);
+
+ old_ptr = mmap(NULL, 2 * size, PROT_READ,
+ MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+ ASSERT_NE(old_ptr, MAP_FAILED);
+
+ npages = size >> self->page_shift;
+ map = (void *)ALIGN((uintptr_t)old_ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ buffer->ptr = map;
+
+ /* Migrate memory to device but force a THP allocation error. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+ HMM_DMIRROR_FLAG_FAIL_ALLOC);
+ ASSERT_EQ(ret, 0);
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], 0);
+
+ /* Try faulting back a single (PAGE_SIZE) page. */
+ ptr = buffer->ptr;
+ ASSERT_EQ(ptr[2048], 0);
+
+ /* unmap and remap the region to reset things. */
+ ret = munmap(old_ptr, 2 * size);
+ ASSERT_EQ(ret, 0);
+ old_ptr = mmap(NULL, 2 * size, PROT_READ,
+ MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+ ASSERT_NE(old_ptr, MAP_FAILED);
+ map = (void *)ALIGN((uintptr_t)old_ptr, size);
+ ret = madvise(map, size, MADV_HUGEPAGE);
+ ASSERT_EQ(ret, 0);
+ buffer->ptr = map;
+
+ /* Initialize buffer in system memory (zero THP page). */
+ ret = *(int *)buffer->ptr;
+ ASSERT_EQ(ret, 0);
+
+ /* Migrate memory to device but force a THP allocation error. */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+ HMM_DMIRROR_FLAG_FAIL_ALLOC);
+ ASSERT_EQ(ret, 0);
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Fault the device memory back and check it. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], 0);
+
+ buffer->ptr = old_ptr;
+ hmm_buffer_free(buffer);
+}
+
TEST_HARNESS_MAIN
--
2.20.1
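Editorial note: the selftests above reserve 2 * size bytes and then round the returned address up to a size boundary, so that a naturally aligned 2MB window is guaranteed to fit inside the mapping. A minimal user-space sketch of that idiom follows, assuming Linux and a 2MB huge page size. The helper names align_up() and map_aligned() are made up for this illustration and are not part of the patch; the madvise(MADV_HUGEPAGE) step is left to the caller because it can fail on kernels built without THP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define TWOMEG (1UL << 21)

/* Round x up to the next multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Reserve 2 * size bytes of anonymous memory and return a size-aligned
 * window inside the reservation. *base receives the address that must
 * later be passed to munmap(*base, 2 * size). Returns NULL on failure.
 */
static void *map_aligned(size_t size, void **base)
{
	void *p = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	*base = p;
	return (void *)align_up((uintptr_t)p, size);
}
```

This mirrors why the tests keep old_ptr around: the aligned pointer is used for madvise() and the data accesses, while the original mmap() result is what gets unmapped.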
Ralph Campbell
2020-Sep-02 16:58 UTC
[Nouveau] [PATCH v2 7/7] nouveau: support THP migration to private memory
Add support for migrating transparent huge pages to and from device
private memory.
Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
---
drivers/gpu/drm/nouveau/nouveau_dmem.c | 289 ++++++++++++++++++-------
drivers/gpu/drm/nouveau/nouveau_svm.c | 11 +-
drivers/gpu/drm/nouveau/nouveau_svm.h | 3 +-
3 files changed, 215 insertions(+), 88 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index a13c6215bba8..78ad0ee77b3d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -82,6 +82,7 @@ struct nouveau_dmem {
struct list_head chunks;
struct mutex mutex;
struct page *free_pages;
+ struct page *free_huge_pages;
spinlock_t lock;
};
@@ -112,8 +113,13 @@ static void nouveau_dmem_page_free(struct page *page)
struct nouveau_dmem *dmem = chunk->drm->dmem;
spin_lock(&dmem->lock);
- page->zone_device_data = dmem->free_pages;
- dmem->free_pages = page;
+ if (PageHead(page)) {
+ page->zone_device_data = dmem->free_huge_pages;
+ dmem->free_huge_pages = page;
+ } else {
+ page->zone_device_data = dmem->free_pages;
+ dmem->free_pages = page;
+ }
WARN_ON(!chunk->callocated);
chunk->callocated--;
@@ -139,51 +145,100 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
struct vm_fault *vmf, struct migrate_vma *args,
- dma_addr_t *dma_addr)
+ struct page *spage, bool is_huge, dma_addr_t *dma_addr)
{
+ struct nouveau_svmm *svmm = spage->zone_device_data;
struct device *dev = drm->dev->dev;
- struct page *dpage, *spage;
- struct nouveau_svmm *svmm;
-
- spage = migrate_pfn_to_page(args->src[0]);
- if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
- return 0;
+ struct page *dpage;
+ unsigned int i;
- dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
+ if (is_huge)
+ dpage = alloc_transhugepage(vmf->vma, args->start);
+ else
+ dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
if (!dpage)
- return VM_FAULT_SIGBUS;
- lock_page(dpage);
+ return VM_FAULT_OOM;
+ WARN_ON_ONCE(compound_order(spage) != compound_order(dpage));
- *dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+ *dma_addr = dma_map_page(dev, dpage, 0, page_size(dpage),
+ DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, *dma_addr))
goto error_free_page;
- svmm = spage->zone_device_data;
+ lock_page(dpage);
+ i = (vmf->address - args->start) >> PAGE_SHIFT;
+ spage += i;
mutex_lock(&svmm->mutex);
nouveau_svmm_invalidate(svmm, args->start, args->end);
- if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
- NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
+ if (drm->dmem->migrate.copy_func(drm, compound_nr(dpage),
+ NOUVEAU_APER_HOST, *dma_addr, NOUVEAU_APER_VRAM,
+ nouveau_dmem_page_addr(spage)))
goto error_dma_unmap;
mutex_unlock(&svmm->mutex);
- args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ args->dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ if (is_huge)
+ args->dst[i] |= MIGRATE_PFN_COMPOUND;
return 0;
error_dma_unmap:
mutex_unlock(&svmm->mutex);
- dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+ unlock_page(dpage);
+ dma_unmap_page(dev, *dma_addr, page_size(dpage), DMA_BIDIRECTIONAL);
error_free_page:
__free_page(dpage);
return VM_FAULT_SIGBUS;
}
+static vm_fault_t nouveau_dmem_fault_chunk(struct nouveau_drm *drm,
+ struct vm_fault *vmf, struct migrate_vma *args)
+{
+ struct device *dev = drm->dev->dev;
+ struct nouveau_fence *fence;
+ struct page *spage;
+ unsigned long src = args->src[0];
+ bool is_huge = (src & (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND)) ==
+ (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND);
+ unsigned long dma_page_size;
+ dma_addr_t dma_addr;
+ vm_fault_t ret = 0;
+
+ spage = migrate_pfn_to_page(src);
+ if (!spage) {
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+ if (is_huge) {
+ dma_page_size = PMD_SIZE;
+ ret = nouveau_dmem_fault_copy_one(drm, vmf, args, spage, true,
+ &dma_addr);
+ if (!ret)
+ goto fence;
+ /*
+ * If we couldn't allocate a huge page, fallback to migrating
+ * a single page.
+ */
+ }
+ dma_page_size = PAGE_SIZE;
+ ret = nouveau_dmem_fault_copy_one(drm, vmf, args, spage, false,
+ &dma_addr);
+ if (ret)
+ goto out;
+fence:
+ nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
+ migrate_vma_pages(args);
+ nouveau_dmem_fence_done(&fence);
+ dma_unmap_page(dev, dma_addr, dma_page_size, DMA_BIDIRECTIONAL);
+out:
+ migrate_vma_finalize(args);
+ return ret;
+}
+
static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
{
struct nouveau_drm *drm = page_to_drm(vmf->page);
- struct nouveau_dmem *dmem = drm->dmem;
- struct nouveau_fence *fence;
unsigned long src = 0, dst = 0;
- dma_addr_t dma_addr = 0;
+ struct page *page;
vm_fault_t ret;
struct migrate_vma args = {
.vma = vmf->vma,
@@ -192,39 +247,64 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
.src = &src,
.dst = &dst,
.pgmap_owner = drm->dev,
- .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
+ .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
+ MIGRATE_VMA_SELECT_COMPOUND,
};
+ /*
+ * If the page was migrated to the GPU as a huge page, try to
+ * migrate it back the same way.
+ */
+ page = compound_head(vmf->page);
+ if (PageHead(page)) {
+ unsigned int order = compound_order(page);
+ unsigned int nr_pages = 1U << order;
+
+ args.start &= PAGE_MASK << order;
+ args.end = args.start + (PAGE_SIZE << order);
+ args.src = kmalloc_array(nr_pages, sizeof(*args.src),
+ GFP_KERNEL);
+ if (!args.src)
+ return VM_FAULT_OOM;
+ args.dst = kmalloc_array(nr_pages, sizeof(*args.dst),
+ GFP_KERNEL);
+ if (!args.dst) {
+ ret = VM_FAULT_OOM;
+ goto error_src;
+ }
+ }
+
/*
* FIXME what we really want is to find some heuristic to migrate more
* than just one page on CPU fault. When such fault happens it is very
* likely that more surrounding page will CPU fault too.
*/
- if (migrate_vma_setup(&args) < 0)
- return VM_FAULT_SIGBUS;
- if (!args.cpages)
- return 0;
-
- ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
- if (ret || dst == 0)
- goto done;
-
- nouveau_fence_new(dmem->migrate.chan, false, &fence);
- migrate_vma_pages(&args);
- nouveau_dmem_fence_done(&fence);
- dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
-done:
- migrate_vma_finalize(&args);
+ if (migrate_vma_setup(&args))
+ ret = VM_FAULT_SIGBUS;
+ else
+ ret = nouveau_dmem_fault_chunk(drm, vmf, &args);
+ if (args.dst != &dst)
+ kfree(args.dst);
+error_src:
+ if (args.src != &src)
+ kfree(args.src);
return ret;
}
+static void nouveau_page_split(struct page *head, struct page *page)
+{
+ page->pgmap = head->pgmap;
+ page->zone_device_data = head->zone_device_data;
+}
+
static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = {
.page_free = nouveau_dmem_page_free,
.migrate_to_ram = nouveau_dmem_migrate_to_ram,
+ .page_split = nouveau_page_split,
};
-static int
-nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
+static int nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, bool is_huge,
+ struct page **ppage)
{
struct nouveau_dmem_chunk *chunk;
struct resource *res;
@@ -278,16 +358,20 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
pfn_first = chunk->pagemap.range.start >> PAGE_SHIFT;
page = pfn_to_page(pfn_first);
spin_lock(&drm->dmem->lock);
- for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) {
- page->zone_device_data = drm->dmem->free_pages;
- drm->dmem->free_pages = page;
- }
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && is_huge)
+ prep_transhuge_device_private_page(page);
+ else
+ for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) {
+ page->zone_device_data = drm->dmem->free_pages;
+ drm->dmem->free_pages = page;
+ }
*ppage = page;
chunk->callocated++;
spin_unlock(&drm->dmem->lock);
- NV_INFO(drm, "DMEM: registered %ldMB of device memory\n",
- DMEM_CHUNK_SIZE >> 20);
+ NV_INFO(drm, "DMEM: registered %ldMB of %sdevice memory %lx %lx\n",
+ DMEM_CHUNK_SIZE >> 20, is_huge ? "huge " : "", pfn_first,
+ nouveau_dmem_page_addr(page));
return 0;
@@ -304,14 +388,20 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
}
static struct page *
-nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
+nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_huge)
{
struct nouveau_dmem_chunk *chunk;
struct page *page = NULL;
int ret;
spin_lock(&drm->dmem->lock);
- if (drm->dmem->free_pages) {
+ if (is_huge && drm->dmem->free_huge_pages) {
+ page = drm->dmem->free_huge_pages;
+ drm->dmem->free_huge_pages = page->zone_device_data;
+ chunk = nouveau_page_to_chunk(page);
+ chunk->callocated++;
+ spin_unlock(&drm->dmem->lock);
+ } else if (!is_huge && drm->dmem->free_pages) {
page = drm->dmem->free_pages;
drm->dmem->free_pages = page->zone_device_data;
chunk = nouveau_page_to_chunk(page);
@@ -319,7 +409,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
spin_unlock(&drm->dmem->lock);
} else {
spin_unlock(&drm->dmem->lock);
- ret = nouveau_dmem_chunk_alloc(drm, &page);
+ ret = nouveau_dmem_chunk_alloc(drm, is_huge, &page);
if (ret)
return NULL;
}
@@ -567,31 +657,22 @@ nouveau_dmem_init(struct nouveau_drm *drm)
static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
struct nouveau_svmm *svmm, unsigned long src,
- dma_addr_t *dma_addr, u64 *pfn)
+ struct page *spage, bool is_huge, dma_addr_t dma_addr, u64 *pfn)
{
- struct device *dev = drm->dev->dev;
- struct page *dpage, *spage;
+ struct page *dpage;
unsigned long paddr;
+ unsigned long dst;
- spage = migrate_pfn_to_page(src);
- if (!(src & MIGRATE_PFN_MIGRATE))
- goto out;
-
- dpage = nouveau_dmem_page_alloc_locked(drm);
+ dpage = nouveau_dmem_page_alloc_locked(drm, is_huge);
if (!dpage)
goto out;
paddr = nouveau_dmem_page_addr(dpage);
if (spage) {
- *dma_addr = dma_map_page(dev, spage, 0, page_size(spage),
- DMA_BIDIRECTIONAL);
- if (dma_mapping_error(dev, *dma_addr))
+ if (drm->dmem->migrate.copy_func(drm, compound_nr(dpage),
+ NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, dma_addr))
goto out_free_page;
- if (drm->dmem->migrate.copy_func(drm, 1,
- NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, *dma_addr))
- goto out_dma_unmap;
} else {
- *dma_addr = DMA_MAPPING_ERROR;
if (drm->dmem->migrate.clear_func(drm, page_size(dpage),
NOUVEAU_APER_VRAM, paddr))
goto out_free_page;
@@ -602,10 +683,11 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
if (src & MIGRATE_PFN_WRITE)
*pfn |= NVIF_VMM_PFNMAP_V0_W;
- return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ if (PageHead(dpage))
+ dst |= MIGRATE_PFN_COMPOUND;
+ return dst;
-out_dma_unmap:
- dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
out_free_page:
nouveau_dmem_page_free_locked(drm, dpage);
out:
@@ -617,26 +699,64 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
struct nouveau_svmm *svmm, struct migrate_vma *args,
dma_addr_t *dma_addrs, u64 *pfns)
{
+ struct device *dev = drm->dev->dev;
struct nouveau_fence *fence;
unsigned long addr = args->start, nr_dma = 0, i;
+ unsigned int page_shift = PAGE_SHIFT;
+ struct page *spage;
+ unsigned long src = args->src[0];
+ bool is_huge = (src & (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND)) ==
+ (MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND);
+ unsigned long dma_page_size = is_huge ? PMD_SIZE : PAGE_SIZE;
+
+ if (is_huge) {
+ spage = migrate_pfn_to_page(src);
+ if (spage) {
+ dma_addrs[nr_dma] = dma_map_page(dev, spage, 0,
+ page_size(spage),
+ DMA_BIDIRECTIONAL);
+ if (dma_mapping_error(dev, dma_addrs[nr_dma]))
+ goto out;
+ nr_dma++;
+ }
+ args->dst[0] = nouveau_dmem_migrate_copy_one(drm, svmm, src,
+ spage, true, *dma_addrs, pfns);
+ if (args->dst[0] & MIGRATE_PFN_COMPOUND) {
+ page_shift = PMD_SHIFT;
+ i = 1;
+ goto fence;
+ }
+ }
- for (i = 0; addr < args->end; i++) {
- args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm,
- args->src[i], dma_addrs + nr_dma, pfns + i);
- if (!dma_mapping_error(drm->dev->dev, dma_addrs[nr_dma]))
+ for (i = 0; addr < args->end; i++, addr += PAGE_SIZE) {
+ src = args->src[i];
+ if (!(src & MIGRATE_PFN_MIGRATE))
+ continue;
+ spage = migrate_pfn_to_page(src);
+ if (spage && !is_huge) {
+ dma_addrs[i] = dma_map_page(dev, spage, 0,
+ page_size(spage),
+ DMA_BIDIRECTIONAL);
+ if (dma_mapping_error(dev, dma_addrs[i]))
+ break;
nr_dma++;
- addr += PAGE_SIZE;
+ } else if (spage && is_huge && i != 0)
+ dma_addrs[i] = dma_addrs[i - 1] + PAGE_SIZE;
+ args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm, src,
+ spage, false, dma_addrs[i], pfns + i);
}
+fence:
nouveau_fence_new(drm->dmem->migrate.chan, false, &fence);
migrate_vma_pages(args);
nouveau_dmem_fence_done(&fence);
- nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i);
+ nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i,
+ page_shift);
- while (nr_dma--) {
- dma_unmap_page(drm->dev->dev, dma_addrs[nr_dma], PAGE_SIZE,
- DMA_BIDIRECTIONAL);
- }
+ while (nr_dma)
+ dma_unmap_page(drm->dev->dev, dma_addrs[--nr_dma],
+ dma_page_size, DMA_BIDIRECTIONAL);
+out:
migrate_vma_finalize(args);
}
@@ -648,25 +768,25 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm, unsigned long end)
{
unsigned long npages = (end - start) >> PAGE_SHIFT;
- unsigned long max = min(SG_MAX_SINGLE_ALLOC, npages);
+ unsigned long max = min(1UL << (PMD_SHIFT - PAGE_SHIFT), npages);
dma_addr_t *dma_addrs;
struct migrate_vma args = {
.vma = vma,
.start = start,
.pgmap_owner = drm->dev,
- .flags = MIGRATE_VMA_SELECT_SYSTEM,
+ .flags = MIGRATE_VMA_SELECT_SYSTEM |
+ MIGRATE_VMA_SELECT_COMPOUND,
};
- unsigned long i;
u64 *pfns;
int ret = -ENOMEM;
if (drm->dmem == NULL)
return -ENODEV;
- args.src = kcalloc(max, sizeof(*args.src), GFP_KERNEL);
+ args.src = kmalloc_array(max, sizeof(*args.src), GFP_KERNEL);
if (!args.src)
goto out;
- args.dst = kcalloc(max, sizeof(*args.dst), GFP_KERNEL);
+ args.dst = kmalloc_array(max, sizeof(*args.dst), GFP_KERNEL);
if (!args.dst)
goto out_free_src;
@@ -678,8 +798,10 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
if (!pfns)
goto out_free_dma;
- for (i = 0; i < npages; i += max) {
- args.end = start + (max << PAGE_SHIFT);
+ for (; args.start < end; args.start = args.end) {
+ args.end = min(end, ALIGN(args.start, PMD_SIZE));
+ if (args.start == args.end)
+ args.end = min(end, args.start + PMD_SIZE);
ret = migrate_vma_setup(&args);
if (ret)
goto out_free_pfns;
@@ -687,7 +809,6 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
if (args.cpages)
nouveau_dmem_migrate_chunk(drm, svmm, &args, dma_addrs,
pfns);
- args.start = args.end;
}
ret = 0;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 4f69e4c3dafd..3db0997f21b5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -681,7 +681,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]);
continue;
}
- SVMM_DBG(svmm, "addr %016llx", buffer->fault[fi]->addr);
/* We try and group handling of faults within a small
* window into a single update.
@@ -733,6 +732,10 @@ nouveau_svm_fault(struct nvif_notify *notify)
}
mmput(mm);
+ SVMM_DBG(svmm, "addr %llx %s %c", buffer->fault[fi]->addr,
+ args.phys[0] & NVIF_VMM_PFNMAP_V0_VRAM ?
+ "vram" : "sysmem",
+ args.i.p.size > PAGE_SIZE ? 'H' : 'N');
limit = args.i.p.addr + args.i.p.size;
for (fn = fi; ++fn < buffer->fault_nr; ) {
/* It's okay to skip over duplicate addresses from the
@@ -804,13 +807,15 @@ nouveau_pfns_free(u64 *pfns)
void
nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
- unsigned long addr, u64 *pfns, unsigned long npages)
+ unsigned long addr, u64 *pfns, unsigned long npages,
+ unsigned int page_shift)
{
struct nouveau_pfnmap_args *args = nouveau_pfns_to_args(pfns);
int ret;
args->p.addr = addr;
- args->p.size = npages << PAGE_SHIFT;
+ args->p.page = page_shift;
+ args->p.size = npages << args->p.page;
mutex_lock(&svmm->mutex);
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.h b/drivers/gpu/drm/nouveau/nouveau_svm.h
index e7d63d7f0c2d..3fd78662f17e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.h
@@ -33,7 +33,8 @@ void nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit);
u64 *nouveau_pfns_alloc(unsigned long npages);
void nouveau_pfns_free(u64 *pfns);
void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
- unsigned long addr, u64 *pfns, unsigned long npages);
+ unsigned long addr, u64 *pfns, unsigned long npages,
+ unsigned int page_shift);
#else /* IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) */
static inline void nouveau_svm_init(struct nouveau_drm *drm) {}
static inline void nouveau_svm_fini(struct nouveau_drm *drm) {}
--
2.20.1
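Editorial note: two small pieces of logic in patch 7/7 are easy to misread in diff form, namely the is_huge test on the first source entry and the loop in nouveau_dmem_migrate_vma() that walks the range in chunks that never cross a PMD boundary. The following user-space sketch models both under stated assumptions: PMD_SHIFT is taken as 21 (x86-64 with 4KB pages), the two flag bit positions are placeholders rather than the kernel's values, and the names src_is_huge() and split_range() are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PMD_SHIFT 21UL			/* assumption: 2MB PMD mappings */
#define PMD_SIZE  (1UL << PMD_SHIFT)
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Placeholder bit positions; the real ones live in include/linux/migrate.h. */
#define PFN_MIGRATE  (1UL << 1)
#define PFN_COMPOUND (1UL << 4)

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* A source entry counts as huge only if BOTH flags are set. */
static int src_is_huge(unsigned long src)
{
	const unsigned long mask = PFN_MIGRATE | PFN_COMPOUND;

	return (src & mask) == mask;
}

/*
 * Split [start, end) into chunks that end on a PMD boundary or at end,
 * whichever comes first. The end address of each chunk is stored in
 * ends[]; the number of chunks produced (at most max) is returned.
 */
static size_t split_range(unsigned long start, unsigned long end,
			  unsigned long *ends, size_t max)
{
	unsigned long chunk_end = start;
	size_t n = 0;

	for (; start < end && n < max; start = chunk_end) {
		/* An unaligned start is first brought up to the boundary. */
		chunk_end = min_ul(end, ALIGN_UP(start, PMD_SIZE));
		/* An already aligned start covers one full PMD. */
		if (chunk_end == start)
			chunk_end = min_ul(end, start + PMD_SIZE);
		ends[n++] = chunk_end;
	}
	return n;
}
```

For example, the range 0x1ff000..0x601000 is split into a leading 4KB chunk, two full 2MB chunks, and a trailing 4KB chunk, so only the two middle chunks can be migrated as huge pages.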
Yang Shi
2020-Sep-02 20:29 UTC
[Nouveau] [PATCH v2 1/7] mm/thp: fix __split_huge_pmd_locked() for migration PMD
On Wed, Sep 2, 2020 at 9:58 AM Ralph Campbell <rcampbell at nvidia.com> wrote:
>
> A migrating transparent huge page has to already be unmapped. Otherwise,
> the page could be modified while it is being copied to a new page and
> data could be lost. The function __split_huge_pmd() checks for a PMD
> migration entry before calling __split_huge_pmd_locked() leading one to
> think that __split_huge_pmd_locked() can handle splitting a migrating PMD.
> However, the code always increments the page->_mapcount and adjusts the
> memory control group accounting assuming the page is mapped.
> Also, if the PMD entry is a migration PMD entry, the call to
> is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
> of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).
> Fix these problems by checking for a PMD migration entry.

Thanks for catching this. The fix looks good to me.

Reviewed-by: Yang Shi <shy828301 at gmail.com>

I think this fix can go separately from the series.

>
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>
> ---
> mm/huge_memory.c | 42 +++++++++++++++++++++++-------------------
> 1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2a468a4acb0a..606d712d9505 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2023,7 +2023,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> put_page(page);
> add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
> return;
> - } else if (is_huge_zero_pmd(*pmd)) {
> + } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
> /*
> * FIXME: Do we want to invalidate secondary mmu by calling
> * mmu_notifier_invalidate_range() see comments below inside
> @@ -2117,30 +2117,34 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> pte = pte_offset_map(&_pmd, addr);
> BUG_ON(!pte_none(*pte));
> set_pte_at(mm, addr, pte, entry);
> - atomic_inc(&page[i]._mapcount);
> - pte_unmap(pte);
> - }
> -
> - /*
> - * Set PG_double_map before dropping compound_mapcount to avoid
> - * false-negative page_mapped().
> - */
> - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> - for (i = 0; i < HPAGE_PMD_NR; i++)
> + if (!pmd_migration)
> atomic_inc(&page[i]._mapcount);
> + pte_unmap(pte);
> }
>
> - lock_page_memcg(page);
> - if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
> - /* Last compound_mapcount is gone. */
> - __dec_lruvec_page_state(page, NR_ANON_THPS);
> - if (TestClearPageDoubleMap(page)) {
> - /* No need in mapcount reference anymore */
> + if (!pmd_migration) {
> + /*
> + * Set PG_double_map before dropping compound_mapcount to avoid
> + * false-negative page_mapped().
> + */
> + if (compound_mapcount(page) > 1 &&
> + !TestSetPageDoubleMap(page)) {
> for (i = 0; i < HPAGE_PMD_NR; i++)
> - atomic_dec(&page[i]._mapcount);
> + atomic_inc(&page[i]._mapcount);
> + }
> +
> + lock_page_memcg(page);
> + if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
> + /* Last compound_mapcount is gone. */
> + __dec_lruvec_page_state(page, NR_ANON_THPS);
> + if (TestClearPageDoubleMap(page)) {
> + /* No need in mapcount reference anymore */
> + for (i = 0; i < HPAGE_PMD_NR; i++)
> + atomic_dec(&page[i]._mapcount);
> + }
> }
> + unlock_page_memcg(page);
> }
> - unlock_page_memcg(page);
>
> smp_wmb(); /* make pte visible before pmd */
> pmd_populate(mm, pmd, pgtable);
> --
> 2.20.1
>
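Editorial note: the accounting rule that patch 1/7 enforces can be shown with a toy model. The sketch below is not kernel code. struct toy_thp and toy_split_pmd() are invented for this illustration, the counters start at 0 rather than the kernel's -1 bias, and the PG_double_map and memcg handling are deliberately left out. It only demonstrates the point made in the commit message: when the PMD is a migration entry the page is already unmapped, so splitting must not touch any map counts.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SUBPAGES 8	/* stand-in for HPAGE_PMD_NR (512 on x86-64) */

/* Toy stand-in for the counters the commit message talks about. */
struct toy_thp {
	int mapcount[NR_SUBPAGES];	/* per-subpage PTE mappings */
	int compound_mapcount;		/* PMD-level mappings */
};

/*
 * Model the accounting done when one huge PMD is replaced by PTEs.
 * With pmd_migration set, the PTEs written are migration entries and
 * the page stays unmapped, so no counter may change.
 */
static void toy_split_pmd(struct toy_thp *thp, bool pmd_migration)
{
	int i;

	if (pmd_migration)
		return;

	for (i = 0; i < NR_SUBPAGES; i++)
		thp->mapcount[i]++;	/* each new PTE maps one subpage */
	thp->compound_mapcount--;	/* the PMD mapping is gone */
}
```

Before the fix, the equivalent of the loop ran unconditionally, which in this model would leave an unmapped page with nonzero per-subpage counts.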
Zi Yan
2020-Sep-02 21:47 UTC
[Nouveau] [PATCH v2 1/7] mm/thp: fix __split_huge_pmd_locked() for migration PMD
On 2 Sep 2020, at 12:58, Ralph Campbell wrote:

> A migrating transparent huge page has to already be unmapped. Otherwise,
> the page could be modified while it is being copied to a new page and
> data could be lost. The function __split_huge_pmd() checks for a PMD
> migration entry before calling __split_huge_pmd_locked() leading one to
> think that __split_huge_pmd_locked() can handle splitting a migrating PMD.
> However, the code always increments the page->_mapcount and adjusts the
> memory control group accounting assuming the page is mapped.
> Also, if the PMD entry is a migration PMD entry, the call to
> is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
> of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).
> Fix these problems by checking for a PMD migration entry.
>
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>

Thanks for the fix. You can add Reviewed-by: Zi Yan <ziy at nvidia.com>

I think you also want to add the Fixes tag and cc stable.

Fixes 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
cc: stable at vger.kernel.org # 4.14+

> ---
> mm/huge_memory.c | 42 +++++++++++++++++++++++-------------------
> 1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2a468a4acb0a..606d712d9505 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2023,7 +2023,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> put_page(page);
> add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
> return;
> - } else if (is_huge_zero_pmd(*pmd)) {
> + } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
> /*
> * FIXME: Do we want to invalidate secondary mmu by calling
> * mmu_notifier_invalidate_range() see comments below inside
> @@ -2117,30 +2117,34 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> pte = pte_offset_map(&_pmd, addr);
> BUG_ON(!pte_none(*pte));
> set_pte_at(mm, addr, pte, entry);
> - atomic_inc(&page[i]._mapcount);
> - pte_unmap(pte);
> - }
> -
> - /*
> - * Set PG_double_map before dropping compound_mapcount to avoid
> - * false-negative page_mapped().
> - */
> - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> - for (i = 0; i < HPAGE_PMD_NR; i++)
> + if (!pmd_migration)
> atomic_inc(&page[i]._mapcount);
> + pte_unmap(pte);
> }
>
> - lock_page_memcg(page);
> - if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
> - /* Last compound_mapcount is gone. */
> - __dec_lruvec_page_state(page, NR_ANON_THPS);
> - if (TestClearPageDoubleMap(page)) {
> - /* No need in mapcount reference anymore */
> + if (!pmd_migration) {
> + /*
> + * Set PG_double_map before dropping compound_mapcount to avoid
> + * false-negative page_mapped().
> + */
> + if (compound_mapcount(page) > 1 &&
> + !TestSetPageDoubleMap(page)) {
> for (i = 0; i < HPAGE_PMD_NR; i++)
> - atomic_dec(&page[i]._mapcount);
> + atomic_inc(&page[i]._mapcount);
> + }
> +
> + lock_page_memcg(page);
> + if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
> + /* Last compound_mapcount is gone. */
> + __dec_lruvec_page_state(page, NR_ANON_THPS);
> + if (TestClearPageDoubleMap(page)) {
> + /* No need in mapcount reference anymore */
> + for (i = 0; i < HPAGE_PMD_NR; i++)
> + atomic_dec(&page[i]._mapcount);
> + }
> }
> + unlock_page_memcg(page);
> }
> - unlock_page_memcg(page);
>
> smp_wmb(); /* make pte visible before pmd */
> pmd_populate(mm, pmd, pgtable);
> --
> 2.20.1

--
Best Regards,
Yan Zi
Kirill A. Shutemov
2020-Sep-03 12:54 UTC
[Nouveau] [PATCH v2 1/7] mm/thp: fix __split_huge_pmd_locked() for migration PMD
On Wed, Sep 02, 2020 at 09:58:24AM -0700, Ralph Campbell wrote:
> A migrating transparent huge page has to already be unmapped. Otherwise,
> the page could be modified while it is being copied to a new page and
> data could be lost. The function __split_huge_pmd() checks for a PMD
> migration entry before calling __split_huge_pmd_locked() leading one to
> think that __split_huge_pmd_locked() can handle splitting a migrating PMD.
> However, the code always increments the page->_mapcount and adjusts the
> memory control group accounting assuming the page is mapped.
> Also, if the PMD entry is a migration PMD entry, the call to
> is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
> of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).
> Fix these problems by checking for a PMD migration entry.
>
> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com>

Hm. Could you remind me what codepath splits migration PMD? Maybe it
should wait until migration is complete? We could avoid a lot of
complexity this way.

--
Kirill A. Shutemov