David Hildenbrand
2025-Jan-29 11:57 UTC
[PATCH v1 0/4] mm: cleanups for device-exclusive entries (hmm)
This is a follow-up to [1], performing some related cleanups. There are
more cleanups to be had, but I'll have to focus on some other stuff next.
I might come back to that once I'm stuck on (annoyed by :) ) other things.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse at redhat.com>
Cc: Jonathan Corbet <corbet at lwn.net>
Cc: Alex Shi <alexs at kernel.org>
Cc: Yanteng Si <si.yanteng at linux.dev>
Cc: Karol Herbst <kherbst at redhat.com>
Cc: Lyude Paul <lyude at redhat.com>
Cc: Danilo Krummrich <dakr at kernel.org>
Cc: David Airlie <airlied at gmail.com>
Cc: Simona Vetter <simona at ffwll.ch>
Cc: "Liam R. Howlett" <Liam.Howlett at oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
Cc: Vlastimil Babka <vbabka at suse.cz>
Cc: Jann Horn <jannh at google.com>
Cc: Pasha Tatashin <pasha.tatashin at soleen.com>
Cc: Peter Xu <peterx at redhat.com>
Cc: Alistair Popple <apopple at nvidia.com>
Cc: Jason Gunthorpe <jgg at nvidia.com>

[1] https://lkml.kernel.org/r/20250129115411.2077152-1-david at redhat.com

David Hildenbrand (4):
  lib/test_hmm: make dmirror_atomic_map() consume a single page
  mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
  mm/memory: pass folio and pte to restore_exclusive_pte()
  mm/memory: document restore_exclusive_pte()

 drivers/gpu/drm/nouveau/nouveau_svm.c |  6 +--
 include/linux/mmu_notifier.h          |  4 +-
 include/linux/rmap.h                  |  2 +-
 lib/test_hmm.c                        | 35 ++++++-----------
 mm/memory.c                           | 54 +++++++++++++++++++--------
 mm/rmap.c                             |  3 +-
 6 files changed, 54 insertions(+), 50 deletions(-)

--
2.48.1
David Hildenbrand
2025-Jan-29 11:57 UTC
[PATCH v1 1/4] lib/test_hmm: make dmirror_atomic_map() consume a single page
The caller now always passes a single page; let's simplify, and return
"0" on success.
Signed-off-by: David Hildenbrand <david at redhat.com>
---
lib/test_hmm.c | 33 ++++++++++-----------------------
1 file changed, 10 insertions(+), 23 deletions(-)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 9e1b07a227a3..1c0a58279db9 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -706,34 +706,23 @@ static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
return 0;
}
-static int dmirror_atomic_map(unsigned long start, unsigned long end,
- struct page **pages, struct dmirror *dmirror)
+static int dmirror_atomic_map(unsigned long addr, struct page *page,
+ struct dmirror *dmirror)
{
- unsigned long pfn, mapped = 0;
- int i;
+ void *entry;
/* Map the migrated pages into the device's page tables. */
mutex_lock(&dmirror->mutex);
- for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, i++) {
- void *entry;
-
- if (!pages[i])
- continue;
-
- entry = pages[i];
- entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
- entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
- if (xa_is_err(entry)) {
- mutex_unlock(&dmirror->mutex);
- return xa_err(entry);
- }
-
- mapped++;
+ entry = xa_tag_pointer(page, DPT_XA_TAG_ATOMIC);
+ entry = xa_store(&dmirror->pt, addr >> PAGE_SHIFT, entry, GFP_ATOMIC);
+ if (xa_is_err(entry)) {
+ mutex_unlock(&dmirror->mutex);
+ return xa_err(entry);
}
mutex_unlock(&dmirror->mutex);
- return mapped;
+ return 0;
}
static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
@@ -803,9 +792,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
break;
}
- ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
- if (!ret)
- ret = -EBUSY;
+ ret = dmirror_atomic_map(addr, page, dmirror);
folio_unlock(folio);
folio_put(folio);
--
2.48.1
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
We no longer get a MMU_NOTIFY_EXCLUSIVE on conversion with the owner set
that one has to filter out: if there already *is* a device-exclusive
entry (e.g., from another device; we don't have that information), GUP
will convert it back to an ordinary PTE and notify via
remove_device_exclusive_entry().
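
With the owner parameter gone, a caller now boils down to roughly the
following (sketch assembled from the call sites touched below; error
handling trimmed):

    struct folio *folio;
    struct page *page;

    mmap_read_lock(mm);
    page = make_device_exclusive(mm, addr, &folio);
    mmap_read_unlock(mm);
    if (IS_ERR(page))
        return PTR_ERR(page);

    /* ... program the device mapping for the page ... */

    folio_unlock(folio);
    folio_put(folio);
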
Signed-off-by: David Hildenbrand <david at redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 6 +-----
include/linux/mmu_notifier.h | 4 +---
include/linux/rmap.h | 2 +-
lib/test_hmm.c | 2 +-
mm/rmap.c | 3 +--
5 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 39e3740980bb..4758fee182b4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -510,10 +510,6 @@ static bool nouveau_svm_range_invalidate(struct mmu_interval_notifier *mni,
struct svm_notifier *sn = container_of(mni, struct svm_notifier, notifier);
- if (range->event == MMU_NOTIFY_EXCLUSIVE &&
- range->owner == sn->svmm->vmm->cli->drm->dev)
- return true;
-
/*
* serializes the update to mni->invalidate_seq done by caller and
* prevents invalidation of the PTE from progressing while HW is being
@@ -609,7 +605,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
notifier_seq = mmu_interval_read_begin(&notifier->notifier);
mmap_read_lock(mm);
- page = make_device_exclusive(mm, start, drm->dev, &folio);
+ page = make_device_exclusive(mm, start, &folio);
mmap_read_unlock(mm);
if (IS_ERR(page)) {
ret = -EINVAL;
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d4e714661826..bac2385099dd 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -44,9 +44,7 @@ struct mmu_interval_notifier;
* owner field matches the driver's device private pgmap owner.
*
* @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
- * longer have exclusive access to the page. When sent during creation of an
- * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive(), otherwise the owner will be NULL.
+ * longer have exclusive access to the page.
*/
enum mmu_notifier_event {
MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 86425d42c1a9..3b216b91d2e5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -664,7 +664,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags);
void try_to_unmap(struct folio *, enum ttu_flags flags);
struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
- void *owner, struct folio **foliop);
+ struct folio **foliop);
/* Avoid racy checks */
#define PVMW_SYNC (1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 1c0a58279db9..8520c1d1b21b 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -786,7 +786,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
struct folio *folio;
struct page *page;
- page = make_device_exclusive(mm, addr, NULL, &folio);
+ page = make_device_exclusive(mm, addr, &folio);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 4acc9f6d743a..d99dbf59adc6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2397,7 +2397,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
* make_device_exclusive() - Mark an address for exclusive use by a device
* @mm: mm_struct of associated target process
* @addr: the virtual address to mark for exclusive device access
- * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
* @foliop: folio pointer will be stored here on success.
*
* This function looks up the page mapped at the given address, grabs a
@@ -2421,7 +2420,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
* Returns: pointer to mapped page on success, otherwise a negative error.
*/
struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
- void *owner, struct folio **foliop)
+ struct folio **foliop)
{
struct folio *folio, *fw_folio;
struct vm_area_struct *vma;
--
2.48.1
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 3/4] mm/memory: pass folio and pte to restore_exclusive_pte()
Let's pass the folio and the pte to restore_exclusive_pte(), so we
can avoid repeated page_folio() and ptep_get() calls. To do that,
pass the pte to try_restore_exclusive_pte() and work with the folio
in there already.

While at it, drop the "swp_entry_t entry" variable in
try_restore_exclusive_pte() and add a check to restore_exclusive_pte()
that the folio is locked.
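
For readability, the resulting helper looks like this in one piece
(reconstructed from the hunks below; the unchanged failure path is
assumed to still return -EBUSY):

    static int try_restore_exclusive_pte(struct vm_area_struct *vma,
            unsigned long addr, pte_t *ptep, pte_t orig_pte)
    {
        struct page *page = pfn_swap_entry_to_page(pte_to_swp_entry(orig_pte));
        struct folio *folio = page_folio(page);

        /* Only restore if we can grab the folio lock without sleeping. */
        if (folio_trylock(folio)) {
            restore_exclusive_pte(vma, folio, page, addr, ptep, orig_pte);
            folio_unlock(folio);
            return 0;
        }

        return -EBUSY;
    }
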
Signed-off-by: David Hildenbrand <david at redhat.com>
---
mm/memory.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index cd689cd8a7c8..46956994aaff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -719,14 +719,13 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
#endif
static void restore_exclusive_pte(struct vm_area_struct *vma,
- struct page *page, unsigned long address,
- pte_t *ptep)
+ struct folio *folio, struct page *page, unsigned long address,
+ pte_t *ptep, pte_t orig_pte)
{
- struct folio *folio = page_folio(page);
- pte_t orig_pte;
pte_t pte;
- orig_pte = ptep_get(ptep);
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
if (pte_swp_soft_dirty(orig_pte))
pte = pte_mksoft_dirty(pte);
@@ -756,16 +755,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
* Tries to restore an exclusive pte if the page lock can be acquired without
* sleeping.
*/
-static int
-try_restore_exclusive_pte(pte_t *src_pte, struct vm_area_struct *vma,
- unsigned long addr)
+static int try_restore_exclusive_pte(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, pte_t orig_pte)
{
- swp_entry_t entry = pte_to_swp_entry(ptep_get(src_pte));
- struct page *page = pfn_swap_entry_to_page(entry);
+ struct page *page = pfn_swap_entry_to_page(pte_to_swp_entry(orig_pte));
+ struct folio *folio = page_folio(page);
- if (trylock_page(page)) {
- restore_exclusive_pte(vma, page, addr, src_pte);
- unlock_page(page);
+ if (folio_trylock(folio)) {
+ restore_exclusive_pte(vma, folio, page, addr, ptep, orig_pte);
+ folio_unlock(folio);
return 0;
}
@@ -871,7 +869,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* (ie. COW) mappings.
*/
VM_BUG_ON(!is_cow_mapping(src_vma->vm_flags));
- if (try_restore_exclusive_pte(src_pte, src_vma, addr))
+ if (try_restore_exclusive_pte(src_vma, addr, src_pte, orig_pte))
return -EBUSY;
return -ENOENT;
} else if (is_pte_marker_entry(entry)) {
@@ -3979,7 +3977,8 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
&vmf->ptl);
if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
- restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte);
+ restore_exclusive_pte(vma, folio, vmf->page, vmf->address,
+ vmf->pte, vmf->orig_pte);
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.48.1
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
Let's document how this function is to be used, and why the requirement
for the folio lock might be dropped in the future.

Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/memory.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 46956994aaff..caaae8df11a9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -718,6 +718,31 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 }
 #endif

+/**
+ * restore_exclusive_pte - Restore a device-exclusive entry
+ * @vma: VMA covering @address
+ * @folio: the mapped folio
+ * @page: the mapped folio page
+ * @address: the virtual address
+ * @ptep: PTE pointer into the locked page table mapping the folio page
+ * @orig_pte: PTE value at @ptep
+ *
+ * Restore a device-exclusive non-swap entry to an ordinary present PTE.
+ *
+ * The folio and the page table must be locked, and MMU notifiers must have
+ * been called to invalidate any (exclusive) device mappings. In case of
+ * fork(), MMU_NOTIFY_PROTECTION_PAGE is triggered, and in case of a page
+ * fault MMU_NOTIFY_EXCLUSIVE is triggered.
+ *
+ * Locking the folio makes sure that anybody who just converted the PTE to
+ * a device-exclusive entry can map it into the device before unlocking it;
+ * so the folio lock prevents concurrent conversion to device-exclusive.
+ *
+ * TODO: the folio lock does not protect against all cases of concurrent
+ * page table modifications (e.g., MADV_DONTNEED, mprotect), so device
+ * drivers must already use MMU notifiers to sync against any concurrent
+ * changes. Maybe the requirement for the folio lock can be dropped in the
+ * future.
+ */
 static void restore_exclusive_pte(struct vm_area_struct *vma,
 		struct folio *folio, struct page *page, unsigned long address,
 		pte_t *ptep, pte_t orig_pte)
--
2.48.1